Emeritus Professor of Rheumatology, University of Edinburgh, Osteoarticular Research Group, The Queen's Medical Research Institute, 47 Little France Crescent, Edinburgh EH16 4TJ, United Kingdom.
Email: email@example.com
As a prelude to developing updated, evidence-based, international consensus recommendations for the management of hip and knee osteoarthritis (OA), the Osteoarthritis Research Society International (OARSI) Treatment Guidelines Committee undertook a critical appraisal of published guidelines and a systematic review (SR) of more recent evidence for relevant therapies.
Sixteen experts from four medical disciplines (primary care two, rheumatology 11, orthopaedics one and evidence-based medicine two), two continents and six countries (USA, UK, France, Netherlands, Sweden and Canada) formed the guideline development team. Three additional experts were invited to take part in the critical appraisal of existing guidelines in languages other than English. MEDLINE, EMBASE, Science Citation Index, CINAHL, AMED, Cochrane Library, seven guideline websites and Google were searched systematically to identify guidelines for the management of hip and/or knee OA. Guidelines that met the inclusion/exclusion criteria were assigned to four groups of four appraisers. The quality of the guidelines was assessed using the AGREE (Appraisal of Guidelines for Research and Evaluation) instrument, and standardised percent scores (0–100%) for scope, stakeholder involvement, rigour, clarity, applicability and editorial independence, as well as overall quality, were calculated. Treatment modalities addressed and recommended by the guidelines were summarised. Agreement (%) was estimated and the best level of evidence to support each recommendation was extracted. Evidence for each treatment modality was updated from the date of the last SR in January 2002 to January 2006. The quality of evidence was evaluated using the Oxman and Guyatt scale for SRs and the Jadad scale for randomised controlled trials (RCTs). Where possible, effect size (ES), number needed to treat (NNT), relative risk (RR) or odds ratio (OR) and cost per quality-adjusted life year (QALY) gained were estimated.
Twenty-three of 1462 guidelines or consensus statements retrieved from the literature search met the inclusion/exclusion criteria. Six were predominantly based on expert opinion, five were primarily evidence based and 12 were based on both. Overall quality scores were 28%, 41% and 51% for opinion-based, evidence-based and hybrid guidelines, respectively (P=0.001). Scores for aspects of quality varied from 18% for applicability to 67% for scope. Thirteen guidelines had been developed for specific care settings, including five for primary care (e.g., Prodigy Guidance), three for rheumatology (e.g., European League against Rheumatism recommendations), three for physiotherapy (e.g., Dutch clinical practice guidelines for physical therapy) and two for orthopaedics (e.g., National Institutes of Health consensus guidelines), whereas 10 did not specify the target users (e.g., Ontario guidelines for optimal therapy). Whilst 14 guidelines did not separate hip and knee, eight were specific for knee but only one for hip. Fifty-one different treatment modalities were addressed by these guidelines, but only 20 were universally recommended. Evidence to support these modalities ranged from Ia (meta-analysis/SR of RCTs) to IV (expert opinion). The efficacy of some modalities of therapy was confirmed by the results of RCTs published between January 2002 and January 2006. These included exercise (strengthening ES 0.32, 95% confidence interval (CI) 0.23, 0.42; aerobic ES 0.52, 95% CI 0.34, 0.70; and water-based ES 0.25, 95% CI 0.02, 0.47) and nonsteroidal anti-inflammatory drugs (NSAIDs) (ES 0.32, 95% CI 0.24, 0.39). Examples of other treatment modalities where recent trials failed to confirm efficacy included ultrasound (ES 0.06, 95% CI −0.39, 0.52), massage (ES 0.10, 95% CI −0.23, 0.43) and heat/ice therapy (ES 0.69, 95% CI −0.07, 1.45). The updated evidence on adverse effects also varied from treatment to treatment.
For example, while the evidence for gastrointestinal (GI) toxicity of non-selective NSAIDs (RR=5.36, 95% CI 1.79, 16.10) and for increased risk of myocardial infarction associated with rofecoxib (RR=2.24, 95% CI 1.24, 4.02) was reinforced, evidence for other potential drug-related adverse events, such as GI toxicity with acetaminophen or myocardial infarction with celecoxib, remained inconclusive.
Twenty-three guidelines have been developed for the treatment of hip and/or knee OA, based on opinion alone, research evidence or both. Twenty of 51 modalities of therapy are universally recommended by these guidelines. Although this suggests that a core set of recommendations for treatment exists, critical appraisal shows that the overall quality of existing guidelines is sub-optimal, and consensus recommendations are not always supported by the best available evidence. Guidelines of optimal quality are most likely to be achieved by combining research evidence with expert consensus and by paying due attention to issues such as editorial independence, stakeholder involvement and applicability. This review of existing guidelines provides support for the development of new guidelines cognisant of the limitations in existing guidelines. Recommendations should be revised regularly following SR of new research evidence as this becomes available.
Osteoarthritis (OA) is the most common form of arthritis and a major contributor to functional impairment and reduced independence in older adults1. The hip and knee are the principal large joints affected by OA. Although estimates of the prevalence of hip and knee OA vary considerably depending on whether the disease is defined by both symptoms and radiographic changes, or by radiographic criteria alone, knee OA is more prevalent2, 3, 4, 5, 6 than hip OA7, 8, 9, 10, 11. Overall, as many as 40% of those aged over 65 in the community may have symptomatic OA of the knee or hip12, 13. Current treatment strategies, using both non-pharmacologic and pharmacologic therapies, aim to reduce pain, physical disability and handicap, and some of them attempt to limit structural deterioration in affected joints. Surgical therapies are available for patients who fail to respond to more conservative measures14, 15. In recent years, both the American College of Rheumatology (ACR) and the European League against Rheumatism (EULAR) have developed recommendations to optimise the treatment of hip and/or knee OA based on a variable combination of expert consensus and systematic review (SR) of research evidence16, 17, 18. Although these guidelines are used by physicians, funding authorities and government agencies in an attempt to improve the quality of care of patients with knee and hip OA, they have been criticised for lack of methodological rigour, stakeholder involvement and applicability19, 20, 21, and the recommendations for certain modalities of treatment that they contain may require modification following publication of more recent randomised controlled trials (RCTs) and meta-analyses (MAs).
The Osteoarthritis Research Society International (OARSI) therefore appointed an international, multidisciplinary committee of experts in September 2005 with the remit of producing up-to-date, evidence-based, globally relevant consensus recommendations for the management of hip and/or knee OA in 2007. As a prelude to developing consensus recommendations by means of a Delphi exercise, the committee undertook a critical appraisal of existing evidence-based and consensus guidelines and an SR of the current research evidence. This paper reports the results of the critical appraisal of existing treatment guidelines and the SR of the more recent research evidence. The purpose of this study was to identify the available evidence, assess its quality and use this knowledge to develop a new guideline. Part II of this document, “The OARSI evidence-based consensus recommendations for the treatment of OA of the hip and knee”, will be published separately in Osteoarthritis and Cartilage.
The guideline development committee was composed of 16 experts from four medical disciplines (primary care two, rheumatology 11, orthopaedics one, and evidence-based medicine two) and six countries in Europe and North America (France, Netherlands, Sweden, UK, Canada and the USA). All members of this guideline development team participated in: (1) a critical appraisal of existing treatment guidelines; (2) a Delphi exercise to generate consensus recommendations; and (3) an exercise to grade the strength of recommendation for all modalities of therapy recommended. Three additional experts were invited to undertake critical appraisals of existing guidelines in languages other than English.
Critical appraisal of existing guidelines
Systematic literature search
A systematic literature search for existing guidelines for the management of hip and/or knee OA published in any language between 1945 and October 2005 was undertaken using MEDLINE (1966–), EMBASE (1980–), CINAHL (1980–), AMED (1985–) and the Science Citation Index (1945–). The search strategy consisted of two basic components: any term for guidelines (e.g., guidelines, recommendations, standards, algorithm or expert consensus) and any possible term for hip or knee OA in the databases (Appendix 1). In addition, Google (the first 100 hits) and seven Guideline Websites were searched, including the National Guideline Clearinghouse http://www.guidelines.gov/, Primary Care Clinical Practice Guidelines http://medicine.ucsf.edu/resources/guidelines/, the Scottish Intercollegiate Guidelines Network http://www.sign.ac.uk/, the Canadian Medical Association Infobase for Clinical Practice Guidelines http://mdm.ca/cpgsnew/cpgs/index.asp, the Guidelines International Network http://www.g-i-n.net/, Evidence Based Medicine Guidelines http://www.ebm-guidelines.com/, and the National Institute for Clinical Excellence http://www.nice.org.uk/.
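The two-component strategy can be sketched as a Boolean query: synonyms within each component are joined with OR, and the two components are joined with AND, so that a record must match at least one term from each. The term lists below are illustrative assumptions (the actual strategy is given in Appendix 1), and the output is generic Boolean syntax rather than the field-tagged, MeSH-exploding syntax of any particular database.

```python
# Hypothetical term lists for illustration only; the real, database-specific
# strategy (MeSH headings, field tags, truncation rules) is in Appendix 1.
GUIDELINE_TERMS = ["guideline*", "recommendation*", "standard*",
                   "algorithm*", "expert consensus"]
CONDITION_TERMS = ["hip osteoarthritis", "knee osteoarthritis",
                   "coxarthrosis", "gonarthrosis"]


def or_block(terms):
    """Join synonyms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(component_a, component_b):
    """A record must match at least one term from each component (AND)."""
    return f"{or_block(component_a)} AND {or_block(component_b)}"


if __name__ == "__main__":
    print(build_query(GUIDELINE_TERMS, CONDITION_TERMS))
```

Keeping each component as a separate OR block makes it easy to add synonyms to one side without changing the AND logic between them.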
Guidelines developed for the management of hip and/or knee OA using consensus or evidence-based methods were included. The latest version was included if the guidelines had been updated. Guidelines developed for OA in other joints or for aspects of OA other than treatment were excluded, as were narrative reviews, commentaries and appraisals of implementation.
Quality and content assessment
English language guidelines were randomly assigned to three groups of four committee members for appraisal of quality and content. Three guidelines published in German and Dutch were appraised by three additional experts who were fluent in these languages. The quality of the guidelines was assessed using the AGREE instrument22, in which 23 criteria in seven domains are evaluated. These include the scope and purpose of the guidelines, stakeholder participation, methodological rigour, clarity, applicability, editorial independence and overall quality. The content was extracted using a comprehensive reference list of treatment modalities. Each appraiser scored the guidelines independently and results were collected and analysed by the lead investigator (WZ) and the co-chairs (GN and RM), who did not take part in the assessment.
The appraisers' scores from each group were expressed as standardised domain scores on a percentage scale (0–100%)22. Guidelines were categorised according to the methods (expert opinion based, research evidence based or both), the target users to whom they were directed (primary care, rheumatology, physiotherapy or orthopaedics), the scope of the recommendations (general and specific treatments) and the joints for which the guidelines were applicable (hip, knee, or hip and knee). Quality scores were compared between groups using an analysis of variance (ANOVA). Agreement (%) between guidelines was calculated as the number of guidelines recommending a modality divided by the total number of guidelines addressing that modality, multiplied by 100. The best level of evidence supporting each recommendation was extracted and classified according to the categories of evidence in Table I.
|Category||Source of evidence|
|Ia||MA of RCTs|
|Ib||RCT|
|IIa||Controlled study without randomisation|
|IIb||Quasi-experimental study (e.g., uncontrolled trial, one-arm dose–response trial)|
|III||Non-experimental descriptive studies, such as comparative, correlation, and case–control studies|
|IV||Expert committee reports or opinion or clinical experience of respected authorities, or both|
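The standardised AGREE domain score and the agreement percentage are both simple ratios. The sketch below assumes the 4-point item scale of the original AGREE instrument (pass other `scale_min`/`scale_max` values for other versions) and uses hypothetical ratings; the function names are illustrative and not part of any AGREE software.

```python
def agree_domain_score(ratings, scale_min=1, scale_max=4):
    """Standardised AGREE domain score on a 0-100% scale.

    ratings: one list per appraiser, holding that appraiser's rating for
    each item in the domain (all appraisers rate the same items).
    Assumes a 1-4 item scale unless told otherwise.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    obtained = sum(sum(r) for r in ratings)
    min_possible = scale_min * n_items * n_appraisers
    max_possible = scale_max * n_items * n_appraisers
    return 100 * (obtained - min_possible) / (max_possible - min_possible)


def agreement_percent(n_recommending, n_addressing):
    """Share (%) of the guidelines addressing a modality that recommend it."""
    return 100 * n_recommending / n_addressing


# Hypothetical ratings: four appraisers x three items in one domain.
domain = [[3, 3, 2], [2, 3, 3], [3, 2, 2], [2, 3, 2]]
print(agree_domain_score(domain))           # 50.0
print(round(agreement_percent(13, 14), 1))  # weight loss (13/14): 92.9
```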
SR of recent evidence
Systematic literature search
A systematic search of the literature published between 31 January 2002 and 31 January 2006 was undertaken using MEDLINE, EMBASE, CINAHL, AMED, the Science Citation Index and the Cochrane Library databases. Research evidence prior to January 2002 was not sought systematically as this was available from the systematic literature review conducted by EULAR17. Separate searches for research evidence for each treatment modality were undertaken. Each search was conducted sequentially according to the evidence hierarchy (SRs/MAs, followed by RCTs/controlled trials (CTs), quasi-experimental and uncontrolled studies) (Table I)23. An example of how this search strategy was employed to obtain the best available research evidence for the efficacy of acetaminophen (paracetamol) is shown in Appendix 2. The same strategy was used for searching MEDLINE, EMBASE, CINAHL and AMED. For the Science Citation Index, however, a key word search was used and all possible terms and combinations of terms were tried in order to obtain relevant citations. Medical Subject Heading (MeSH) searches were used for all databases, and key word searches were used if a MeSH search was not available. All MeSH search terms were exploded. The reference lists of SRs were examined and any additional studies meeting the inclusion/exclusion criteria were included.
|Characteristic||Number of guidelines||Examples|
|Type of guidelines|
|Opinion based||6||Royal College of Physicians, etc.|
|Evidence based||5||Prodigy Guidance, etc.|
|General||13||ACR, EULAR, etc.|
|Specific||10||MOVE, Canadian NSAIDs, etc.|
|Primary care||5||Prodigy Guidance, etc.|
|Physiotherapy||3||Dutch physiotherapy, etc.|
|Orthopaedics||2||NIH consensus, etc.|
|Not specified||10||Ontario, ICSI, etc.|
|English||21||ACR, EULAR, etc.|
|Others||2||German, Malay, etc.|
The search in the Cochrane Library included MeSH searches of Cochrane reviews, abstracts of Quality Assessed Systematic Reviews, the Cochrane Controlled Trial Register, the National Health Service (NHS) Economic Evaluation Database, the Health Technology Assessment Database and the NHS Economic Evaluation Bibliography Details Only. In addition, a comprehensive search for all articles including the term OA, regardless of treatment, was undertaken.
Only studies with clinical outcomes for hip and/or knee OA were included. The main focus was on SRs/MAs, RCTs/CTs, uncontrolled trials, cohort studies, case–control studies, cross-sectional studies and economic evaluations. Studies of OA at other sites such as the hand or spine, and other chronic joint diseases were excluded, apart from studies in which adverse effects of relevant pharmacologic treatments were being investigated as a primary outcome. Case reports, animal studies, non-clinical outcome studies, narrative review articles, commentaries and guidelines were excluded.
The efficacy of any modality of treatment was determined by using the best available evidence. For example, when the efficacy of an intervention could be confirmed by category Ia evidence (MA/SR of RCTs), then studies lower in the evidence hierarchy such as individual RCTs (category Ib) were not reviewed (Table I). If there was more than one study at the same evidence level (e.g., four SRs for NSAIDs), the study with the best quality score was used. Information concerning side effects was obtained from both RCTs and observational studies. While the efficacy of each therapeutic intervention was assessed separately for hip and knee OA, side effects were evaluated for each intervention regardless of the underlying disease and the target joint. For determination of cost-effectiveness, only cost-utility analyses were included.
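The selection rule above (highest level in the evidence hierarchy first, then highest quality score within that level) amounts to a lexicographic minimum over the candidate studies. A minimal sketch, using hypothetical study records; the field names `level` and `quality` are assumptions for illustration.

```python
# Rank of each evidence category (lower = stronger), following the
# hierarchy Ia > Ib > IIa > IIb > III > IV.
LEVEL_RANK = {"Ia": 0, "Ib": 1, "IIa": 2, "IIb": 3, "III": 4, "IV": 5}


def best_available_evidence(studies):
    """Pick the study at the highest evidence level; within that level,
    prefer the highest quality score (percent of the maximum)."""
    return min(studies,
               key=lambda s: (LEVEL_RANK[s["level"]], -s["quality"]))


# Hypothetical studies for one modality.
studies = [
    {"id": "RCT-A", "level": "Ib", "quality": 100},
    {"id": "SR-1", "level": "Ia", "quality": 60},
    {"id": "SR-2", "level": "Ia", "quality": 85},
]
print(best_available_evidence(studies)["id"])  # SR-2
```

Note that, under this rule, an SR scoring 60% still outranks an RCT scoring 100%, because level is compared before quality.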
The quality of SRs/MAs was assessed using the Oxman and Guyatt checklist24 and the quality of RCTs was evaluated using the Jadad method25. All quality scores were converted into percentages of the maximum score attainable. Quality assessments were not undertaken for other types of study designs, such as cohort or case–control studies. For cost-utility analysis, study perspective, comparator, time horizon, discounting, modelling and uncertainty were evaluated.
Effect sizes (ESs) and 95% confidence intervals (CIs) compared with placebo or active control were calculated for continuous outcomes such as reduction of pain from baseline or improvement in function26. ES is the standardised mean difference, i.e., the mean difference between a treatment and a control group divided by the pooled standard deviation. It is expressed as a number without units and can therefore be compared across interventions and outcome scales. From the clinical standpoint, an ES of 0.2 is considered small and 0.5 moderate, while an ES>0.8 indicates a large clinical effect27. Statistical pooling was undertaken, as appropriate, when SRs were not available28. For dichotomous data, such as the percentage of patients with moderate to excellent (or more than 50%) pain relief or symptomatic improvement, the number needed to treat (NNT) was estimated29. The NNT is the estimated number of patients who need to be treated for one additional patient to achieve the target effect, compared with the control group. Thus, the smaller the NNT, the better the treatment effect. The 95% CI for the NNT was calculated using Altman's method30.
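The ES calculation and the magnitude thresholds can be sketched as follows. This is a minimal illustration, assuming the pooled-SD standardised mean difference with the usual large-sample standard error and no small-sample (Hedges) correction; it is not necessarily the exact estimator used in the cited sources, and the trial numbers are hypothetical.

```python
import math


def standardised_mean_difference(m_t, sd_t, n_t, m_c, sd_c, n_c, z=1.96):
    """Return (ES, lower, upper) for treatment vs control.

    Uses the pooled SD and the large-sample standard error of the SMD;
    no small-sample (Hedges) correction is applied.
    """
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    es = (m_t - m_c) / pooled_sd
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + es**2 / (2 * (n_t + n_c)))
    return es, es - z * se, es + z * se


def describe_es(es):
    """Label the magnitude of an ES with the thresholds quoted in the text."""
    size = abs(es)
    if size > 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical trial: mean pain improvement 20 vs 15 points, SD 10, n=50 per arm.
es, lo, hi = standardised_mean_difference(20, 10, 50, 15, 10, 50)
print(f"ES {es:.2f} (95% CI {lo:.2f}, {hi:.2f}) -> {describe_es(es)}")
# ES 0.50 (95% CI 0.10, 0.90) -> moderate
```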
The relative risk (RR) of side effects was calculated from RCTs or cohort studies for the incident risk, and from cross-sectional studies for prevalent risk. Odds ratios (ORs) were calculated from case–control studies31. Both RR and OR provide information on how many times more likely (or less likely) it is that a subject who is exposed to a treatment modality will have an adverse event, when compared with a subject who is not exposed. An RR/OR=1 indicates no increased risk, whereas an RR/OR>1 or <1 indicates increased or decreased risk, respectively.
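RR and OR, with log-scale 95% CIs, can be computed from a 2×2 table as in the sketch below. It assumes the common Wald (log) intervals and non-zero cells (no continuity correction), and the counts are hypothetical rather than taken from any study in Table VI.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """RR (cohort/RCT data) with a Wald CI on the log scale.

    a, b: exposed subjects with / without the adverse event
    c, d: unexposed subjects with / without the adverse event
    All cells must be non-zero (no continuity correction here).
    """
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


def odds_ratio(a, b, c, d, z=1.96):
    """OR (e.g., case-control data) with Woolf's log-scale CI.

    a, c: cases exposed / unexposed; b, d: controls exposed / unexposed.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, or_ * math.exp(-z * se), or_ * math.exp(z * se)


# Hypothetical cohort: 30/1000 events if exposed vs 10/1000 if unexposed.
rr, lo, hi = relative_risk(30, 970, 10, 990)
print(f"RR {rr:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

A CI that excludes 1 on both sides, as here, corresponds to a statistically significant increase in risk; CIs spanning 1 (as for many entries in Table VI) are inconclusive.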
Only cost-utility analyses, which report the cost per quality-adjusted life year (QALY) gained, were reviewed. Costs were converted into US dollars and adjusted at a rate of 5% per year from the year in which the study was published to 2006.
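The adjustment can be written as cost × (1 + 0.05)^(2006 − year) × exchange rate. The sketch below assumes that the 5% annual adjustment compounds each published value forward to 2006, and it takes the exchange rate as a caller-supplied, hypothetical input because the text does not state which rates were used.

```python
def cost_per_qaly_2006_usd(cost, year, usd_per_unit, rate=0.05,
                           target_year=2006):
    """Bring a published cost/QALY forward to target_year at a fixed
    annual rate (compounded), then convert it to US dollars.

    usd_per_unit is an assumed exchange rate supplied by the caller.
    """
    years = target_year - year
    return cost * (1 + rate) ** years * usd_per_unit


# Hypothetical: 10000 per QALY, already in US$, published in 2002.
print(round(cost_per_qaly_2006_usd(10000, 2002, usd_per_unit=1.0), 2))
# 12155.06
```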
Data were extracted by two investigators (WZ and a research assistant, Jane Robertson). A customised form was used for data extraction and quality assessment. Any discrepancies were discussed and agreed between the extractors prior to analysis. The data from the non-English language studies were extracted by assessors with good understanding of the languages concerned.
Quality and contents of existing guidelines
The systematic literature search yielded 1462 citations (MEDLINE 276, EMBASE 413, CINAHL 81, AMED 27, SCI 553, and Google and guideline websites 112). Of these, 23 met the inclusion and exclusion criteria specified16, 17, 18, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51. Six guidelines were predominantly based on opinion, five primarily based on evidence and 12 based on both (Table II). Whilst the majority of the guidelines (14) did not separate hip and knee OA, eight were specific for knee OA and only one for hip OA. Thirteen guidelines had been developed for specific care settings (five for primary care, three for rheumatology, three for physiotherapy and two for orthopaedics), whereas 10 did not specify target users.
Scores for overall quality of guidelines were 28%, 41% and 51% for opinion-based, evidence-based and hybrid guidelines, respectively (P<0.001) (Fig. 1). Scores for different quality criteria varied but apart from applicability, opinion-based guidelines tended to have lower scores (Table III).
|AGREE domain||Opinion based||Evidence based||Hybrid||P|
s.e.m.: standard error of mean.
Fifty-one treatment modalities were addressed in the 23 guidelines. Twenty of these modalities were recommended by all (100%) of the guidelines in which they were addressed (Table IV), but the strength of agreement for any modality appeared to be related to the number of guidelines that addressed that modality. For example, while regular telephone contact and knee fusion were recommended in 100% of the guidelines in which these modalities of therapy were considered, this was actually in only two guidelines for each modality. By contrast, although weight loss was not universally recommended, it was in fact recommended in 13/14 of the guidelines where this modality was considered.
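The point that 100% agreement among two guidelines is weaker than 93% agreement among 14 can be made concrete with a confidence interval for each proportion. This is an illustration only, not an analysis performed in this review; the sketch uses the Wilson score interval, a standard choice for small counts. For 2/2 (regular telephone contact) the lower 95% bound is roughly 34%, whereas for 13/14 (weight loss) it is roughly 69%.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score 95% CI for the proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half


for label, k, n in [("telephone contact", 2, 2), ("weight loss", 13, 14)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.0f}% agreement "
          f"(95% CI {100 * lo:.0f} to {100 * hi:.0f}%)")
```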
|Level of evidence††||Agreement (number of guidelines recommending the modality/total number of guidelines addressing the modality)|
|Ia||Ultrasound (1/5)||Chondroitin sulphate (2/7)||
|Ib||Laser (1/6)||Nutrients (1/3)||Acupuncture (5/8)||Weight loss (13/14)||Combination therapy (12/12)|
|Electrotherapy/EMG (1/8)||Massage (1/2)||Patellar tape (12/13)‡‡||Joint lavage (3/3)‡‡|
|Diacerhein (1/2)||Avocado soybean unsaponifiables (3/4)||Herbs (2/2)|
|IV||Oral steroid (0/2)||Arthroscopic debridement (5/6)‡‡|
TENS=Transcutaneous Electrical Nerve Stimulation; EMG=Electromyography; TJR=Total Joint Replacement.
Evidence to support recommendations ranged from Ia (SR of RCTs) to IV (expert opinion), and did not necessarily reflect the extent of agreement (Table IV). For example, while canes/sticks, total joint replacement and osteotomy were not supported by RCTs, they were still universally recommended in the guidelines which addressed them. In contrast, despite evidence from SRs of RCTs for the efficacy of chondroitin sulphate and ultrasound, they were recommended by <50% of the guidelines in which these modalities were considered (Table IV).
The results of the SR of research papers published between January 2002 and January 2006 are shown in Table V.
|Modality||Joint||QoS (%)†||LoE∗||Recent evidence (2002–)|
|ES pain (95% CI)||ES function (95% CI)||ES stiffness (95% CI)||NNT (95% CI)|
|Self-management||Both||100||Ia||0.06 (0.02, 0.10)89||0.06 (0.02, 0.10)89|
|Telephone||Both||100||Ia||0.12 (0.00, 0.24)90||0.07 (0.00, 0.15)90|
|Education||Both||100||Ia||0.06 (0.02, 0.10)89||0.06 (0.02, 0.10)89|
|Strengthening||Knee||100||Ia||0.32 (0.23, 0.42)91||0.32 (0.23, 0.41)91|
|Aerobic||Knee||100||Ia||0.52 (0.34, 0.70)91||0.46 (0.25, 0.67)91|
|Water-based exercise||Both||60||Ib||0.25 (0.02, 0.47)64, 92||0.23 (0.00, 0.45)64||0.17 (−0.05, 0.39)64|
|Spa/sauna||Both||75||Ib||0.46 (0.17, 0.75)94||NS|
|Weight reduction||Knee||40||Ib||0.13 (−0.12, 0.38)52, 95||0.69 (0.24, 1.14)52||0.36 (−0.08, 0.80)52||3 (2, 9)52|
|Nutrients (e.g., SAM-e)||Knee||100||Ia||0.22 (−0.25, 0.69)96||0.31 (0.10, 0.52)96|
|TENS||Both||75||Ia||2 (1, 5)97|
|Laser||Both||100||Ia||4 (2, 17)98|
|Ultrasound||Both||50||Ia||0.06 (−0.39, 0.52)99|
|Radiotherapy||Both||50||IIb||Similar effects between OA and RA from an MA of uncontrolled trials100|
|Heat/ice||Knee||75||Ia||0.69 (−0.07, 1.45)101||1.03 (0.44, 1.62)101 for quads strength; 1.13 (0.54, 1.73)101 for flexion||0.83(−0.03, 1.69)101 for swelling|
|Massage||Knee||40||Ib||0.10 (−0.23, 0.43)102|
|Acupuncture||Knee||40||Ib||0.51 (0.23, 0.79)63||0.51 (0.23, 0.79)63||0.41 (0.13, 0.69)63||4 (3, 9)63|
|Insoles||Knee||100||Ia||No difference between types of insoles; no placebo/usual care comparisons103|
|Joint protection (braces)||Knee||100||Ia||More benefits with a knee brace than a neoprene sleeve103|
|Electrotherapy/EMG||Knee||75||0.77 (0.36, 1.17)104|
|Acetaminophen||Both||100||Ia||0.21 (0.02, 0.41)105||2 (1, 2)106|
|NSAIDs||Both||100||Ia||0.32 (0.24, 0.39)107|
|COX-2 inhibitors||Both||100||Ia||0.44 (0.33, 0.55)108 (exc Deek's for OA/RA)|
|Topical NSAIDs||Knee||100||Ia||0.41 (0.22, 0.59)53||0.36 (0.24, 0.48)53||0.49 (0.17, 0.80)53||3 (2, 4)53|
|Topical capsaicin||Knee||75||Ia||4 (3, 5)109|
|IA Corticosteroid||Knee||100||Ia||0.72 (0.42, 1.02)110||0.06 (−0.17, 0.30)110||4 (2, 11)110|
|IA Hyaluronic acid||Knee||100||Ia||0.32 (0.17, 0.47)111||0.00 (−0.23, 0.23)112|
|Glucosamine sulphate||Both||100||Ia||0.61 (0.28, 0.95)113||0.07 (−0.08, 0.21)113||0.06 (−0.11, 0.23)113||5 (4, 7)114|
|Chondroitin sulphate||Knee||100||Ia||0.52 (0.37, 0.67)114||5 (4, 7)114|
|Diacerhein||Both||–||Ib||0.22 (0.01, 0.42)81, 82, 83, 84, 85|
|ASU||Both||75||Ia||More beneficial for hip OA115|
|Herbal remedy||Both||75||Ia||7 (4, 27)116|
|Arthroscopic lavage||Knee||100||Ib||0.09 (−0.27, 0.44)55||−0.10 (−0.45, 0.26)55|
|Arthroscopic debridement||Knee||100||−0.01 (−0.37, 0.35)55||−0.09 (−0.27, 0.45)55|
|Patellar resurfacing||Knee||100||Ib||9 (5, 25)117|
|Osteotomy||Knee||50||IIb||60% pain relief from an SR of uncontrolled trials57|
|TJR||Both||100||III||TJR is effective in improving QoL, with greater benefit for hip OA, from an SR of cohort studies56|
ES=0.2 is considered small, ES=0.5 is moderate, and ES>0.8 is large; NNT for symptom relief, e.g., ≥50% pain relief, unless otherwise specified; SAM-e: S-adenosylmethionine; ASU: avocado soybean unsaponifiables.
∗LoE (level of evidence): Ia: MA of RCTs; Ib: RCT; IIa: controlled study without randomisation; IIb: quasi-experimental study (e.g., uncontrolled trial, one arm dose–response trial, etc.); III: observational studies (e.g., case–control, cohort, cross-sectional studies); IV: expert opinion.
†QoS (quality of study) was assessed using validated scales, e.g., the Oxman and Guyatt scale for SRs and the Jadad scale for clinical trials. The percentage score was calculated for each study. The best available evidence was presented, i.e., the SR with the highest quality, then the RCT with the highest quality, followed by uncontrolled or quasi-experimental, cohort and case–control studies.
With the exception of combination therapy, the use of a cane/stick and referral, all the non-pharmacologic and pharmacologic therapies recommended universally by existing guidelines were supported by recent SRs of RCTs (Ia) or RCTs (Ib) published after 2002. By contrast, there were no placebo controlled trials of surgical modalities of treatment such as total joint replacement and osteotomy, and supporting evidence came from uncontrolled or non-experimental observational studies (Table V). Overall quality scores for evidence ranged between 40% and 100% but 24/40 studies (60%) scored 100% (Table V).
The ES for pain relief varied from very small (e.g., education ES=0.06, 95% CI 0.02, 0.10) to moderate (e.g., aerobic exercise ES=0.52, 95% CI 0.34, 0.70). No modality of therapy had an ES for pain relief as high as 0.80, the accepted criterion for a large clinical effect27 (Fig. 2). ESs for pain relief with oral analgesics such as acetaminophen (ES=0.21, 95% CI 0.02, 0.41) and NSAIDs (ES=0.32, 95% CI 0.24, 0.39) were small (Fig. 3 and Table V).
ESs for improvement in function were also generally small, and very similar to those for pain relief, for a number of modalities of non-pharmacological therapies (Table V). However, the ES for improvement in function for >10% weight reduction was 0.69 (95% CI 0.24, 1.14) compared with the ES for pain relief (0.13, 95% CI −0.12, 0.38). ESs for reduction in stiffness were also available for a few modalities of treatment (Table V).
Some studies provided data that allowed calculation of NNTs. For example, weight reduction (>10%) was associated with an NNT of three (95% CI 2, 9), i.e., for every three patients with knee OA who achieved this loss of weight, one additional patient would be expected to have more than 50% reduction in the total Western Ontario and McMaster Universities (WOMAC) Osteoarthritis index, compared with the control group52. The NNT for topical NSAIDs was also three (95% CI 2, 4), indicating that for every three patients with knee OA pain treated with a topical NSAID, one additional patient would be expected to experience moderate to excellent pain relief53.
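An NNT and its CI can be sketched from responder counts, assuming a simple Wald interval for the absolute risk reduction (ARR) that is then inverted, in the spirit of Altman's method. The counts below are hypothetical and are not taken from the weight-reduction or topical NSAID studies; when reported, NNTs are conventionally rounded up to whole patients.

```python
import math


def nnt_with_altman_ci(resp_t, n_t, resp_c, n_c, z=1.96):
    """NNT = 1 / ARR, with a 95% CI from inverting the CI of the ARR.

    Only handles the case where the ARR CI excludes zero; otherwise the
    NNT CI runs through infinity (benefit to harm) and is not a simple range.
    """
    p_t, p_c = resp_t / n_t, resp_c / n_c
    arr = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    lo_arr, hi_arr = arr - z * se, arr + z * se
    if lo_arr <= 0:
        raise ValueError("ARR CI includes zero; NNT CI spans infinity")
    # Larger ARR -> smaller NNT, so the limits swap when inverted.
    return 1 / arr, 1 / hi_arr, 1 / lo_arr


# Hypothetical trial: 40/100 responders on treatment vs 10/100 on control.
nnt, lo, hi = nnt_with_altman_ci(40, 100, 10, 100)
print(f"NNT {nnt:.1f} (95% CI {lo:.1f}, {hi:.1f})")
```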
In general, non-pharmacologic therapies had numerically smaller ESs (ES=0.25, 95% CI 0.16, 0.34) than pharmacological therapies (ES=0.39, 95% CI 0.31, 0.47) (Fig. 2, Fig. 3). Among surgical treatments, ES could only be calculated for arthroscopic lavage and debridement. An SR of four RCTs showed that arthroscopic joint lavage and debridement were no more effective than placebo54. One placebo-controlled RCT (with a quality score of 100%) included in this review demonstrated that the ESs for pain relief with arthroscopic lavage and debridement vs placebo were 0.09 (95% CI −0.27, 0.44) and −0.01 (95% CI −0.37, 0.35), respectively55. Similar results were obtained for improvement in function (Table V). Although there are no placebo-controlled RCTs of total joint (knee or hip) replacement or osteotomy, two recent SRs of uncontrolled trials and cohort studies confirmed that they were highly effective in relieving pain and improving quality of life56, 57.
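Where several ESs are combined into one summary estimate, a common approach is inverse-variance pooling. The sketch below shows a fixed-effect version that recovers each standard error from a symmetric 95% CI; the text does not state which pooling model was used (ref. 28), a random-effects model would give a wider CI when studies are heterogeneous, and the inputs are hypothetical rather than taken from Table V.

```python
import math


def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of (es, lower, upper) tuples.

    Each SE is recovered from its symmetric 95% CI as (upper - lower) / (2z).
    """
    weights, weighted = [], []
    for es, lower, upper in estimates:
        se = (upper - lower) / (2 * z)
        w = 1 / se**2
        weights.append(w)
        weighted.append(w * es)
    pooled = sum(weighted) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled


# Hypothetical ESs (not taken from Table V).
print(pool_fixed_effect([(0.30, 0.10, 0.50), (0.50, 0.30, 0.70)]))
```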
Side effects have mainly been investigated for pharmacologic therapies. Oral NSAIDs were associated with 3–5 times the risk of gastrointestinal (GI) side effects when compared with placebo or non-exposure58, whereas treatment with topical NSAIDs resulted in no more adverse GI events than placebo (RR=0.81, 95% CI 0.43, 1.56)53 or non-exposure (OR=1.45, 95% CI 0.84, 2.50)59 (Table VI). Whether or not long-term treatment with acetaminophen 4 g daily is associated with GI and renal side effects remains inconclusive (Table VI). Treatment with cyclooxygenase-2 (COX-2) selective drugs or conventional non-selective NSAIDs together with proton pump inhibitors (PPIs) or misoprostol has been shown to be associated with a reduction in the risk of NSAID-induced upper GI side effects. However, treatment with rofecoxib has been shown to be associated with an increased risk of cardiovascular (CV) events (RR=2.24, 95% CI 1.24, 4.02)60 and treatment with misoprostol with an increased risk of diarrhoea (RR=1.81, 95% CI 1.52, 2.61)61. Following the withdrawal of rofecoxib, a number of RCTs and SRs of the CV safety of other coxibs and conventional non-selective NSAIDs have been undertaken. While the increased risk of CV side effects with rofecoxib was confirmed, the evidence for similar CV toxicity with celecoxib, valdecoxib and conventional non-selective NSAIDs was inconsistent (Table VI). Overall, the CV risk associated with COX-2 selective inhibitors was not significantly greater than that associated with conventional non-selective NSAIDs (RR=1.19, 95% CI 0.80, 1.75)62 (Table VI).
|Intervention††||Adverse events||RR/OR (95% CI)||Evidence (references)|
|Acupuncture||Any||0.76 (0.13, 4.42)||RCT63|
|Acetaminophen||GI discomfort||0.80 (0.27, 2.37)||RCTs105|
|GI perforation/bleed||3.60 (2.60, 5.10)||CC118|
|GI bleeding||1.2 (0.8, 1.7)||CCs119|
|Renal failure||0.83 (0.50, 1.39)||CS120|
|Renal failure||2.5 (1.7, 3.6)||CC121|
|NSAIDs||GI perforation/ulcer/bleed||5.36 (1.79, 16.10)||RCTs58|
|GI perforation/ulcer/bleed||2.70 (2.10, 3.50)||CSs58|
|GI perforation/ulcer/bleed||3.00 (2.70, 3.70)||CCs58|
|Myocardial infarction||1.09 (1.02, 1.15)||CSs122|
|Topical NSAIDs||GI events||0.81 (0.43, 1.56)||RCTs53|
|GI bleed/perforation||1.45 (0.84, 2.50)||CC59|
|H2 blocker+NSAID vs NSAID||Serious GI complications||0.33 (0.01, 8.14)||RCTs62|
|Symptomatic ulcers||1.46 (0.06, 35.53)||RCTs62|
|Serious CV or renal events||0.53 (0.08, 3.46)||RCTs62|
|PPI+NSAID vs NSAID||Serious GI complications||0.46 (0.07, 2.92)||RCTs62|
|Symptomatic ulcers||0.09 (0.02, 0.47)||RCTs62|
|Serious CV or renal events||0.78 (0.10, 6.26)||RCTs62|
|Misoprostol+NSAID vs NSAID||Serious GI complications||0.57 (0.36, 0.91)||RCTs62|
|Symptomatic ulcers||0.36 (0.20, 0.67)||RCTs62|
|Serious CV or renal events||1.78 (0.26, 12.07)||RCTs62|
|Diarrhoea||1.81 (1.52, 2.61)||RCTs61|
|Coxibs vs NSAID||Serious GI complications||0.55 (0.38, 0.80)||RCTs62|
|Symptomatic ulcers||0.49 (0.38, 0.62)||RCTs62|
|Serious CV or renal events||1.19 (0.80, 1.75)||RCTs62|
|Celecoxib||Myocardial infarction||2.26 (1.0, 5.1)||RCTs123|
|Myocardial infarction||0.97 (0.86, 1.08)||CSs/CCs122|
|Rofecoxib||Myocardial infarction||2.24 (1.24, 4.02)||RCTs60|
|Myocardial infarction||1.27 (1.12, 1.44)||CSs/CCs122|
|Valdecoxib||CV events||2.3 (1.1, 4.7)||RCTs124|
|Opioids||Any||1.4 (1.3, 1.6)||RCTs125|
|Constipation||3.6 (2.7, 4.7)||RCTs125|
|Glucosamine sulphate||Any||0.97 (0.88, 1.08)||RCTs113|
|Diacerhein||Diarrhoea||3.98 (2.90, 5.47)||RCTs81, 85|
H2-blockers: histamine type 2 receptor antagonists.
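Table VI reports ratio measures with 95% CIs. A CI that spans 1 (e.g. topical NSAIDs, 0.43 to 1.56) indicates no statistically significant difference from the comparator, whereas one lying wholly above 1 (e.g. rofecoxib, 1.24 to 4.02) indicates an increased risk. As a minimal sketch, assuming a single two-arm trial with hypothetical event counts, the following computes a risk ratio with the standard log-scale (Katz) interval. It is an illustration of how such an interval is read, not the procedure used to derive the pooled estimates in Table VI, which come from meta-analyses and observational studies.

```python
import math


def risk_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio and approximate 95% CI from a 2x2 table (log/Katz method)."""
    if events_trt == 0 or events_ctl == 0:
        # A zero cell makes log(RR) undefined; a continuity correction is needed.
        raise ValueError("zero event count: apply a continuity correction first")
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of log(RR)
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def ci_excludes_null(lower, upper, null=1.0):
    """True when a ratio CI lies entirely on one side of the null value 1."""
    return lower > null or upper < null


# Hypothetical trial: 20/200 events on treatment vs 10/200 on placebo.
rr, lo, hi = risk_ratio_ci(20, 200, 10, 200)
print(f"RR={rr:.2f}, 95% CI {lo:.2f}, {hi:.2f}")
```

Even a doubling of risk in this hypothetical trial gives a CI of roughly 0.96 to 4.16, which still includes 1; small event counts produce the wide intervals seen in several rows of Table VI.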
Four cost-utility analyses have been undertaken since 2002: one in Germany, which compared acupuncture with sham acupuncture63; two in the UK, which studied water-based exercise and GI protective strategies64, 65; and one in Canada, which examined intra-articular injections of hyaluronic acid66. Two earlier studies that had compared total hip and knee replacement with conventional pharmacologic and non-pharmacologic therapy were retrieved for comparison67, 68. Cost per QALY varied with modality, country, comparator, perspective, time horizon and discount rate, and remained variable even after adjustment for discounting and conversion of the original cost per QALY to the current value of the US dollar (Table VII).
| Intervention | Comparator | Perspective∗ | Time horizon | Discounting | Year published | Country | Cost/QALY (original currency) | Adjusted cost/QALY (US$) | Reference |
|---|---|---|---|---|---|---|---|---|---|
| Water-based exercise | Usual care | Societal | 1 year | No | 2005 | UK | £5738 | 10483 | 64 |
| Acupuncture | Sham acupuncture | Societal | 3 months | No | 2005 | Germany | €17845 | 22297 | 63 |
| COX-2 specifics | NSAIDs | NHS | 6 months | No | 2005 | UK | £36923 | 74298 | 65 |
| COX-2 selectives | NSAIDs | NHS | 6 months | No | 2005 | UK | £30000 | 60367 | 65 |
| Intra-articular hyaluronic acid | Standard care | Societal | 1 year | No | 2002 | Canada | $10000 | 10453 | 66 |
| Total hip replacement | Conventional therapy | Societal | Life | 5% | 1996 | US | $4754 | 8131 | 67 |
| Total knee replacement | Pre-operation | Institutional | 2 years | No | 1997 | US | $5856 | 10325 | 68 |
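Each cost per QALY in Table VII is an incremental ratio: the extra cost of the intervention divided by the extra QALYs it yields over the comparator. Because both terms depend on the time horizon and on whether future costs and health gains are discounted, figures computed under different conventions are not directly comparable. The sketch below illustrates this with hypothetical two-year cost and QALY streams; it assumes the first year is undiscounted and later years are discounted once per year, which is one convention among several, and it does not reproduce any of the analyses in the table.

```python
def present_value(stream, rate):
    """Discount a yearly stream (index 0 = first year, left undiscounted)."""
    return sum(amount / (1 + rate) ** year for year, amount in enumerate(stream))


def cost_per_qaly(costs_new, qalys_new, costs_old, qalys_old, rate=0.0):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    delta_cost = present_value(costs_new, rate) - present_value(costs_old, rate)
    delta_qaly = present_value(qalys_new, rate) - present_value(qalys_old, rate)
    if delta_qaly == 0:
        raise ValueError("no QALY difference: ratio undefined")
    return delta_cost / delta_qaly


# Hypothetical 2-year comparison: the intervention costs 1000 up front and
# adds 0.05 QALY in each year relative to the comparator.
costs_new, qalys_new = [1000.0, 0.0], [0.70, 0.70]
costs_old, qalys_old = [0.0, 0.0], [0.65, 0.65]
undiscounted = cost_per_qaly(costs_new, qalys_new, costs_old, qalys_old)
discounted = cost_per_qaly(costs_new, qalys_new, costs_old, qalys_old, rate=0.05)
print(round(undiscounted), round(discounted))
```

Discounting the second-year health gain at 5% shrinks the QALY difference, so the same intervention looks about 2% more expensive per QALY; over a lifetime horizon, as for total hip replacement, the effect is much larger.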
Clinical guidelines are frequently defined as ‘systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances’69. OA is the most prevalent form of arthritis throughout the world1, 2, 3, 4, 5, 6, 7, and OA-related knee pain is the leading cause of physical disability in older adults1. The prevalence of both symptomatic and radiographically defined hip OA7, 8, 9, 10, 11 is lower than that of knee OA2, 3, 4, 5, 6 and varies from one country to another7, 8, 70. The treatment of symptomatic knee and hip OA is a global problem that challenges the clinical skills and judgement of health professionals everywhere. Because no single treatment modality relieves pain, improves mobility and prevents structural progression of disease, effective management relies on the appropriate use of several available therapies, each of which has only limited efficacy. Although a number of national and regional guidelines have been developed to assist physicians and other health professionals in managing hip and/or knee OA16, 17, 18, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, there are currently no universally agreed recommendations, even for a core group of safe and effective therapies, for the treatment of knee and hip OA throughout the world. As a prelude to developing updated, evidence-based, international, expert consensus recommendations for the management of hip and knee OA, the OARSI Treatment Guidelines Committee undertook a critical appraisal of existing published guidelines and an SR of more recent evidence for relevant therapies.
The purpose of these preliminary appraisals was (1) to establish the extent to which different modalities of therapy are recommended in existing guidelines, and to explore the possibility that there may be a core set of recommendations common to all the guidelines; (2) to investigate the extent to which these guidelines are based on available research evidence; (3) to assess the quality of the guidelines using the widely accepted AGREE criteria; and (4) to examine the extent to which more recent research evidence confirms, or fails to confirm, recommendations in existing guidelines.
Treatment modalities recommended in existing guidelines, core recommendations and their evidence base
The critical appraisal of the 23 existing guidelines showed that, of the 51 treatment modalities addressed, 20 were recommended by every guideline that considered them (100% agreement in Table IV). These included non-pharmacological modalities such as education, exercise, patient contact by telephone and provision of walking aids, and pharmacological treatments such as acetaminophen, non-selective NSAIDs with co-prescription of gastroprotective agents or selective COX-2 inhibitors, opioids and some herbal remedies. Surgical treatments recommended in all the guidelines that considered them included knee aspiration and joint lavage as well as osteotomy, knee fusion and total joint replacement. Self-management and the combination of non-pharmacologic and pharmacologic treatments were also universally recommended. This core set of recommended therapies inevitably reflects the availability of treatments, and the less than universal recommendation of some modalities may be a consequence of their not being universally available; for example, topical NSAIDs and avocado soybean unsaponifiables are available in Europe but not in the USA. When judging how reliable the agreement for a particular treatment is, it is also important to consider how many guidelines addressed that modality. Clearly, one can have more confidence in the universal recommendation for exercise, which was considered and endorsed in 21/21 guidelines, than in the recommendation for knee fusion, which was considered and endorsed in only 2/2 guidelines.
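The contrast between 21/21 and 2/2 can be made quantitative. As an illustration only (the appraisal itself did not compute intervals for agreement), a Wilson score interval for the proportion of guidelines endorsing a modality, treating the guidelines as if they were independent draws, gives a lower 95% bound of about 0.85 for exercise but only about 0.34 for knee fusion, even though both show 100% agreement. The Wilson interval is one common choice here; an exact binomial interval would make the same point.

```python
import math


def wilson_interval(endorsed, considered, z=1.96):
    """Wilson score interval for a proportion (here: guidelines endorsing a modality)."""
    if considered <= 0 or not 0 <= endorsed <= considered:
        raise ValueError("need 0 <= endorsed <= considered and considered > 0")
    p = endorsed / considered
    denom = 1 + z * z / considered
    centre = (p + z * z / (2 * considered)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / considered + z * z / (4 * considered ** 2)
    )
    # Clamp to [0, 1] to absorb floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


for label, k, n in [("exercise", 21, 21), ("knee fusion", 2, 2)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} agreement, 95% CI {low:.2f} to {high:.2f}")
```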
It was also apparent that some of the universally recommended therapies in the core set were not supported by evidence from RCTs. For example, while exercise of various types was supported by an SR of RCTs (level Ia), total joint replacement was supported only by uncontrolled or cohort studies (level III), and the recommendations for knee aspiration and knee fusion were based on expert opinion (level IV). Whether RCTs should be the gold standard for recommending all treatments has been the subject of previous discussion and controversy71, 72. Nevertheless, the level of research evidence and clinical effectiveness have been important considerations in the development of recent guidelines for the treatment of knee and hip OA17, 18 and of the OARSI recommendations. Guidelines that recommend treatments with proven evidence of benefit should at least have the potential to improve clinical outcomes and the quality of health care for patients, although success is certainly not guaranteed and evidence-based guidelines are only one option for improving the quality of health care.
A pilot survey of the perceived usefulness of the treatment modalities addressed by the existing guidelines was conducted among physicians and other health professionals attending a New York University – OARSI Rheumatology Symposium in 2006, in order to collect users' opinions on the usefulness of current treatment guidelines. Participants rated the usefulness of each recommended treatment modality on a 5-point categorical scale (not useful, slightly useful, moderately useful, very useful and absolutely essential), and the percentage of votes for “very useful or absolutely essential” was calculated. Of the 19 participants who completed the questionnaire (four general physicians, eight rheumatologists, one physiotherapist, one orthopaedic surgeon, one pharmacist and four other health professionals), 94% rated total joint replacement as very useful or essential therapy for both knee OA and hip OA. Combination therapy was judged very useful or essential by 79% for knee OA and 72% for hip OA. Weight reduction was perceived to be more useful for knee than for hip OA by 68%, whereas NSAIDs, NSAIDs plus PPIs, COX-2 inhibitors, self-management, education and exercise were considered useful for both hip and knee OA. Although this survey was far from representative of all potential guideline users and involved only a very small number of participants, most of whom were from the United States, the views expressed about the usefulness of the various treatment modalities were at least consistent with the appraisal of existing guidelines that led to the definition of a tentative core set of recommended treatment modalities. It also points to a possible way of assessing the potential applicability of other modalities of therapy being considered as additions to this core set.
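The survey summary statistic is simply the share of "very useful" and "absolutely essential" ratings among the votes cast for a modality. A minimal sketch, assuming the denominator is the number of respondents who rated that particular modality (the report does not say how missing answers were handled) and using hypothetical ratings rather than the survey's own data:

```python
from collections import Counter

# The 5-point categorical scale described for the pilot survey.
SCALE = (
    "not useful",
    "slightly useful",
    "moderately useful",
    "very useful",
    "absolutely essential",
)
TOP_TWO = {"very useful", "absolutely essential"}


def percent_top_two(votes):
    """Percentage of votes rating a modality very useful or absolutely essential."""
    if not votes:
        raise ValueError("no votes")
    unknown = set(votes) - set(SCALE)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    counts = Counter(votes)
    return 100 * sum(counts[c] for c in TOP_TWO) / len(votes)


# Hypothetical ratings for one modality from 10 respondents.
votes = (["absolutely essential"] * 3 + ["very useful"] * 4
         + ["moderately useful"] * 2 + ["slightly useful"])
print(f"{percent_top_two(votes):.0f}% very useful or absolutely essential")
```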
Quality of existing guidelines
The methodology used to develop treatment guidelines for OA has evolved considerably over the last decade. Between the publication of the first guidelines for the treatment of OA by the Royal College of Physicians in 199349 and the publication of the EULAR recommendations in 200518, the paradigm shifted from purely opinion-based guidelines49 to entirely evidence-based guidelines, such as the Prodigy Guidance34, and subsequently to hybrid guidelines based on both research evidence and clinical expertise, such as the EULAR recommendations17, 18. However, no attempt had previously been made to assess the quality of these guidelines. We therefore used the AGREE instrument to evaluate all existing guidelines for scope and purpose, stakeholder participation, methodological rigour, clarity, applicability, editorial independence and overall quality22. Overall quality was better in evidence-based than in opinion-based guidelines, and significantly better still in the hybrid guidelines that combined research evidence with expert opinion (Fig. 1). This is mainly attributable to higher scores for scope and purpose (P=0.007), rigour of development (P<0.001) and editorial independence (P=0.013) in the hybrid guidelines (Table III). Evidence-based guidelines tended to have lower applicability, although the differences were not statistically significant (Table III). This may partly reflect the gap between RCTs, which demonstrate that an intervention works (“efficacy”), and how often and how well the intervention works in clinical practice (“clinical effectiveness”). Hybrid guidelines may be expected to show better applicability, because clinical expertise can temper the rigidity of research data and help close the gap between research and clinical practice.
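The AGREE domain scores compared above are standardised percentages: the summed item ratings for a domain are rescaled between the minimum and maximum totals possible for that number of items and appraisers. A minimal sketch of that calculation, assuming the original AGREE 4-point item scale (1 = strongly disagree to 4 = strongly agree) and hypothetical ratings from a group of four appraisers:

```python
def agree_domain_score(ratings, scale_min=1, scale_max=4):
    """Standardised AGREE domain score (0-100%).

    ratings: one list per appraiser, each holding that appraiser's
    item scores for the domain being assessed.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one appraiser and one item")
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must rate every item")
    if any(not scale_min <= x <= scale_max for r in ratings for x in r):
        raise ValueError("rating outside the item scale")
    n_appraisers = len(ratings)
    obtained = sum(sum(r) for r in ratings)
    min_possible = scale_min * n_items * n_appraisers
    max_possible = scale_max * n_items * n_appraisers
    return 100 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical ratings: four appraisers, a three-item domain.
ratings = [[3, 3, 4], [2, 3, 3], [4, 4, 3], [3, 2, 3]]
print(f"{agree_domain_score(ratings):.1f}%")
```

Here the obtained total is 37 out of a possible range of 12 to 48, giving 25/36, or about 69%. Rescaling in this way lets domains with different numbers of items be compared on one 0-100% scale.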
In the development of hybrid guidelines by the EULAR OA Task Force, expert consensus on the most important propositions was followed by a systematic search for published supporting research evidence, prior to assigning a strength and confidence of recommendation for each treatment proposition. These were based on combined consideration of the research evidence and clinical expertise after also considering risks and benefits, including potential adverse effects and the cost of each treatment modality18. This method is clinically driven and evidence supported. The sequence of steps has been modified slightly for the development of the OARSI Treatment Guidelines. An initial SR of research evidence was followed by the development of expert consensus based on a combined consideration of the research evidence and the clinical expertise of the members of the committee. This was then followed by assignment of strength and confidence of recommendation for each proposition as before. This current method is evidence-driven and clinically supported. Another important difference in the methodology used in the development of the OARSI recommendations has been that the committee has not arbitrarily restricted the number of treatment options that it would consider, as was the case in the development of the EULAR guidelines17, 18.
There are a number of limitations to this study.
Firstly, it was necessary to set fixed timelines for the literature search, i.e., from January 2002 to January 2006. Evidence from before this period was obtained from the EULAR SR. For technical reasons it has not yet been possible to pool the data, so the SRs of the relevant scientific literature before January 2002 and from January 2002 to January 2006 remain two separate data sets. Evidence published after January 2006 has yet to be systematically reviewed, and a number of new studies have appeared since 31 January 2006, for example on glucosamine, chondroitin, diacerhein and self-management73, 74, 75, 76, 77. It was not possible to update the SR following the Delphi exercise, which is described in detail in the second part of this report. The guideline development method used an SR of the research evidence to inform and assist the expert consensus, so any new evidence or proposed changes to the consensus recommendations after completion of the Delphi exercise should properly be considered in the context of the full evidence and propositions. This would have required another systematic literature search for all evidence and a further Delphi exercise, which was not feasible within the timeframe. A sensitivity analysis78 was therefore undertaken to examine whether these recently published studies would alter any of the evidence-based conclusions (Table VIII). For example, the results of two further RCTs, the National Institutes of Health Glucosamine/Chondroitin Arthritis Intervention Trial (GAIT) of glucosamine hydrochloride and the GUIDE trial of glucosamine sulphate, have recently been published74, 75. Adding the data from these two studies to the main body of trial outcomes did not significantly alter the ESs for glucosamine sulphate or hydrochloride.
Treatment with glucosamine sulphate remained superior to placebo, whereas treatment with glucosamine hydrochloride did not. In contrast, after the new data on chondroitin sulphate from the GAIT study were added to the results of the earlier RCTs, chondroitin sulphate was no longer superior to placebo74, 76 (Table VIII). A number of studies reported in 2007 have not been included; two examples are trials of chondroitin sulphate and of weight reduction, which were published after the analyses and discussion for this manuscript were completed79, 80. Diacerhein was the subject of a recent Cochrane SR77, whose ES and RR estimates were similar to those found in this study (Table VIII). No attempt was made to pool these data, as most of the trials included in the Cochrane review are already included in our main analysis81, 82, 83, 84, 85. A new RCT of self-management (class training package plus educational booklets) vs educational booklets alone showed no difference in WOMAC pain scores between groups73. Unfortunately, numerical data were not available, so a sensitivity analysis could not be conducted.
| Modality | Outcome measure(s) | Data 2002–2006, point estimate (95% CI) | Data 2006–, point estimate (95% CI) | Pooled, point estimate (95% CI) |
|---|---|---|---|---|
| Glucosamine sulphate | ES (pain) | 0.68 (0.32, 1.04) | 0.26 (−0.01, 0.54)75 | 0.45 (0.04, 0.86) |
| Glucosamine hydrochloride | ES (pain) | 0.13 (−0.27, 0.53) | −0.03 (−0.18, 0.13)74 | −0.01 (−0.15, 0.14) |
| Chondroitin sulphate | ES (pain) | 0.52 (0.37, 0.67) | −0.02 (−0.18, 0.14)74; 0.42 (0.04, 0.79)76 | 0.30 (−0.10, 0.70) |
| Diacerhein | ES (pain) | 0.22 (0.01, 0.42) | 0.22 (0.01, 0.42)77 | NA |
| | RR (diarrhoea) | 3.98 (2.90, 5.47) | 3.81 (2.54, 5.71)77 | |
| Self-management | ES (pain) | 0.06 (0.02, 0.10) | No difference for WOMAC pain73 | |
NA: not applicable as the new study is an updated SR.
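The sensitivity analysis combines earlier summary estimates with new trial results, but the report does not state here which pooling model produced the "Pooled" column. As one common choice, the sketch below applies DerSimonian-Laird random-effects pooling to summary estimates, recovering each standard error from a symmetric 95% CI on the ES scale (an assumption). Given the two glucosamine sulphate estimates from Table VIII, it returns values that round to the tabulated 0.45 (0.04, 0.86); this agreement is suggestive only and does not confirm that this was the authors' method.

```python
import math

Z_95 = 1.96


def se_from_ci(lower, upper, z=Z_95):
    """Standard error implied by a symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def dersimonian_laird(estimates, z=Z_95):
    """Random-effects pooling of (estimate, ci_lower, ci_upper) tuples.

    Returns (pooled, ci_lower, ci_upper, tau2).
    """
    if len(estimates) < 2:
        raise ValueError("need at least two estimates to pool")
    ys = [est for est, _, _ in estimates]
    variances = [se_from_ci(lo, hi, z) ** 2 for _, lo, hi in estimates]
    weights = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, ys)) / sum_w
    # Cochran's Q and the DerSimonian-Laird moment estimate of tau^2
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, ys))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)
    # Random-effects weights add the between-study variance to each estimate
    re_weights = [1 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, ys)) / sum_re
    se = math.sqrt(1 / sum_re)
    return pooled, pooled - z * se, pooled + z * se, tau2


# Glucosamine sulphate ES for pain: 2002-2006 data and the later trial (Table VIII).
pooled, lo, hi, tau2 = dersimonian_laird([(0.68, 0.32, 1.04), (0.26, -0.01, 0.54)])
print(f"pooled ES {pooled:.2f} (95% CI {lo:.2f}, {hi:.2f}), tau^2={tau2:.3f}")
```

When the two estimates disagree (as here), tau² is positive and the pooled CI widens beyond either input's precision; when they agree closely, tau² falls to zero and the result reduces to fixed-effect inverse-variance pooling.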
Because additional studies relevant to the analyses and conclusions of this report will almost certainly be published in due course, we plan to review accumulating evidence annually and to formally update the guidelines as required within 3–5 years.
Secondly, research evidence can be prone to publication bias. Although we searched the Cochrane Library, unpublished or unregistered trials cannot be comprehensively assessed. We would therefore encourage investigators to register any trials that are under way or planned.
Thirdly, caution is needed when making cross-treatment comparisons unless the evidence comes from a direct comparison. Most of the evidence summarised in this report comes from placebo-controlled studies. Placebo effects may vary across trials, and indirect comparisons can be misleading86. In addition, trials differ in many respects, such as study period, disease severity, age, gender and co-morbidities. For example, it is not appropriate to compare the ESs for electrotherapy (ES=0.77, 95% CI 0.36, 1.17) and NSAIDs (ES=0.32, 95% CI 0.24, 0.39) directly and conclude that electrotherapy is more effective than NSAIDs.
Finally, evidence was selected sequentially according to the evidence hierarchy (Table I) and the quality of the studies, and only the best available evidence was considered. Whether this is an adequate approach is open to discussion. An MA is not necessarily superior to a large scale well-conducted RCT87, and RCTs are not necessarily better than observational studies88. Differences in the underlying populations being examined may also impact the results of a study.
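The selection rule described above can be sketched as "keep, for each modality, the item at the highest evidence level, and break ties on study quality". The sketch assumes the Ia to IV ordering used elsewhere in this report (Ia = SR of RCTs, IV = expert opinion; intermediate levels as in the commonly used hierarchy, since Table I is not reproduced here) and a single hypothetical quality score. In the actual appraisal, quality was rated on different scales for SRs (Oxman and Guyatt) and RCTs (Jadad), so one numeric score is a simplification. The rule also encodes exactly the rigidity questioned above: it would always prefer a weaker meta-analysis over a large, well-conducted RCT.

```python
from dataclasses import dataclass

# Assumed ordering of evidence levels, strongest first (Ia = SR/meta-analysis
# of RCTs ... IV = expert opinion). Table I of the report is not reproduced here.
LEVEL_RANK = {"Ia": 0, "Ib": 1, "IIa": 2, "IIb": 3, "III": 4, "IV": 5}


@dataclass
class Evidence:
    modality: str
    level: str
    quality: float  # hypothetical single quality score; higher is better
    source: str


def best_available(items):
    """Keep, per modality, the item at the highest level, breaking ties on quality."""
    def rank(item):
        return (LEVEL_RANK[item.level], -item.quality)

    best = {}
    for item in items:
        current = best.get(item.modality)
        if current is None or rank(item) < rank(current):
            best[item.modality] = item
    return best


# Hypothetical evidence pool.
pool = [
    Evidence("exercise", "Ib", 4.0, "trial A"),
    Evidence("exercise", "Ia", 5.0, "review B"),
    Evidence("exercise", "Ia", 3.0, "review C"),
    Evidence("total joint replacement", "III", 2.0, "cohort D"),
    Evidence("knee fusion", "IV", 0.0, "expert opinion"),
]
for modality, item in best_available(pool).items():
    print(modality, item.level, item.source)
```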
In summary, a critical appraisal of existing treatment guidelines across countries and regions has identified a core set of treatments for the management of hip and knee OA. The quality and applicability of these guidelines increased when research evidence and expert opinions were combined. The study suggests that there is room for improvement in the quality and applicability of guidelines for the management of hip and knee OA in the future. Regular SR of research evidence and update of recommendations are important to ensure that guidelines remain current.