The AGA Institute process for developing clinical practice guidelines

November 15, 2013 | GI and Hepatology News

Shahnaz Sultan, MD; Yngve Falck-Ytter, MD; John M. Inadomi, MD

The goals of clinical practice guidelines are multifold: to improve patient care and health outcomes, to reduce inappropriate variations in practice, to promote efficient use of resources, and to help define and inform public policy. Although the number of guidelines has increased over the past 5 years, their uptake and adoption have been hindered by several factors, notably a lack of transparency, the lack of a uniform system for rating the evidence that informs a guideline, a lack of trust in recommendations, and ineffective management of conflicts of interest. Recognizing these deficiencies, a recent Institute of Medicine report defined standards for the development of high-quality guidelines.⁴

The methodologically rigorous framework of GRADE (Grading of Recommendations Assessment, Development and Evaluation) and its binary classification of strong vs. conditional (weak) recommendations provide clear and actionable direction to patients, clinicians, and policy makers. A strong recommendation means that most patients should receive the recommended course of action, whereas a conditional recommendation means that different choices will be appropriate for different patients (Table 1).

Developing a guideline using GRADE

Defining the clinical question

The first step within the GRADE framework is to formulate the clinical questions to be addressed. This step not only defines the focus of the guideline but also sets the criteria for the search strategy that will be used to identify the body of evidence. Each clinical question should include the following four components: patient population, intervention, comparator, and outcome (PICO format).

Consider the following: does daily aspirin for chemoprevention reduce the risk of colorectal cancer (CRC)? This informal question would yield a relatively large number of studies that vary with respect to study design (observational studies vs. randomized controlled trials), patient population (patients with hereditary cancer syndromes vs. a personal history of adenomas), interventions (low-dose aspirin vs. regular-dose aspirin), comparators (aspirin vs. cyclooxygenase-2 inhibitors), as well as outcomes (recurrent adenomas vs. advanced adenomas vs. cancer). A more appropriate clinical question may be the following: in average-risk patients with no prior history of CRC or adenomas, does regular-dose aspirin vs. no aspirin reduce the incidence of CRC? This latter PICO is defined more clearly and translates into a search strategy that may yield more relevant studies.
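The four PICO components can be written down as a small structured record, which makes it obvious when a question leaves one of them unspecified. The following Python sketch is purely illustrative: the class name `PICOQuestion` and its method are hypothetical and are not part of GRADE or of any AGA tooling. It simply encodes the refined aspirin question from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PICOQuestion:
    """Hypothetical container for the four PICO components."""
    population: str
    intervention: str
    comparator: str
    outcome: str

    def as_sentence(self) -> str:
        # Assemble the components into a single focused clinical question.
        return (
            f"In {self.population}, does {self.intervention} "
            f"vs. {self.comparator} reduce {self.outcome}?"
        )


# The refined question from the text, with every component made explicit.
aspirin_question = PICOQuestion(
    population="average-risk patients with no prior history of CRC or adenomas",
    intervention="regular-dose aspirin",
    comparator="no aspirin",
    outcome="the incidence of CRC",
)

print(aspirin_question.as_sentence())
```

The informal version of the question ("does daily aspirin reduce the risk of CRC?") could not be fully entered into such a record, because it names neither a population nor a comparator and leaves the dose unspecified. That gap is one way to see why it would retrieve such a heterogeneous set of studies.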

Not all outcomes are equally important

GRADE categorizes outcomes as critical for decision making, important but not critical, or of limited or no importance for decision making. This explicit ranking of outcomes within a hierarchy matters because in GRADE the quality of evidence is determined separately for each outcome. This is in contrast to other systems of guideline development, in which quality is determined by study type and on a study-by-study basis. For example, in developing a guideline for stool testing for CRC, CRC-related mortality may be considered a critical outcome; incidence of advanced adenomas or minor procedure-related complications may be considered important but not critical outcomes; and incidence of diminutive polyps may be considered a less important outcome.
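As a rough illustration of this hierarchy, the stool-testing example can be expressed as a mapping from each outcome to one of the three importance categories. The names below (`Importance`, `stool_testing_outcomes`, `outcomes_with`) are hypothetical and chosen only for this sketch, and the category assignments simply restate the example in the text. They are not a panel's actual ratings.

```python
from enum import Enum


class Importance(Enum):
    """The three importance categories described in the text."""
    CRITICAL = "critical for decision making"
    IMPORTANT = "important but not critical"
    LESS_IMPORTANT = "less or not important"


# Illustrative assignments taken from the stool-testing example above.
stool_testing_outcomes = {
    "CRC-related mortality": Importance.CRITICAL,
    "incidence of advanced adenomas": Importance.IMPORTANT,
    "minor procedure-related complications": Importance.IMPORTANT,
    "incidence of diminutive polyps": Importance.LESS_IMPORTANT,
}


def outcomes_with(importance, outcomes):
    """Return the names of all outcomes rated at the given importance."""
    return [name for name, level in outcomes.items() if level is importance]


# Evidence quality would then be assessed outcome by outcome,
# not study by study, beginning with the critical outcomes.
for name in outcomes_with(Importance.CRITICAL, stool_testing_outcomes):
    print(f"Critical outcome to grade: {name}")
```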

A high-quality guideline requires a systematic review of the evidence

A systematic search for the evidence should be conducted, or an existing systematic review identified, through sources such as the Cochrane Library or Medline. The systematic review (SR) provides the data from individual studies that are used to generate a best estimate of the effect for each outcome. A guideline panel can conduct its own SR or use an existing SR that is deemed to be of sufficient quality.

Grading the quality of the evidence

GRADE defines quality as the extent to which our confidence in an estimate of the treatment effect is adequate to support a particular recommendation. The GRADE system specifies four grades of confidence in the evidence: high, moderate, low, and very low. Explicit criteria are available for rating down or rating up the quality of evidence. Evidence from randomized controlled trials (RCTs) starts with high confidence, whereas evidence from observational studies typically starts with low confidence in the estimate (Table 1).

If an RCT has major methodologic limitations, such as inadequate allocation concealment, lack of blinding, failure to account for high losses to follow-up evaluation, failure to perform an intention-to-treat analysis, selective reporting of outcomes, or stopping early for benefit, the quality of the evidence may be rated down. In a recent meta-analysis comparing proton pump inhibitor therapy vs. histamine₂-receptor antagonists in critically ill patients at risk for stress-related mucosal bleeding, significant methodologic issues were recognized, including lack of reporting of randomization, allocation concealment, and blinding, as well as high loss to follow-up evaluation.⁵ The meta-analysis found less gastrointestinal bleeding among those who received proton pump inhibitors (1.3% vs. 6.6%; odds ratio, 0.30; 95% confidence interval, 0.17-0.54). However, because of the many study limitations, we might consider rating down this body of evidence from high to moderate.
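The starting grade and the rating-down step described above can be sketched as a simple walk along the four-level scale. This is only a minimal illustration under stated assumptions: the function names are hypothetical, and the choice to move exactly one level for the study limitations in the example mirrors the "high to moderate" judgment in the text. It is not a fixed GRADE rule, and in practice the size of the adjustment is a panel judgment.

```python
# The four GRADE levels of confidence, ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


def starting_level(study_design: str) -> str:
    """Initial confidence by study design, as described in the text."""
    if study_design == "rct":
        return "high"
    if study_design == "observational":
        return "low"
    raise ValueError(f"Unknown study design: {study_design!r}")


def rate(level: str, steps: int) -> str:
    """Move a rating by `steps` levels (negative = rate down).

    Assumption for this sketch: the result is clamped to the scale,
    so evidence cannot fall below "very low" or rise above "high".
    """
    index = LEVELS.index(level) + steps
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# The proton pump inhibitor example: RCT evidence starts high and is
# rated down one level for serious methodologic limitations.
ppi_evidence = rate(starting_level("rct"), -1)
print(ppi_evidence)
```

Representing the scale as an ordered list keeps the starting point (set by study design) separate from later adjustments, which is the same separation the text draws between how a body of evidence begins and how it is rated down or up.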