Building evidence-based medicine skills in gynecology

OBG Management. 2015 May;27(5):1e-7e.

May 12, 2015 | OBG Management

David Rahn, MD; Vivian Sung, MD, MPH; and Ethan Balk, MD, MPH

	Dr. Rahn is Associate Professor, Department of Obstetrics & Gynecology, Division of Female Pelvic Medicine & Reconstructive Surgery, University of Texas Southwestern Medical Center, Dallas.
	Dr. Sung is Associate Professor, Obstetrics and Gynecology, The Warren Alpert Medical School of Brown University, and Research Director, Division of Urogynecology and Reconstructive Pelvic Surgery, Women & Infants Hospital of Rhode Island, Providence.
	Dr. Balk is Assistant Professor (Research), Center for Evidence-Based Medicine, and Associate Director, Brown Evidence-based Practice Center, Brown University School of Public Health, Providence, Rhode Island.

Deciphering the evidence, and emerging with the best management approach, requires knowing the basics of study design and interpretation as well as the intricacies of patient interaction. Here, a primer.

In this article

RCTs: The good, the bad, and the ugly
Systematic reviews: What, why, and how?
Assessing harms
Applying the evidence and your expertise to your patient

Surrogate outcomes
Outcomes measured in a trial should be relevant, easy to interpret and diagnose, sensitive to treatment differences, and measurable within a reasonable period of time. However, these characteristics are not always achievable for important clinical outcomes in an RCT. Therefore, a surrogate outcome may take the place of the true clinical efficacy measurement.

For example, in studies of interventions for infertility in patients with polycystic ovary syndrome (PCOS), common surrogates to the “true” desired outcome of a healthy live birth may include ovulation, implantation, or pregnancy rates. These surrogate outcomes may correlate with live birth but clearly ignore other factors extrinsic and intrinsic to PCOS that affect the chance for a healthy term delivery; the possible increased risk for miscarriage in PCOS; and increased risks of other pregnancy complications, such as preeclampsia and gestational diabetes.

Similarly, many trials of oral contraceptives that aim to study the clinical endpoints of pulmonary embolism or venous thromboembolism, which are rare events, instead use coagulation test results or levels of sex hormone-binding globulin as surrogates. Clearly, caution must be exercised when interpreting studies that use surrogate outcomes. As the clinician, you must recognize that a change in a biologic or physical measurement may not be clinically relevant. Some judgment is required about causal pathways: The less that is known about the causal pathway of a disease, the less confident one should be in any surrogate outcome.

Finally, clinicians also must recognize that a valid surrogate for one treatment may not be valid for another treatment or another population.⁵ For example, ovulation inhibition would be an appropriate surrogate endpoint for contraceptive efficacy for a method that reliably prevents ovulation; however, this would not be a good surrogate outcome to evaluate the progestin-only pill, which fails to inhibit ovulation completely and yet is highly effective in contraceptive trials.

Avoiding pitfalls with subgroup analyses
It is common, particularly in large RCTs, to evaluate treatment effects for a specific endpoint in a subgroup of patients included in the trial. The goal is to determine whether the findings of the larger study apply more or less to a specific patient (who may differ from the total population by some important characteristic, such as age, weight, parity, or menopausal or smoking status). The variability in study results when stratified by these patient factors is known as heterogeneity of treatment effect, which may be quantitative or qualitative.⁶

In the former, one treatment is always better than the other, although by varying degrees depending on the subgroup. (For example, a stronger effect could be seen in those aged 65 and younger than in those older than 65.) In the latter, the treatment fares better than the comparator in one subgroup but worse or no different for another subgroup. In either case, the appropriate statistical tool to identify heterogeneity of treatment effect is a test for interaction between the characteristic and the treatment effect, rather than claiming heterogeneity on the basis of separate tests of treatment effects within the different subpopulations.
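To make the distinction concrete, the following is a minimal sketch of one simple form of interaction test: a z-test on the difference between two independent subgroup estimates. The effect sizes, standard errors, and function names are hypothetical and chosen only for illustration; a real trial would typically fit a regression model with a treatment-by-subgroup interaction term.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def interaction_test(effect_a, se_a, effect_b, se_b):
    """z-test for the difference between two independent subgroup estimates."""
    z = (effect_a - effect_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, two_sided_p(z)

# Hypothetical log odds ratios (treatment vs. comparator) with standard errors.
younger = (-0.60, 0.25)  # participants aged 65 and younger
older = (-0.15, 0.30)    # participants older than 65

# Separate within-subgroup tests (the approach the text warns against).
p_younger = two_sided_p(younger[0] / younger[1])
p_older = two_sided_p(older[0] / older[1])

# Single test of whether the two subgroup effects differ from each other.
z_int, p_int = interaction_test(*younger, *older)

print(f"younger subgroup p = {p_younger:.3f}")
print(f"older subgroup p = {p_older:.3f}")
print(f"interaction z = {z_int:.2f}, p = {p_int:.3f}")
```

With these made-up numbers, the treatment effect is "significant" in the younger subgroup and not in the older one, yet the interaction test does not support a claim that the effect truly differs between the two groups.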

One problem with dividing the original population into smaller subpopulations is that the number of participants decreases—thus there is less power, or less statistical strength, to identify a treatment effect. More accurately, there is a greater likelihood of a type II error (a false negative) when these small subpopulations have too few patients to demonstrate a clinical treatment effect that actually may exist.
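The loss of power can be illustrated with a rough calculation. This sketch uses a normal approximation for a two-sided comparison of two proportions; the event rates (30% vs 20%) and sample sizes are assumptions picked for illustration, not figures from any trial.

```python
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in proportions (unpooled).
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm) ** 0.5
    z_effect = abs(p1 - p2) / se
    # Probability that the test statistic falls in either rejection region.
    return nd.cdf(z_effect - z_crit) + nd.cdf(-z_effect - z_crit)

# Hypothetical trial: 300 participants per arm, event rates of 30% vs 20%.
power_full = power_two_proportions(0.30, 0.20, n_per_arm=300)

# Hypothetical subgroup containing one quarter of the participants.
power_subgroup = power_two_proportions(0.30, 0.20, n_per_arm=75)

print(f"full trial power: {power_full:.2f}")
print(f"subgroup power:   {power_subgroup:.2f}")
```

Under these assumptions, the full trial has roughly 80% power, while the subgroup has closer to 30%, so a real effect of the same size would be missed in most such subgroup analyses.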

False positives. Paradoxically, another problem with subgroup analyses is a greater chance for false positives due to the multiple statistical tests that are performed. The original study is rarely powered appropriately for multiple subgroup comparisons (see “Error rates in subgroup analyses”). According to Wang and colleagues, “It is common practice to conduct a subgroup analysis for each of several (and often many) baseline characteristics, for each of several endpoints, or for both.”⁷ The more subgroup analyses performed, the more likely that differences found are due to chance only. Unfortunately, in unplanned post hoc analyses, the number of tests performed is often unreported; therefore, the error rates are unknown. There are statistical methods to try to correct for this “multiplicity” problem but, ideally, only a few key subgroup analyses are performed, and they are planned a priori in the original study design. In these cases, the study’s size can be adjusted accordingly. In most instances, findings from subgroup analyses, whether positive or negative, should be considered as “hypothesis generating” and interpreted with caution.
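The article does not name a specific correction method. As one common example, the Bonferroni correction divides the significance threshold by the number of tests performed. The sketch below applies it to hypothetical subgroup p-values; the subgroup names and values are invented for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the names of tests that remain significant after Bonferroni correction."""
    threshold = alpha / len(p_values)  # stricter per-test threshold
    return [name for name, p in p_values.items() if p <= threshold]

# Hypothetical p-values from five subgroup analyses (illustration only).
subgroup_p = {
    "age": 0.030,
    "weight": 0.600,
    "parity": 0.200,
    "menopausal status": 0.047,
    "smoking status": 0.004,
}

# Without correction, every p-value at or below 0.05 looks "significant".
uncorrected = [name for name, p in subgroup_p.items() if p <= 0.05]
corrected = bonferroni_significant(subgroup_p)

print("uncorrected:", uncorrected)
print("Bonferroni: ", corrected)
```

In this example, three subgroups appear significant at the usual 0.05 threshold, but only one survives the corrected threshold of 0.05 / 5 = 0.01. Bonferroni is conservative, which is one more reason to limit the number of prespecified subgroup analyses.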

Error rates in subgroup analyses

With “k” independent subgroups and no true difference between treatments, the probability of at least one “significant” subgroup (that is, a false positive) is 1 – (1 – α)^k.

If α = 0.05 and there are k = 10 subgroups, then 1 – (0.95)¹⁰ ≈ 0.40. That is, if 10 subgroup analyses are performed, there is a 40% likelihood that at least 1 will demonstrate a “significant” difference in treatment effect, even though no difference exists.
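The same formula can be evaluated for other numbers of subgroups. This is a minimal sketch under the same assumptions stated above (independent subgroups, no true treatment difference); the function name is ours.

```python
def familywise_error_rate(k, alpha=0.05):
    """Probability of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k

# Tabulate the error rate as the number of subgroup analyses grows.
rates = {k: familywise_error_rate(k) for k in (1, 5, 10, 20)}

for k, rate in rates.items():
    print(f"k = {k:2d}: {rate:.2f}")
```

With a single test the rate is simply α, and with 10 tests it reproduces the 40% figure above. The rate keeps climbing as more subgroups are examined.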
