
Meta-analysis: Its strengths and limitations

Cleveland Clinic Journal of Medicine. 2008 June;75(6):431-439

ABSTRACT

Nowadays, doctors face an overwhelming amount of information, even in narrow areas of interest. In response, reviews designed to summarize large volumes of information are frequently published. When a review is done systematically, following certain criteria, and the results are pooled and analyzed quantitatively, it is called a meta-analysis. A well-designed meta-analysis can provide valuable information for researchers, policy-makers, and clinicians. However, there are many critical caveats in performing and interpreting meta-analyses, and thus many ways in which they can yield misleading information.

KEY POINTS

  • Meta-analysis is an analytical technique designed to summarize the results of multiple studies.
  • By combining studies, a meta-analysis increases the sample size and thus the power to study effects of interest.
  • There are many caveats in performing a valid meta-analysis, and in some cases a meta-analysis is not appropriate and the results can be misleading.

ANALYSIS OF DATA

There are specific statistical techniques that are used in meta-analysis to analyze and integrate the information. The data from the individual studies can be analyzed using either of two models: fixed effects or random effects.

The fixed-effects model assumes that the treatment effect is the same across studies. This common effect is unknown, and the purpose of the analysis is to estimate it with more precision than in the individual studies.

The random-effects model, on the other hand, assumes that the treatment effect varies from study to study around some average value. The goal of the analysis is to estimate that average effect.

In the fixed-effects model, the results of the individual studies are pooled using weights that depend on the amount of information in each study (typically the inverse of the within-study variance), so larger studies carry more weight. In the random-effects model, the weights also incorporate the estimated between-study variance, which makes them more nearly equal across studies. Because it accounts for the heterogeneity among studies, the random-effects model yields wider confidence intervals.
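To make the weighting concrete, the following is a minimal sketch of both pooling schemes in Python. It assumes each study is summarized by an effect estimate (eg, a log odds ratio) and its within-study variance, and it estimates the between-study variance with the DerSimonian-Laird method, one common choice; the function name pool_effects is hypothetical.

```python
import numpy as np

def pool_effects(y, v):
    """Pool per-study effect estimates y (eg, log odds ratios) with
    within-study variances v under fixed- and random-effects models.
    Illustrative sketch only; returns (estimate, variance) pairs."""
    y, v = np.asarray(y, dtype=float), np.asarray(v, dtype=float)

    # Fixed effects: inverse-variance weights, so larger (more precise)
    # studies carry more weight.
    w = 1.0 / v
    fixed = np.sum(w * y) / np.sum(w)
    fixed_var = 1.0 / np.sum(w)

    # DerSimonian-Laird estimate of the between-study variance tau^2,
    # based on Cochran's Q statistic.
    k = len(y)
    q = np.sum(w * (y - fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects: adding tau^2 to every study's variance makes the
    # weights more nearly equal and the confidence interval wider.
    w_re = 1.0 / (v + tau2)
    random_est = np.sum(w_re * y) / np.sum(w_re)
    random_var = 1.0 / np.sum(w_re)

    return (fixed, fixed_var), (random_est, random_var)
```

When the between-study variance is estimated as zero, the two models coincide; as it grows, it dominates the weights and the studies are weighted more and more alike.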

Both models have pros and cons. In many cases, the assumption that the treatment effect is the same in all the studies is not tenable, and the random-effects model is preferable. When the effect of interest is large, the results of the two models tend to agree, particularly when the studies are balanced (ie, they have similar numbers of patients in the treatment and control groups) and of similar size. But when the effect is small or the heterogeneity among studies is high, the result of the meta-analysis is likely to depend on the model used. In those cases, the analysis should be done and presented using both models.

It is highly desirable for a meta-analysis to include a sensitivity analysis to determine the “robustness” of the results. Two common ways to perform a sensitivity analysis are to analyze the data using several different methods and to present the results with some studies removed from the analysis.26 If these changes seriously alter the overall results, the credibility of the results is compromised.
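As an illustration of the second approach, a leave-one-out analysis re-pools the effects with each study removed in turn; large swings in the estimate flag influential studies. This sketch reuses the hypothetical pool_effects function above.

```python
def leave_one_out(y, v):
    """Remove each study in turn and re-pool the rest; returns the
    fixed- and random-effects estimates without study i."""
    results = []
    for i in range(len(y)):
        y_rest = [y[j] for j in range(len(y)) if j != i]
        v_rest = [v[j] for j in range(len(v)) if j != i]
        (fixed, _), (random_est, _) = pool_effects(y_rest, v_rest)
        results.append((i, fixed, random_est))
    return results
```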

The strength of meta-analysis is that, by pooling many studies, the effective sample size is greatly increased, and consequently more variables and outcomes can be examined. For example, analyses of subsets of patients and regression analyses9 that could not be done in the individual trials can be performed in a meta-analysis.

A word of caution is in order with respect to larger samples and the possibility of multiple analyses of the data in a meta-analysis. Much care must be exercised when examining the significance of effects that were not specified before the meta-analysis. Testing effects suggested by the data rather than planned a priori (sometimes called “data-mining”) considerably increases the risk of false-positive results. One common problem with large samples is the temptation to perform many so-called “subgroup analyses,” in which subgroups of patients formed according to multiple baseline characteristics are compared.27 The best way to minimize the possibility of false-positive results is to specify the effects to be tested before the data are collected and analyzed. Another safeguard is to adjust the P value according to the number of analyses performed. In general, post hoc analyses should be deemed exploratory, and the reader should be made aware of this fact in order to judge the validity of the conclusions.
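One simple such adjustment is the Bonferroni correction: with m tests, each P value is compared against alpha/m. It is conservative but easy to apply; the sketch below (hypothetical function name) illustrates the idea.

```python
def bonferroni(p_values, alpha=0.05):
    """Compare each of m P values against alpha/m (equivalently,
    multiply each P by m, capped at 1). Returns (raw P, adjusted P,
    significant after adjustment)."""
    m = len(p_values)
    return [(p, min(1.0, m * p), p < alpha / m) for p in p_values]

# Example: 3 post hoc tests; at alpha = 0.05 only P < 0.05/3 ≈ 0.017 survives.
print(bonferroni([0.04, 0.004, 0.20]))
```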

META-ANALYSIS OF RARE EVENTS

Lately, meta-analysis has been used to analyze outcomes that are rare and that the individual studies were not designed to test. In general, the sample size of an individual study provides inadequate power to test rare outcomes. Adverse events are prime examples of important rare outcomes that are not always formally analyzed statistically. The problem in the analysis of adverse events is their low incidence: with rare events, small changes in the data can cause dramatic changes in the results, which creates serious problems for any statistical analysis (see Shuster et al28). This problem can persist even after pooling data from many studies. The instability of the results is further exacerbated by the use of relative measures of effect (eg, the relative risk and the odds ratio) instead of absolute measures (eg, the risk difference).
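A toy calculation (with invented numbers) illustrates the instability: adding just two events to the treatment arm noticeably moves the odds ratio, while the absolute risk difference stays below 1 per 1,000.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: a/b = events/non-events on
    treatment, c/d = events/non-events on control."""
    return (a / b) / (c / d)

n = 5000  # patients per arm (invented for illustration)
control_events = 4
for treated_events in (6, 8):
    or_ = odds_ratio(treated_events, n - treated_events,
                     control_events, n - control_events)
    rd = (treated_events - control_events) / n
    print(f"{treated_events} vs {control_events} events: "
          f"OR = {or_:.2f}, risk difference = {rd * 1000:.2f} per 1,000")
# 6 vs 4 events: OR = 1.50, risk difference = 0.40 per 1,000
# 8 vs 4 events: OR = 2.00, risk difference = 0.80 per 1,000
```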

In a controversial meta-analysis, Nissen and Wolski4 combined 42 studies to examine the effect of rosiglitazone (Avandia) on the risk of myocardial infarction and death from cardiovascular causes. The overall estimated incidence of myocardial infarction in the treatment groups was 0.006 (86/14,376), or 6 in 1,000. Furthermore, 4 studies did not have any occurrences in either group, and 2 of the 42 studies accounted for 28.4% of the patients in the study.

Using a fixed-effects model, the odds ratio was 1.42, ie, the odds of myocardial infarction were 42% higher in patients taking rosiglitazone, and the difference was statistically significant (95% confidence interval 1.03–1.98). Given the low frequency of myocardial infarction, this translates into an increase of only about 1.78 myocardial infarctions per 1,000 patients (from 4.22 to 6 per 1,000). Furthermore, when the data were analyzed using other methods, or when the two large studies were removed, the effect became nonsignificant.29 Nissen and Wolski’s study4 is valuable and raises an important issue. However, the medical community would have been better served if a sensitivity analysis had been presented to highlight the fragility of the conclusions.
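The absolute figures quoted above can be checked with a rough back-of-the-envelope calculation: with rare events the odds ratio approximates the relative risk, so the implied control-group rate is roughly the treated rate divided by the odds ratio.

```python
treated_rate = 86 / 14376        # about 0.0060, ie, 6 per 1,000
odds_ratio = 1.42
# For a rare outcome, OR ≈ relative risk, so dividing gives the
# implied control-group rate.
control_rate = treated_rate / odds_ratio          # about 4.2 per 1,000
excess_per_1000 = (treated_rate - control_rate) * 1000
print(f"about {excess_per_1000:.1f} extra MIs per 1,000 patients")  # ~1.8
```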