Meta-analysis: Its strengths and limitations
ABSTRACT
Nowadays, doctors face an overwhelming amount of information, even in narrow areas of interest. In response, reviews designed to summarize the large volumes of information are frequently published. When a review is done systematically, following certain criteria, and the results are pooled and analyzed quantitatively, it is called a meta-analysis. A well-designed meta-analysis can provide valuable information for researchers, policy-makers, and clinicians. However, there are many critical caveats in performing and interpreting them, and thus many ways in which meta-analyses can yield misleading information.
KEY POINTS
- Meta-analysis is an analytical technique designed to summarize the results of multiple studies.
- By combining studies, a meta-analysis increases the sample size and thus the power to study effects of interest.
- There are many caveats in performing a valid meta-analysis, and in some cases a meta-analysis is not appropriate and the results can be misleading.
Search bias: Identifying relevant studies
Even in the ideal case in which all relevant studies are available (ie, no publication bias), a faulty search can still miss some of them. In searching databases, much care should be taken to ensure that the set of key words used for searching is as complete as possible. This step is so critical that most recent meta-analyses include the list of key words used. The search engine (eg, PubMed, Google) is also critical, affecting the type and number of studies that are found.7 Small differences in search strategies can produce large differences in the set of studies found.8
Selection bias: Choosing the studies to be included
The identification phase usually yields a long list of potential studies, many of which are not directly relevant to the topic of the meta-analysis. This list is then subject to additional criteria to select the studies to be included. This critical step is also designed to reduce differences among studies, eliminate replication of data or studies, and improve data quality, and thus enhance the validity of the results.
To reduce the possibility of selection bias in this phase, it is crucial for the criteria to be clearly defined and for the studies to be scored by more than one researcher, with the final list chosen by consensus.9,10 Frequently used criteria in this phase are in the areas of:
- Objectives
- Populations studied
- Study design (eg, experimental vs observational)
- Sample size
- Treatment (eg, type and dosage)
- Criteria for selection of controls
- Outcomes measured
- Quality of the data
- Analysis and reporting of results
- Accounting and reporting of attrition rates
- Length of follow-up
- When the study was conducted.
The objective in this phase is to select studies that are as similar as possible with respect to these criteria. Even with careful selection, differences among studies will remain. But when the dissimilarities are large, it becomes hard to justify pooling the results to obtain a "unified" conclusion.
In some cases, it is particularly difficult to find similar studies,10,11 and sometimes the discrepancies and low quality of the studies can prevent a reasonable integration of results. In a systematic review of advanced lung cancer, Nicolucci et al12 decided not to pool the results, in view of “systematic qualitative inadequacy of almost all trials” and lack of consistency in the studies and their methods. Marsoni et al13 came to a similar conclusion in attempting to summarize results in advanced ovarian cancer.
Stratification is an effective way to deal with inherent differences among studies and to improve the quality and usefulness of the conclusions. An added advantage to stratification is that insight can be gained by investigating discrepancies among strata.
There are many ways to create coherent subgroups of studies. For example, studies can be stratified according to their “quality,” assigned by certain scoring systems. Commonly used systems award points on the basis of how patients were selected and randomized, the type of blinding, the dropout rate, the outcome measurement, and the type of analysis (eg, intention-to-treat). However, these criteria, and therefore the scores, are somewhat subjective. Moher et al14 expand on this issue.
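To make the idea of quality-based stratification concrete, the sketch below scores trials on the kinds of items the text mentions (randomization, blinding, dropout rate, intention-to-treat analysis) and splits them into subgroups. The items, point values, and threshold are illustrative assumptions, not any published scale.

```python
# Hypothetical quality-scoring scheme. The items and point values below are
# illustrative assumptions, not a validated instrument such as the Jadad scale.
def quality_score(study):
    score = 0
    score += 2 if study.get("randomized") else 0
    score += 2 if study.get("double_blind") else 0
    score += 1 if study.get("dropout_rate", 1.0) < 0.20 else 0  # low attrition
    score += 1 if study.get("intention_to_treat") else 0
    return score

trials = [
    {"name": "A", "randomized": True, "double_blind": True,
     "dropout_rate": 0.10, "intention_to_treat": True},
    {"name": "B", "randomized": True, "double_blind": False,
     "dropout_rate": 0.35, "intention_to_treat": False},
]

# Stratify into high- and low-quality subgroups using an arbitrary cutoff of 4.
high_quality = [t["name"] for t in trials if quality_score(t) >= 4]
low_quality = [t["name"] for t in trials if quality_score(t) < 4]
```

Because the items and weights are judgment calls, two reasonable scoring systems can place the same trial in different strata, which is precisely the subjectivity the text warns about.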
Large differences in sample sizes among studies are not uncommon and can cause problems in the analysis. Depending on the type of model used (see below), meta-analyses weight results by the size (or precision) of each study, but when the studies vary greatly in size, the large studies can still have an unduly large influence on the results. Stratification by sample size is sometimes done to verify the stability of the results.4
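The dominance of large studies can be seen in a minimal inverse-variance (fixed-effect) pooling sketch. The study effects and standard errors below are hypothetical; note how the one large, precise study receives almost all the weight.

```python
# Hypothetical per-study results: (label, effect estimate, standard error).
# Study C is much larger (smaller standard error) than A and B.
studies = [
    ("A", 0.42, 0.30),  # small study
    ("B", 0.35, 0.25),  # small study
    ("C", 0.10, 0.05),  # large study
]

# Fixed-effect (inverse-variance) pooling: each study's weight is 1 / SE^2.
weights = [1.0 / se**2 for _, _, se in studies]
pooled = sum(w * eff for (_, eff, _), w in zip(studies, weights)) / sum(weights)
pooled_se = (1.0 / sum(weights)) ** 0.5

# Fraction of the total weight carried by the large study C.
share_c = weights[2] / sum(weights)
```

Here study C carries over 90% of the weight, so the pooled estimate sits close to 0.10 despite the two smaller studies reporting effects around 0.4. A random-effects model would spread the weights somewhat more evenly by adding a between-study variance term to each SE², but large studies still dominate.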
On the other hand, the presence of dissimilarities among studies can have advantages by increasing the generalizability of the conclusions. Berlin and Colditz1 point out that “we gain strength in inference when the range of patient characteristics has been broadened by replicating findings in studies with populations that vary in age range, geographic region, severity of underlying illness, and the like.”
Funnel plot: Detecting biases in the identification and selection of studies
The funnel plot is a technique used to investigate the possibility of biases in the identification and selection phases. In a funnel plot, the size of the effect (defined as a measure of the difference between treatment and control) in each study is plotted on the horizontal axis against the standard error15 or sample size9 on the vertical axis. If there are no biases, the graph will tend to have a symmetrical funnel shape centered on the average effect of the studies. When negative studies are missing, the graph shows a lack of symmetry.
Funnel plots are appealing because they are simple, but their objective is to detect a complex effect, and they can be misleading. For example, lack of symmetry in a funnel plot can also be caused by heterogeneity in the studies.16 Another problem with funnel plots is that they are difficult to interpret when the number of studies is small. In some cases, however, the researcher may not have any option but to perform the analysis and report the presence of bias.11
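A common way to quantify funnel-plot asymmetry is an Egger-style regression: the standardized effect (effect/SE) is regressed on precision (1/SE), and an intercept far from zero suggests small-study effects. The sketch below uses hypothetical data in which smaller (less precise) studies report larger effects, mimicking missing negative studies.

```python
# Hypothetical study results as (effect, standard error) pairs. Smaller,
# less precise studies report larger effects, as if small negative studies
# were never published.
studies = [(0.55, 0.40), (0.48, 0.35), (0.40, 0.30),
           (0.30, 0.20), (0.22, 0.12), (0.18, 0.08)]

# Funnel-plot coordinates: effect on the horizontal axis, standard error on
# the (inverted) vertical axis.
effects = [e for e, se in studies]
std_errs = [se for e, se in studies]

# Egger-style asymmetry check: regress standardized effect on precision.
z = [e / se for e, se in studies]          # standardized effects
p = [1.0 / se for e, se in studies]        # precisions
n = len(studies)
mean_z, mean_p = sum(z) / n, sum(p) / n
slope = (sum((pi - mean_p) * (zi - mean_z) for pi, zi in zip(p, z))
         / sum((pi - mean_p) ** 2 for pi in p))
intercept = mean_z - slope * mean_p        # far from 0 -> asymmetry
```

With these data the intercept is clearly positive, flagging asymmetry. As the text cautions, such a result is not proof of publication bias: genuine heterogeneity can produce the same pattern, and with only a handful of studies the regression is unstable.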
