## Introduction to Meta-Analysis (I)

“Since meta-analysis is a relatively new field, many people, including those who actually use meta-analysis in their work, have not had the opportunity to learn about it systematically. We hope that this volume will provide a framework that allows them to understand the logic of meta-analysis, as well as how to apply and interpret meta-analytic procedures properly.

This book is aimed at researchers, clinicians, and statisticians. Our approach is primarily conceptual. The reader will be able to skip the formulas and still understand, for example, the differences between fixed-effect and random-effects analysis, and the mechanisms used to assess the dispersion in effects from study to study. However, for those with a statistical orientation, we include all the relevant formulas, along with worked examples. […] This volume is intended for readers from various substantive fields, including medicine, epidemiology, social science, business, ecology, and others. While we have included examples from many of these disciplines, the more important message is that meta-analytic methods that may have developed in any one of these fields have application to all of them.”

…

I’ve been reading this book and I like it so far. I’ve read about the topic before, but I had been missing a textbook on it, and this one is quite good (I’ve read roughly half of it). Below I have added some observations from the first thirteen chapters of the book:

…

“Meta-analysis refers to the statistical synthesis of results from a series of studies. While the statistical procedures used in a meta-analysis can be applied to any set of data, the synthesis will be meaningful only if the studies have been collected systematically. This could be in the context of a systematic review, the process of systematically locating, appraising, and then synthesizing data from a large number of sources. Or, it could be in the context of synthesizing data from a select group of studies, such as those conducted by a pharmaceutical company to assess the efficacy of a new drug. If a treatment effect (or effect size) is consistent across the series of studies, these procedures enable us to report that the effect is robust across the kinds of populations sampled, and also to estimate the magnitude of the effect more precisely than we could with any of the studies alone. If the treatment effect varies across the series of studies, these procedures enable us to report on the range of effects, and may enable us to identify factors associated with the magnitude of the effect size.”

“For systematic reviews, a clear set of rules is used to search for studies, and then to determine which studies will be included in or excluded from the analysis. Since there is an element of subjectivity in setting these criteria, as well as in the conclusions drawn from the meta-analysis, we cannot say that the systematic review is entirely objective. However, because all of the decisions are specified clearly, the mechanisms are transparent. A key element in most systematic reviews is the statistical synthesis of the data, or the meta-analysis. Unlike the narrative review, where reviewers implicitly assign some level of importance to each study, in meta-analysis the weights assigned to each study are based on mathematical criteria that are specified in advance. While the reviewers and readers may still differ on the substantive meaning of the results (as they might for a primary study), the statistical analysis provides a transparent, objective, and replicable framework for this discussion. […] If the entire review is performed properly, so that the search strategy matches the research question, and yields a reasonably complete and unbiased collection of the relevant studies, then (providing that the included studies are themselves valid) the meta-analysis will also be addressing the intended question. On the other hand, if the search strategy is flawed in concept or execution, or if the studies are providing biased results, then problems exist in the review that the meta-analysis cannot correct.”

“Meta-analyses are conducted for a variety of reasons […] The purpose of the meta-analysis, or more generally, the purpose of any research synthesis has implications for when it should be performed, what model should be used to analyze the data, what sensitivity analyses should be undertaken, and how the results should be interpreted. Losing sight of the fact that meta-analysis is a tool with multiple applications causes confusion and leads to pointless discussions about what is the right way to perform a research synthesis, when there is no single right way. It all depends on the purpose of the synthesis, and the data that are available.”

“The effect size, a value which reflects the magnitude of the treatment effect or (more generally) the strength of a relationship between two variables, is the unit of currency in a meta-analysis. We compute the effect size for each study, and then work with the effect sizes to assess the consistency of the effect across studies and to compute a summary effect. […] The summary effect is nothing more than the weighted mean of the individual effects. However, the mechanism used to assign the weights (and therefore the meaning of the summary effect) depends on our assumptions about the distribution of effect sizes from which the studies were sampled. Under the fixed-effect model, we assume that all studies in the analysis share the same true effect size, and the summary effect is our estimate of this common effect size. Under the random-effects model, we assume that the true effect size varies from study to study, and the summary effect is our estimate of the mean of the distribution of effect sizes. […] A key theme in this volume is the importance of assessing the dispersion of effect sizes from study to study, and then taking this into account when interpreting the data. If the effect size is consistent, then we will usually focus on the summary effect, and note that this effect is robust across the domain of studies included in the analysis. If the effect size varies modestly, then we might still report the summary effect but note that the true effect in any given study could be somewhat lower or higher than this value. If the effect varies substantially from one study to the next, our attention will shift from the summary effect to the dispersion itself.”
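To make the "weighted mean" idea concrete, here is a minimal sketch of my own (not code from the book) of a fixed-effect summary, assuming the usual inverse-variance weights. The function name and the three effect sizes and variances are hypothetical, chosen only for illustration.

```python
import math


def fixed_effect_summary(effects, variances):
    """Inverse-variance weighted mean and its standard error."""
    # Each study's weight is the reciprocal of its within-study variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    summary = sum(w * y for w, y in zip(weights, effects)) / total_weight
    # The variance of the summary effect is the reciprocal of the summed weights.
    standard_error = math.sqrt(1.0 / total_weight)
    return summary, standard_error


# Hypothetical effect sizes and variances for three studies (not from the book).
effects = [0.10, 0.30, 0.50]
variances = [0.01, 0.04, 0.04]

summary, standard_error = fixed_effect_summary(effects, variances)
print(f"summary effect = {summary:.3f}, SE = {standard_error:.3f}")
```

The first study has the smallest variance, so it pulls the summary effect toward its own estimate.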

“During the time period beginning in 1959 and ending in 1988 (a span of nearly 30 years) there were a total of 33 randomized trials performed to assess the ability of streptokinase to prevent death following a heart attack. […] The trials varied substantially in size. […] Of the 33 studies, six were statistically significant while the other 27 were not, leading to the perception that the studies yielded conflicting results. […] In 1992 Lau et al. published a meta-analysis that synthesized the results from the 33 studies. […] [They found that] the treatment reduces the risk of death by some 21%. And, this effect was reasonably consistent across all studies in the analysis. […] The narrative review has no mechanism for synthesizing the p-values from the different studies, and must deal with them as discrete pieces of data. In this example six of the studies were statistically significant while the other 27 were not, which led some to conclude that there was evidence against an effect, or that the results were inconsistent […] By contrast, the meta-analysis allows us to combine the effects and evaluate the statistical significance of the summary effect. The p-value for the summary effect [was] p=0.0000008. […] While one might assume that 27 studies failed to reach statistical significance because they reported small effects, it is clear […] that this is not the case. In fact, the treatment effect in many of these studies was actually larger than the treatment effect in the six studies that were statistically significant. Rather, the reason that 82% of the studies were not statistically significant is that these studies had small sample sizes and low statistical power.”
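The point about sample size and power can be illustrated with a small sketch of my own. It assumes a large-sample z-test on the log risk ratio; the two trials below are invented (they are not the streptokinase data) and share exactly the same risk ratio, differing only in size.

```python
import math


def risk_ratio_test(events_treated, n_treated, events_control, n_control):
    """Risk ratio and two-sided p-value from a z-test on the log risk ratio."""
    risk_ratio = (events_treated / n_treated) / (events_control / n_control)
    # Large-sample standard error of the log risk ratio.
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    z = math.log(risk_ratio) / se_log_rr
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return risk_ratio, p_value


# Hypothetical trials: same risks (8% vs 10%), very different sample sizes.
small_trial = risk_ratio_test(8, 100, 10, 100)
large_trial = risk_ratio_test(800, 10_000, 1_000, 10_000)

print(f"small trial: RR = {small_trial[0]:.2f}, p = {small_trial[1]:.3g}")
print(f"large trial: RR = {large_trial[0]:.2f}, p = {large_trial[1]:.3g}")
```

Both trials estimate the same 20% reduction in risk, but only the large one comes anywhere near statistical significance.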

“the [narrative] review will often focus on the question of whether or not the body of evidence allows us to reject the null hypothesis. There is no good mechanism for discussing the magnitude of the effect. By contrast, the meta-analytic approaches discussed in this volume allow us to compute an estimate of the effect size for each study, and these effect sizes fall at the core of the analysis. This is important because the effect size is what we care about. If a clinician or patient needs to make a decision about whether or not to employ a treatment, they want to know if the treatment reduces the risk of death by 5% or 10% or 20%, and this is the information carried by the effect size. […] The p-value can tell us only that the effect is not zero, and to report simply that the effect is not zero is to miss the point. […] The narrative review has no good mechanism for assessing the consistency of effects. The narrative review starts with p-values, and because the p-value is driven by the size of a study as well as the effect in that study, the fact that one study reported a p-value of 0.001 and another reported a p-value of 0.50 does not mean that the effect was larger in the former. The p-value of 0.001 *could* reflect a large effect size but it could also reflect a moderate or small effect in a large study […] The p-value of 0.50 *could* reflect a small (or nil) effect size but could also reflect a large effect in a small study […] This point is often missed in narrative reviews. Often, researchers interpret a nonsignificant result to mean that there is no effect. If some studies are statistically significant while others are not, the reviewers see the results as conflicting. This problem runs through many fields of research. […] By contrast, meta-analysis completely changes the landscape. First, we work with effect sizes (not p-values) to determine whether or not the effect size is consistent across studies. 
Additionally, we apply methods based on statistical theory to allow that some (or all) of the observed dispersion is due to random sampling variation rather than differences in the true effect sizes. Then, we apply formulas to partition the variance into random error versus real variance, to quantify the true differences among studies, and to consider the implications of this variance.”
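The quote does not name the formulas used to partition the variance, so the sketch below is my own assumption about one common way to do it: Cochran's Q for the observed weighted dispersion, a method-of-moments (DerSimonian-Laird) estimate of the between-studies variance, and I² for the share of observed variance that reflects real differences. The function name and data are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return Q, its degrees of freedom, tau-squared and I-squared (percent)."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total_weight
    # Q: weighted sum of squared deviations from the fixed-effect mean.
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1  # expected value of Q if only sampling error is present
    # Scaling factor that puts the excess dispersion back on the effect-size scale.
    c = total_weight - sum(w ** 2 for w in weights) / total_weight
    tau_squared = max(0.0, (q - df) / c)
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, tau_squared, i_squared


# Hypothetical effect sizes and variances for three studies.
q, df, tau_squared, i_squared = heterogeneity([0.10, 0.30, 0.50], [0.01, 0.04, 0.04])
print(f"Q = {q:.2f} on {df} df, tau^2 = {tau_squared:.3f}, I^2 = {i_squared:.1f}%")
```

When Q is no larger than its degrees of freedom, both estimates are truncated at zero, meaning the observed dispersion is no more than sampling error alone would produce.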

“Consider […] the case where some studies report a difference in means, which is used to compute a standardized mean difference. Others report a difference in proportions which is used to compute an odds ratio. And others report a correlation. All the studies address the same broad question, and we want to include them in one meta-analysis. […] we are now dealing with different indices, and we need to convert them to a common index before we can proceed. The question of whether or not it is appropriate to combine effect sizes from studies that used different metrics must be considered on a case by case basis. The key issue is that it only makes sense to compute a summary effect from studies that we judge to be comparable in relevant ways. If we would be comfortable combining these studies if they had used the same metric, then the fact that they used different metrics should not be an impediment. […] When some studies use means, others use binary data, and others use correlational data, we can apply formulas to convert among effect sizes. […] When we convert between different measures we make certain assumptions about the nature of the underlying traits or effects. Even if these assumptions do not hold exactly, the decision to use these conversions is often better than the alternative, which is to simply omit the studies that happened to use an alternate metric. This would involve loss of information, and possibly the systematic loss of information, resulting in a biased sample of studies. A sensitivity analysis to compare the meta-analysis results with and without the converted studies would be important. […] Studies that used different measures may [however] differ from each other in substantive ways, and we need to consider this possibility when deciding if it makes sense to include the various studies in the same analysis.”
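As a rough illustration of what "converting among effect sizes" can look like, here is a sketch of my own using two standard conversions to the standardized mean difference d. The log odds ratio conversion assumes an underlying logistic distribution, and the correlation conversion assumes a continuous trait that was split into two groups; these are the kind of assumptions the quote warns about. Function names and input values are hypothetical.

```python
import math


def log_odds_ratio_to_d(log_odds_ratio, var_log_odds_ratio):
    """Convert a log odds ratio and its variance to d (logistic assumption)."""
    d = log_odds_ratio * math.sqrt(3) / math.pi
    var_d = var_log_odds_ratio * 3 / math.pi ** 2
    return d, var_d


def correlation_to_d(r, var_r):
    """Convert a correlation and its variance to d."""
    d = 2 * r / math.sqrt(1 - r ** 2)
    var_d = 4 * var_r / (1 - r ** 2) ** 3
    return d, var_d


# Hypothetical inputs: an odds ratio of 2.0 and a correlation of 0.6.
d_from_or, var_from_or = log_odds_ratio_to_d(math.log(2.0), 0.09)
d_from_r, var_from_r = correlation_to_d(0.6, 0.01)

print(f"odds ratio 2.0 -> d = {d_from_or:.3f} (variance {var_from_or:.4f})")
print(f"r = 0.6        -> d = {d_from_r:.3f} (variance {var_from_r:.4f})")
```

Once every study is expressed as d with a variance, the studies can be pooled in a single analysis, ideally alongside the sensitivity analysis the quote recommends.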

“The precision with which we estimate an effect size can be expressed as a standard error or confidence interval […] or as a variance […] The precision is driven primarily by the sample size, with larger studies yielding more precise estimates of the effect size. […] Other factors affecting precision include the study design, with matched groups yielding more precise estimates (as compared with independent groups) and clustered groups yielding less precise estimates. In addition to these general factors, there are unique factors that affect the precision for each effect size index. […] Studies that yield more precise estimates of the effect size carry more information and are assigned more weight in the meta-analysis.”
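Here is a small sketch of my own showing how variance, standard error, and confidence interval relate, and how they shrink with sample size. It assumes a raw mean difference between two independent groups with a common standard deviation; the numbers are invented.

```python
import math


def mean_difference_precision(mean_diff, pooled_sd, n1, n2):
    """Variance, standard error and 95% CI of a raw mean difference."""
    # Independent groups, equal population variances assumed.
    variance = (n1 + n2) / (n1 * n2) * pooled_sd ** 2
    standard_error = math.sqrt(variance)
    margin = 1.96 * standard_error  # normal approximation
    return variance, standard_error, (mean_diff - margin, mean_diff + margin)


# Hypothetical studies with the same effect and SD but different sizes.
small_study = mean_difference_precision(5.0, 10.0, 25, 25)
large_study = mean_difference_precision(5.0, 10.0, 100, 100)

for label, (variance, se, ci) in [("n=25 per group", small_study),
                                  ("n=100 per group", large_study)]:
    print(f"{label}: variance = {variance:.2f}, SE = {se:.2f}, "
          f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Quadrupling the sample size cuts the variance to a quarter and the standard error to half, which is why the larger study would receive more weight.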

“Under the fixed-effect model we assume that all studies in the meta-analysis share a common (true) effect size. […] However, in many systematic reviews this assumption is implausible. When we decide to incorporate a group of studies in a meta-analysis, we assume that the studies have enough in common that it makes sense to synthesize the information, but there is generally no reason to assume that they are *identical* in the sense that the true effect size is *exactly the same* in all the studies. […] Because studies will differ in the mixes of participants and in the implementations of interventions, among other reasons, there may be *different effect sizes* underlying different studies. […] One way to address this variation across studies is to perform a *random-effects* meta-analysis. In a random-effects meta-analysis we usually assume that the true effects are normally distributed. […] Since our goal is to estimate the mean of the distribution, we need to take account of two sources of variance. First, there is within-study error in estimating the effect in each study. Second (even if we knew the true mean for each of our studies), there is variation in the true effects across studies. Study weights are assigned with the goal of minimizing both sources of variance.”
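The two sources of variance can be written directly into the study weights. In this sketch of my own, each weight is the reciprocal of the within-study variance plus the between-studies variance (tau-squared). The value of tau-squared is simply assumed here rather than estimated, and the data are hypothetical.

```python
import math


def random_effects_summary(effects, variances, tau_squared):
    """Weighted mean under random effects, given a between-studies variance."""
    # Total variance per study = within-study variance + between-studies variance.
    weights = [1.0 / (v + tau_squared) for v in variances]
    total_weight = sum(weights)
    summary = sum(w * y for w, y in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return summary, standard_error


# Hypothetical effects and variances; tau_squared = 0.02 is an assumed value.
effects = [0.10, 0.30, 0.50]
variances = [0.01, 0.04, 0.04]

re_summary, re_standard_error = random_effects_summary(effects, variances, 0.02)
print(f"random-effects mean = {re_summary:.3f}, SE = {re_standard_error:.3f}")
```

Setting `tau_squared` to zero reduces this to the fixed-effect calculation, since only within-study error remains.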

“Under the fixed-effect model we assume that the true effect size for all studies is identical, and the only reason the effect size varies between studies is sampling error (error in estimating the effect size). Therefore, when assigning weights to the different studies we can largely ignore the information in the smaller studies since we have better information about the same effect size in the larger studies. By contrast, under the random-effects model the goal is not to estimate one true effect, but to estimate the mean of a distribution of effects. Since each study provides information about a different effect size, we want to be sure that all these effect sizes are represented in the summary estimate. This means that we cannot discount a small study by giving it a very small weight (the way we would in a fixed-effect analysis). The estimate provided by that study may be imprecise, but it is information about an effect that no other study has estimated. By the same logic we cannot give too much weight to a very large study (the way we might in a fixed-effect analysis). […] Under the fixed-effect model there is a wide range of weights […] whereas under the random-effects model the weights fall in a relatively narrow range. […] the relative weights assigned under random effects will be *more balanced* than those assigned under fixed effects. As we move from fixed effect to random effects, extreme studies will lose influence if they are large, and will gain influence if they are small. […] Under the fixed-effect model the only source of uncertainty is the within-study (sampling or estimation) error. Under the random-effects model there is this same source of uncertainty plus an additional source (between-studies variance). 
It follows that the variance, standard error, and confidence interval for the summary effect will always be larger (or wider) under the random-effects model than under the fixed-effect model […] Under the fixed-effect model the null hypothesis being tested is that there is zero effect in *every study*. Under the random-effects model the null hypothesis being tested is that the *mean effect* is zero. Although some may treat these hypotheses as interchangeable, they are in fact different.”
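To see the "more balanced" weights in numbers, here is a sketch of my own that computes relative weights (as percentages) under both models for three invented studies of very different precision. The between-studies variance of 0.02 is an assumed value, not an estimate.

```python
def relative_weights(variances, tau_squared=0.0):
    """Percentage weight of each study; tau_squared=0 gives fixed-effect weights."""
    weights = [1.0 / (v + tau_squared) for v in variances]
    total_weight = sum(weights)
    return [100.0 * w / total_weight for w in weights]


# Hypothetical within-study variances: a large, a medium and a small study.
variances = [0.005, 0.02, 0.08]

fixed_weights = relative_weights(variances)
random_weights = relative_weights(variances, tau_squared=0.02)

for label, weights in [("fixed effect  ", fixed_weights),
                       ("random effects", random_weights)]:
    print(label, " ".join(f"{w:5.1f}%" for w in weights))
```

Moving from fixed to random effects, the large study gives up part of its share and the small study gains, so the gap between the largest and smallest weight narrows.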

“It makes sense to use the fixed-effect model if two conditions are met. First, we believe that all the studies included in the analysis are functionally identical. Second, our goal is to compute the common effect size for the identified population, and not to generalize to other populations. […] this situation is relatively rare. […] By contrast, when the researcher is accumulating data from a series of studies that had been performed by researchers operating independently, it would be unlikely that all the studies were functionally equivalent. Typically, the subjects or interventions in these studies would have differed in ways that would have impacted on the results, and therefore we should not assume a common effect size. Therefore, in these cases the random-effects model is more easily justified than the fixed-effect model. […] There is one caveat to the above. If the number of studies is very small, then the estimate of the between-studies variance […] will have poor precision. While the random-effects model is still the appropriate model, we lack the information needed to apply it correctly. In this case the reviewer may choose among several options, each of them problematic [and one of which is to apply a fixed effects framework].”
