Econstudentlog

Medical Statistics (I)

I was more than a little critical of the book in my review on goodreads, and the review is sufficiently detailed that I thought it would be worth including it in this post. Here’s what I wrote on goodreads (slightly edited to take full advantage of the better editing options on wordpress):

“The coverage is excessively focused on significance testing. The book also provides very poor coverage of model selection topics, where the authors not once but repeatedly recommend employing statistically invalid approaches to model selection (the authors recommend using hypothesis testing mechanisms to guide model selection, as well as using adjusted R-squared for model selection decisions – both of which are frankly awful ideas, for reasons which are obvious to people familiar with the field of model selection. “Generally, hypothesis testing is a very poor basis for model selection […] There is no statistical theory that supports the notion that hypothesis testing with a fixed α level is a basis for model selection.” “While adjusted R² is useful as a descriptive statistic, it is not useful in model selection” – quotes taken directly from Burnham & Anderson’s book Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach).

The authors do not at any point in the coverage even mention the option of using statistical information criteria to guide model selection decisions, and frankly repeatedly recommend doing things which are known to be deeply problematic. The authors also cover material from Borenstein and Hedges’ meta-analysis text in the book, yet still somehow manage to give poor advice in the context of meta-analysis along similar lines (implicitly advising people to base model decisions within the context of whether to use fixed effects or random effects on the results of heterogeneity tests, despite this approach being criticized as problematic in the formerly mentioned text).

Basic and not terrible, but there are quite a few problems with this text.”

I’ll add a few more details about the above-mentioned problems before moving on to the main coverage. As for the model selection topic I refer specifically to my coverage of Burnham and Anderson’s book here and here – these guys spent a lot of pages talking about why you shouldn’t do what the authors of this book recommend, and I’m sort of flabbergasted medical statisticians don’t know this kind of stuff by now. To people who’ve read both these books, it’s not really in question who’s in the right here.
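To make the alternative concrete, here is a minimal sketch of comparing candidate models with an information criterion (AIC) instead of significance tests or adjusted R-squared. It is not taken from either book: the data are simulated, the helper `gaussian_aic` is a hypothetical name, and the candidate set is invented purely for illustration. It assumes ordinary least squares with Gaussian errors.

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC of an ordinary least-squares fit with Gaussian errors.

    Counts every regression coefficient plus the error variance as a parameter.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1  # coefficients + error variance
    # Maximised Gaussian log-likelihood, with sigma^2 estimated as rss / n
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * k - 2 * loglik

# Simulated (made-up) data: y depends on x1 only; x2 is pure noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

ones = np.ones(n)
candidates = {
    "intercept only": np.column_stack([ones]),
    "x1": np.column_stack([ones, x1]),
    "x1 + x2": np.column_stack([ones, x1, x2]),
}
aic = {name: gaussian_aic(y, X) for name, X in candidates.items()}
best = min(aic.values())
delta = {name: value - best for name, value in aic.items()}  # AIC differences
for name in candidates:
    print(f"{name:15s} AIC = {aic[name]:8.2f}  delta = {delta[name]:6.2f}")
```

The quantity of interest is the set of AIC differences across a candidate set chosen in advance, not a reject/don't-reject decision at a fixed α level. Models with small differences are all reasonably supported by the data, which is information a stepwise testing procedure throws away.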

I believe part of the reason why I was very annoyed at the authors at times was that they seem to promote exactly a sort of blind unthinking hypothesis-testing approach to things that is unfortunately very common – the entire book is saturated with hypothesis testing stuff, which means that many other topics are woefully insufficiently covered. The meta-analysis example is probably quite illustrative; the authors spend multiple pages on study heterogeneity and how to deal with it, but the entire coverage there is centered around the discussion of a most-likely underpowered test, the result of which should perhaps in the best case scenario direct the researcher’s attention to topics he should have been thinking carefully about from the very start of his data analysis. You don’t need to quote many words from Borenstein and Hedges (here’s a relevant link) to get to the heart of the matter here:

“It makes sense to use the fixed-effect model if two conditions are met. First, we believe that all the studies included in the analysis are functionally identical. Second, our goal is to compute the common effect size for the identified population, and not to generalize to other populations. […] this situation is relatively rare. […] By contrast, when the researcher is accumulating data from a series of studies that had been performed by researchers operating independently, it would be unlikely that all the studies were functionally equivalent. Typically, the subjects or interventions in these studies would have differed in ways that would have impacted on the results, and therefore we should not assume a common effect size. Therefore, in these cases the random-effects model is more easily justified than the fixed-effect model.

A report should state the computational model used in the analysis and explain why this model was selected. A common mistake is to use the fixed-effect model on the basis that there is no evidence of heterogeneity. As [already] explained […], the decision to use one model or the other should depend on the nature of the studies, and not on the significance of this test [because the test will often have low power anyway].”

Yet these guys spend their efforts here talking about a test that is unlikely to yield useful information and which if anything probably distracts the reader from the main issues at hand; are the studies functionally equivalent? Do we assume there’s one (‘true’) effect size, or many? What do those coefficients we’re calculating actually mean? The authors do in fact include a lot of cautionary notes about how to interpret the test, but in my view all this means is that they’re devoting critical pages to peripheral issues – and perhaps even reinforcing the view that the test is important, or why else would they spend so much effort on it? – rather than promote good thinking about the key topics at hand.
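For readers who want to see what the two models actually compute, here is a rough sketch of inverse-variance pooling under both. The effect sizes and variances are made up, the function name `pool` is hypothetical, and the DerSimonian-Laird estimator of the between-study variance is my choice for the illustration; neither book is the source of this code.

```python
import math

def pool(effects, variances):
    """Pool study effects under fixed-effect and random-effects models.

    Uses inverse-variance weights and the DerSimonian-Laird estimate of the
    between-study variance tau^2 (an assumed choice for this sketch).
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method-of-moments estimate of tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to every within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    random_ = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star

    return {
        "fixed": fixed, "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random_, "random_se": math.sqrt(1.0 / sum_w_star),
        "q": q, "tau2": tau2,
    }

# Made-up effect sizes and within-study variances, for illustration only.
effects = [0.10, 0.30, 0.35, 0.65, 0.45]
variances = [0.03, 0.03, 0.05, 0.01, 0.05]
result = pool(effects, variances)
for key, value in result.items():
    print(f"{key:10s} {value:.3f}")
```

Note that the code will happily produce both estimates regardless of what Q looks like. The fixed-effect number estimates one common effect, the random-effects number estimates the mean of a distribution of effects, and which of those you want is a question about the studies, not something the heterogeneity test can settle for you.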

Anyway, enough of the critical comments. Below are a few links related to the first chapter of the book, as well as some quotes.

Declaration of Helsinki.
Randomized controlled trial.
Minimization (clinical trials).
Blocking (statistics).
Informed consent.
Blinding (RCTs). (…related xkcd link).
Parallel study. Crossover trial.
Zelen’s design.
Superiority, equivalence, and non-inferiority trials.
Intention-to-treat concept: A review.
Case-control study. Cohort study. Nested case-control study. Cross-sectional study.
Bradford Hill criteria.
Research protocol.
Sampling.
Type 1 and type 2 errors.
Clinical audit. A few quotes on this topic:

“‘Clinical audit’ is a quality improvement process that seeks to improve the patient care and outcomes through systematic review of care against explicit criteria and the implementation of change. Aspects of the structures, processes and outcomes of care are selected and systematically evaluated against explicit criteria. […] The aim of audit is to monitor clinical practice against agreed best practice standards and to remedy problems. […] the choice of topic is guided by indications of areas where improvement is needed […] Possible topics [include] *Areas where a problem has been identified […] *High volume practice […] *High risk practice […] *High cost […] *Areas of clinical practice where guidelines or firm evidence exists […] The organization carrying out the audit should have the ability to make changes based on their findings. […] In general, the same methods of statistical analysis are used for audit as for research […] The main difference between audit and research is in the aim of the study. A clinical research study aims to determine what practice is best, whereas an audit checks to see that best practice is being followed.”

A few more quotes from the end of the chapter:

“In clinical medicine and in medical research it is fairly common to categorize a biological measure into two groups, either to aid diagnosis or to classify an outcome. […] It is often useful to categorize a measurement in this way to guide decision-making, and/or to summarize the data but doing this leads to a loss of information which in turn has statistical consequences. […] If a continuous variable is used for analysis in a research study, a substantially smaller sample size will be needed than if the same variable is categorized into two groups […] *Categorization of a continuous variable into two groups loses much data and should be avoided whenever possible *Categorization of a continuous variable into several groups is less problematic”
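The information loss from dichotomization is easy to illustrate with a small simulation. This is only a sketch under simplifying assumptions that are mine, not the book's: a normally distributed predictor, a linear relationship with the outcome, and a split at the median. All numbers are invented.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)                        # continuous measurement
y = 0.5 * x + rng.normal(size=n)              # outcome linearly related to x
x_binary = (x > np.median(x)).astype(float)   # median split into two groups

r_continuous = np.corrcoef(x, y)[0, 1]
r_binary = np.corrcoef(x_binary, y)[0, 1]

ratio = r_binary / r_continuous   # theory for this setup: sqrt(2 / pi)
efficiency = ratio ** 2           # theory for this setup: 2 / pi

print(f"correlation with continuous x: {r_continuous:.3f}")
print(f"correlation with median split: {r_binary:.3f}")
print(f"attenuation ratio:             {ratio:.3f}")
print(f"relative efficiency:           {efficiency:.3f}")
print(f"theoretical efficiency 2/pi:   {2 / math.pi:.3f}")
```

Under these assumptions the median split retains roughly two thirds of the information, so a study using the dichotomized variable needs very roughly one and a half times as many subjects for the same power. Splits away from the median generally do worse.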

“Research studies require certain specific data which must be collected to fulfil the aims of the study, such as the primary and secondary outcomes and main factors related to them. Beyond these data there are often other data that could be collected and it is important to weigh the costs and consequences of not collecting data that will be needed later against the disadvantages of collecting too much data. […] collecting too much data is likely to add to the time and cost to data collection and processing, and may threaten the completeness and/or quality of all of the data so that key data items are threatened. For example if a questionnaire is overly long, respondents may leave some questions out or may refuse to fill it out at all.”

“Stratified samples are used when fixed numbers are needed from particular sections or strata of the population in order to achieve balance across certain important factors. For example a study designed to estimate the prevalence of diabetes in different ethnic groups may choose a random sample with equal numbers of subjects in each ethnic group to provide a set of estimates with equal precision for each group. If a simple random sample is used rather than a stratified sample, then estimates for minority ethnic groups may be based on small numbers and have poor precision. […] Cluster samples may be chosen where individuals fall naturally into groups or clusters. For example, patients on a hospital wards or patients in a GP practice. If a sample is needed of these patients, it may be easier to list the clusters and then to choose a random sample of clusters, rather than to choose a random sample of the whole population. […] Cluster sampling is less efficient statistically than simple random sampling […] the ICC summarizes the extent of the ‘clustering effect’. When individuals in the same cluster are much more alike than individuals in different clusters with respect to an outcome, then the clustering effect is greater and the impact on the required sample size is correspondingly greater. In practice there can be a substantial effect on the sample size even when the ICC is quite small. […] As well as considering how representative a sample is, it is important […] to consider the size of the sample. A sample may be unbiased and therefore representative, but too small to give reliable estimates. […] Prevalence estimates from small samples will be imprecise and therefore may be misleading. […] The greater the variability of a measure, the greater the number of subjects needed in the sample to estimate it precisely. […] the power of a study is the ability of the study to detect a difference if one exists.”
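The remark that a small ICC can still have a substantial effect on sample size is worth a quick calculation. The sketch below uses the standard design effect formula for equal-sized clusters, 1 + (m − 1) × ICC; the function names and the example numbers are hypothetical and not taken from the book.

```python
import math

def design_effect(cluster_size, icc):
    """Design effect for cluster sampling, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc

def inflated_sample_size(n_simple_random, cluster_size, icc):
    """Sample size needed under cluster sampling to match the precision of a
    simple random sample of size n_simple_random."""
    required = n_simple_random * design_effect(cluster_size, icc)
    # Round first so floating-point noise cannot push ceil() up by one
    return math.ceil(round(required, 9))

# Hypothetical example: 50 patients per GP practice, ICC of 0.02,
# and a simple random sample of 400 would otherwise have been enough.
deff = design_effect(50, 0.02)
n_needed = inflated_sample_size(400, 50, 0.02)
print(f"design effect: {deff:.2f}")
print(f"required sample size: {n_needed}")
```

With 50 patients per cluster, an ICC of just 0.02 gives a design effect of about 1.98, so the required sample nearly doubles. The larger the clusters, the more a given ICC costs you.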

April 9, 2018 | Books, Epidemiology, Medicine, Statistics