## Principles of Applied Statistics

“Statistical considerations arise in virtually all areas of science and technology and, beyond these, in issues of public and private policy and in everyday life. While the detailed methods used vary greatly in the level of elaboration involved and often in the way they are described, there is a unity of ideas which gives statistics as a subject both its intellectual challenge and its importance […] In this book we have aimed to discuss the ideas involved in applying statistical methods to advance knowledge and understanding. It is a book not on statistical methods as such but, rather, on how these methods are to be deployed […] We are writing partly for those working as applied statisticians, partly for subject-matter specialists using statistical ideas extensively in their work and partly for masters and doctoral students of statistics concerned with the relationship between the detailed methods and theory they are studying and the effective application of these ideas. Our aim is to emphasize how statistical ideas may be deployed fruitfully rather than to describe the details of statistical techniques.”

…

I gave the book five stars, but as noted in my review on goodreads I’m not sure the word ‘amazing’ is really fitting – however the book had a lot of good stuff and it had very little stuff for me to quibble about, so I figured it deserved a high rating. The book deals to a very large extent with topics which are in some sense common to pretty much all statistical analyses, regardless of the research context; formulation of research questions/hypotheses, data search, study designs, data analysis, and interpretation. The authors spend quite a few pages talking about hypothesis testing but on the other hand no pages talking about statistical information criteria, a topic with which I’m at this point at least reasonably familiar, and I figure if I had been slightly more critical I’d have subtracted a star for this omission – however I have the impression that I’m at times perhaps too hard on non-fiction books on goodreads so I decided not to punish the book for this omission. Part of the reason why I gave the book five stars is also that I’ve sort of wanted to read a book like this one for a while; I think in some sense it’s the first one of its kind I’ve read. I liked the way the book was structured.

Below I have added some observations from the book, as well as a few comments (I should note that I have had to leave out a lot of good stuff).

…

“When the data are very extensive, precision estimates calculated from simple standard statistical methods are likely to underestimate error substantially owing to the neglect of hidden correlations. A large amount of data is in no way synonymous with a large amount of information. In some settings at least, if a modest amount of poor quality data is likely to be modestly misleading, an extremely large amount of poor quality data may be extremely misleading.”

“For studies of a new phenomenon it will usually be best to examine situations in which the phenomenon is likely to appear in the most striking form, even if this is in some sense artificial or not representative. This is in line with the well-known precept in mathematical research: study the issue in the simplest possible context that is not entirely trivial, and later generalize.”

“It often […] aids the interpretation of an observational study to consider the question: what would have been done in a comparable experiment?”

“An important and perhaps sometimes underemphasized issue in empirical prediction is that of stability. Especially when repeated application of the same method is envisaged, it is unlikely that the situations to be encountered will exactly mirror those involved in setting up the method. It may well be wise to use a procedure that works well over a range of conditions even if it is sub-optimal in the data used to set up the method.”

“Many investigations have the broad form of collecting similar data repeatedly, for example on different individuals. In this connection the notion of a *unit of analysis *is often helpful in clarifying an approach to the detailed analysis. Although this notion is more generally applicable, it is clearest in the context of randomized experiments. Here the unit of analysis is that smallest subdivision of the experimental material such that two distinct units *might *be randomized (randomly allocated) to different treatments. […] In general the unit of analysis may not be the same as the unit of interpretation, that is to say, the unit about which conclusions are to drawn. The most difficult situation is when the unit of analysis is an aggregate of several units of interpretation, leading to the possibility of *ecological bias*, that is, a systematic difference between, say, the impact of explanatory variables at different levels of aggregation. […] it is important to identify the unit of analysis, which may be different in different parts of the analysis […] on the whole, limited detail is needed in examining the variation within the unit of analysis in question.”

The book briefly discusses issues pertaining to the scale of effort involved when thinking about appropriate study designs and how much/which data to gather for analysis, and notes that often associated costs are not quantified – rather a judgment call is made. An important related point is that e.g. in survey contexts response patterns will tend to depend upon the quantity of information requested; if you ask for too much, few people might reply (…and perhaps it’s also the case that it’s ‘the wrong people’ that reply? The authors don’t touch upon the potential selection bias issue, but it seems relevant). A few key observations from the book on this topic:

“the intrinsic quality of data, for example the response rates of surveys, may be degraded if too much is collected. […] sampling may give higher [data] quality than the study of a complete population of individuals. […] When researchers studied the effect of the expected length (10, 20 or 30 minutes) of a web-based questionnaire, they found that fewer potential respondents started and completed questionnaires expected to take longer (Galesic and Bosnjak, 2009). Furthermore, questions that appeared later in the questionnaire were given shorter and more uniform answers than questions that appeared near the start of the questionnaire.”

Not surprising, but certainly worth keeping in mind. Moving on…

“In general, while principal component analysis may be helpful in suggesting a base for interpretation and the formation of derived variables there is usually considerable arbitrariness involved in its use. This stems from the need to standardize the variables to comparable scales, typically by the use of correlation coefficients. This means that a variable that happens to have atypically small variability in the data will have a misleadingly depressed weight in the principal components.”

The book includes a few pages about the Berkson error model, which I’d never heard about. Wikipedia doesn’t have much about it and I was debating how much to include about this one here – I probably wouldn’t have done more than including the link here if the wikipedia article actually covered this topic in any detail, but it doesn’t. However it seemed important enough to write a few words about it. The basic difference between the ‘classical error model’, i.e. the one everybody knows about, and the Berkson error model is that in the former case the measurement error is statistically independent of the *true value* of X, whereas in the latter case the measurement error is independent of the *measured value*; the authors note that this implies that the true values are more variable than the measured values in a Berkson error context. Berkson errors can e.g. happen in experimental contexts where levels of a variable are pre-set by some target, for example in a medical context where a drug is supposed to be administered each X hours; the pre-set levels might then be the measured values, and the true values might be different e.g. if the nurse was late. I thought it important to mention this error model not only because it’s a completely new idea to me that you might encounter this sort of error-generating process, but also because there is no statistical test that you can use to figure out if the standard error model is the appropriate one, or if a Berkson error model is better; which means that you need to be aware of the difference and think about which model works best, based on the nature of the measuring process.

Let’s move on to some quotes dealing with modeling:

“while it is appealing to use methods that are in a reasonable sense fully efficient, that is, extract all relevant information in the data, nevertheless any such notion is within the framework of an assumed model. Ideally, methods should have this efficiency property while preserving good behaviour (especially stability of interpretation) when the model is perturbed. Essentially a model translates a subject-matter question into a mathematical or statistical one and, if that translation is seriously defective, the analysis will address a wrong or inappropriate question […] The greatest difficulty with quasi-realistic models [as opposed to ‘toy models’] is likely to be that they require numerical specification of features for some of which there is very little or no empirical information. Sensitivity analysis is then particularly important.”

“Parametric models typically represent some notion of smoothness; their danger is that particular representations of that smoothness may have strong and unfortunate implications. This difficulty is covered for the most part by informal checking that the primary conclusions do not depend critically on the precise form of parametric representation. To some extent such considerations can be formalized but in the last analysis some element of judgement cannot be avoided. One general consideration that is sometimes helpful is the following. If an issue can be addressed nonparametrically then it will often be better to tackle it parametrically; however, if it cannot be resolved nonparametrically then it is usually dangerous to resolve it parametrically.”

“Once a model is formulated two types of question arise. How can the unknown parameters in the model best be estimated? Is there evidence that the model needs modification or indeed should be abandoned in favour of some different representation? The second question is to be interpreted not as asking whether the model is true [*this is the wrong question to ask, as also emphasized by Burnham & Anderson*] but whether there is clear evidence of a specific kind of departure implying a need to change the model so as to avoid distortion of the final conclusions. […] it is important in applications to understand the circumstances under which different methods give similar or different conclusions. In particular, if a more elaborate method gives an apparent improvement in precision, what are the assumptions on which that improvement is based? Are they reasonable? […] the hierarchical principle implies, […] with very rare exceptions, that models with interaction terms should include also the corresponding main effects. […] When considering two families of models, it is important to consider the possibilities that both families are adequate, that one is adequate and not the other and that neither family fits the data.” [Do incidentally recall that in the context of interactions, “the term interaction […] is in some ways a misnomer. There is no necessary implication of interaction in the physical sense or synergy in a biological context. Rather, interaction means a departure from additivity […] This is expressed most explicitly by the requirement that, apart from random fluctuations, the difference in outcome between any two levels of one factor is the same at all levels of the other factor. […] The most directly interpretable form of interaction, certainly not removable by [variable] transformation, is effect reversal.”]

“The *p*-value assesses the data […] via a comparison with that anticipated if *H _{0}* were true. If in two different situations the test of a relevant null hypothesis gives approximately the same

*p*-value, it does not follow that the overall strengths of the evidence in favour of the relevant

*H*are the same in the two cases.”

_{0}“There are […] two sources of uncertainty in observational studies that are not present in randomized experiments. The first is that the ordering of the variables may be inappropriate, a particular hazard in cross-sectional studies. […] if the data are tied to one time point then any presumption of causality relies on a working hypothesis as to whether the components are explanatory or responses. Any check on this can only be from sources external to the current data. […] The second source of uncertainty is that important explanatory variables affecting both the potential cause and the outcome may not be available. […] Retrospective explanations may be convincing if based on firmly established theory but otherwise need to be treated with special caution. It is well known in many fields that ingenious explanations can be constructed retrospectively for almost any finding.”

“The general issue of applying conclusions from aggregate data to specific individuals is essentially that of showing that the individual does not belong to a subaggregate for which a substantially different conclusion applies. In actuality this can at most be indirectly checked for specific subaggregates. […] It is not unknown in the literature to see conclusions such as that there are no treatment differences except for males aged over 80 years, living more than 50 km south of Birmingham and life-long supporters of Aston Villa football club, who show a dramatic improvement under some treatment *T*. Despite the undoubted importance of this particular subgroup, virtually always such conclusions would seem to be unjustified.” [*I loved this example!*]

The authors included a few interesting results from an undated Cochrane publication which I thought I should mention. The file-drawer effect is well known, but there are a few other interesting biases at play in a publication bias context. One is time-lag bias, which means that statistically significant results take less time to get published. Another is language bias; statistically significant results are more likely to be published in English publications. A third bias is multiple publication bias; it turns out that papers with statistically significant results are more likely to be published more than once. The last one mentioned is citation bias; papers with statistically significant results are more likely to be cited in the literature.

The authors include these observations in their concluding remarks: “The overriding general principle [in the context of applied statistics], difficult to achieve, is that there should be a seamless flow between statistical and subject-matter considerations. […] in principle seamlessness requires an individual statistician to have views on subject-matter interpretation and subject-matter specialists to be interested in issues of statistical analysis.”

As already mentioned this is a good book. It’s not long, and/but it’s worth reading if you’re in the target group.

No comments yet.

## Leave a Reply