Unobserved Variables – Models and Misunderstandings
This is a neat little book in the Springer Briefs in Statistics series. The author is David J. Bartholomew, a former statistics professor at the LSE. I wrote a brief goodreads review, but I thought I might as well also add a post about the book here. The book covers topics such as the EM algorithm, Gibbs sampling, the Metropolis–Hastings algorithm and the Rasch model, and it assumes you’re familiar with things like how to do ML estimation, among much else. I had some passing familiarity with many of the topics he talks about in the book, but I’m sure I’d have benefited from knowing more about some of the specific topics covered. Because large parts of the book are basically unreadable for people without a stats background, I wasn’t sure how much of it made sense to cover here, but I decided to talk a bit about a few of the things which I believe don’t require you to know a whole lot about this area.
“Modern statistics is built on the idea of models—probability models in particular. [While I was rereading this part, I was reminded of this quote which I came across while finishing my most recent quotes post: “No scientist is as model minded as is the statistician; in no other branch of science is the word model as often and consciously used as in statistics.” Hans Freudenthal.] The standard approach to any new problem is to identify the sources of variation, to describe those sources by probability distributions and then to use the model thus created to estimate, predict or test hypotheses about the undetermined parts of that model. […] A statistical model involves the identification of those elements of our problem which are subject to uncontrolled variation and a specification of that variation in terms of probability distributions. Therein lies the strength of the statistical approach and the source of many misunderstandings. Paradoxically, misunderstandings arise both from the lack of an adequate model and from over-reliance on a model. […] At one level is the failure to recognise that there are many aspects of a model which cannot be tested empirically. At a higher level is the failure to recognise that any model is, necessarily, an assumption in itself. The model is not the real world itself but a representation of that world as perceived by ourselves. This point is emphasised when, as may easily happen, two or more models make exactly the same predictions about the data. Even worse, two models may make predictions which are so close that no data we are ever likely to have can ever distinguish between them. […] All model-dependent inference is necessarily conditional on the model. This stricture needs, especially, to be borne in mind when using Bayesian methods. Such methods are totally model-dependent and thus all are vulnerable to this criticism.
The problem can apparently be circumvented, of course, by embedding the model in a larger model in which any uncertainties are, themselves, expressed in probability distributions. However, in doing this we are embarking on a potentially infinite regress which quickly gets lost in a fog of uncertainty.”
“Mixtures of distributions play a fundamental role in the study of unobserved variables […] The two important questions which arise in the analysis of mixtures concern how to identify whether or not a given distribution could be a mixture and, if so, to estimate the components. […] Mixtures arise in practice because of failure to recognise that samples are drawn from several populations. If, for example, we measure the heights of men and women without distinction the overall distribution will be a mixture. It is relevant to know this because women tend to be shorter than men. […] It is often not at all obvious whether a given distribution could be a mixture […] even a two-component mixture of normals, has 5 unknown parameters. As further components are added the estimation problems become formidable. If there are many components, separation may be difficult or impossible […] [To add to the problem,] the form of the distribution is unaffected by the mixing [in the case of the mixing of normals]. Thus there is no way that we can recognise that mixing has taken place by inspecting the form of the resulting distribution alone. Any given normal distribution could have arisen naturally or be the result of normal mixing […] if f(x) is normal, there is no way of knowing whether it is the result of mixing and hence, if it is, what the mixing distribution might be.”
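The point about normal mixing leaving no trace can be checked in a short simulation (my own sketch, not from the book): if X | μ ~ N(μ, 1) and the mixing distribution is μ ~ N(0, 1), then the compound distribution is again normal, N(0, 2), and is indistinguishable from a directly sampled normal with that variance.

```python
import random
import statistics

random.seed(0)
N = 100_000

# Direct sample: X ~ N(0, sqrt(2)), i.e. variance 1 + 1 = 2.
direct = [random.gauss(0, 2 ** 0.5) for _ in range(N)]

# Compound sample: first draw the mean mu ~ N(0, 1), then X | mu ~ N(mu, 1).
compound = []
for _ in range(N):
    mu = random.gauss(0, 1)
    compound.append(random.gauss(mu, 1))

# Both samples have (approximately) the same mean and variance, and in fact
# the same normal shape -- inspecting the result cannot reveal the mixing.
print(round(statistics.mean(direct), 2), round(statistics.pvariance(direct), 2))
print(round(statistics.mean(compound), 2), round(statistics.pvariance(compound), 2))
```

Both printed lines come out at roughly mean 0 and variance 2; no test on the samples alone can tell which one was generated by mixing.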
“Even if there is close agreement between a model and the data it does not follow that the model provides a true account of how the data arose. It may be that several models explain the data equally well. When this happens there is said to be a lack of identifiability. Failure to take full account of this fact, especially in the social sciences, has led to many over-confident claims about the nature of social reality. Lack of identifiability within a class of models may arise because different values of their parameters provide equally good fits. Or, more seriously, models with quite different characteristics may make identical predictions. […] If we start with a model we can predict, albeit uncertainly, what data it should generate. But if we are given a set of data we cannot necessarily infer that it was generated by a particular model. In some cases it may, of course, be possible to achieve identifiability by increasing the sample size but there are cases in which, no matter how large the sample size, no separation is possible. […] Identifiability matters can be considered under three headings. First there is lack of parameter identifiability which is the most common use of the term. This refers to the situation where there is more than one value of a parameter in a given model each of which gives an equally good account of the data. […] Secondly there is what we shall call lack of model identifiability which occurs when two or more models make exactly the same data predictions. […] The third type of identifiability is actually the combination of the foregoing types.
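The simplest instance of parameter non-identifiability is "label switching" in a mixture: relabelling the two components (and replacing p with 1 − p) produces a different parameter vector with exactly the same likelihood. A minimal illustration (my own, with made-up data):

```python
import math

def mixture_loglik(data, p, mu1, s1, mu2, s2):
    """Log-likelihood of a two-component normal mixture."""
    def npdf(x, mu, s):
        return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    return sum(math.log(p * npdf(x, mu1, s1) + (1 - p) * npdf(x, mu2, s2))
               for x in data)

data = [-1.2, 0.3, 2.1, 1.7, -0.4]

# Two different parameter vectors -- the components swapped and the mixing
# weight complemented -- give the same account of the data:
a = mixture_loglik(data, 0.3, 0.0, 1.0, 2.0, 0.5)
b = mixture_loglik(data, 0.7, 2.0, 0.5, 0.0, 1.0)
print(abs(a - b) < 1e-12)  # True
```

No amount of extra data can separate these two parameter vectors, which is exactly the situation where increasing the sample size does not help.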
Mathematical statistics is not well-equipped to cope with situations where models are practically, but not precisely, indistinguishable because it typically deals with things which can only be expressed in unambiguously stated theorems. Of necessity, these make clear-cut distinctions which do not always correspond with practical realities. For example, there are theorems concerning such things as sufficiency and admissibility. According to such theorems, for example, a proposed statistic is either sufficient or not sufficient for some parameter. If it is sufficient it contains all the information, in a precisely defined sense, about that parameter. But in practice we may be much more interested in what we might call ‘near sufficiency’ in some more vaguely defined sense. Because we cannot give a precise mathematical definition to what we mean by this, the practical importance of the notion is easily overlooked. The same kind of fuzziness arises with what are called structural equation models (or structural relations models) which have played a very important role in the social sciences. […] we shall argue that structural equation models are almost always unidentifiable in the broader sense of which we are speaking here. […] [our results] constitute a formidable argument against the careless use of structural relations models. […] In brief, the valid use of a structural equations model requires us to lean very heavily upon assumptions about which we may not be very sure. It is undoubtedly true that if such a model provides a good fit to the data, then it provides a possible account of how the data might have arisen. It says nothing about what other models might provide an equally good, or even better fit. As a tool of inductive inference designed to tell us something about the social world, linear structural relations modelling has very little to offer.”
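The sufficiency point can be made concrete with the textbook Bernoulli case (my own sketch, not the book's example): the number of successes is sufficient for the success probability θ, so two samples with the same total yield literally the same likelihood function, whatever the order of the outcomes.

```python
from math import prod

def bernoulli_lik(data, theta):
    """Likelihood of a sequence of 0/1 outcomes under success probability theta."""
    return prod(theta if x else 1 - theta for x in data)

# Two different samples with the same total number of successes (3 of 6):
s1 = [1, 1, 0, 0, 0, 1]
s2 = [0, 1, 0, 1, 1, 0]

# The likelihood, as a function of theta, is identical for both samples, so
# sum(x_i) carries all the information the sample contains about theta.
for theta in (0.2, 0.5, 0.8):
    assert abs(bernoulli_lik(s1, theta) - bernoulli_lik(s2, theta)) < 1e-15
print("sum of successes is sufficient for theta")
```

The book's complaint is that this all-or-nothing notion has no graded counterpart: a statistic that preserves almost all the information has no comparable theoretical standing.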
“It is very common for data to be missing and this introduces a risk of bias if inferences are drawn from incomplete samples. However, we are not usually interested in the missing data themselves but in the population characteristics to whose estimation those values were intended to contribute. […] A very longstanding way of dealing with missing data is to fill in the gaps by some means or other and then carry out the standard analysis on the completed data set. This procedure is known as imputation. […] In its simplest form, each missing data point is replaced by a single value. Because there is, inevitably, uncertainty about what the imputed values should be, one can do better by substituting a range of plausible values and comparing the results in each case. This is known as multiple imputation. […] missing values may occur anywhere and in any number. They may occur haphazardly or in some pattern. In the latter case, the pattern may provide a clue to the mechanism underlying the loss of data and so suggest a method for dealing with it. The conditional distribution which we have supposed might be the basis of imputation depends, of course, on the mechanism behind the loss of data. From a practical point of view the detailed information necessary to determine this may not be readily obtainable or, even, necessary. Nevertheless, it is useful to clarify some of the issues by introducing the idea of a probability mechanism governing the loss of data. This will enable us to classify the problems which would have to be faced in a more comprehensive treatment. The simplest, if least realistic, approach is to assume that the chance of being missing is the same for all elements of the data matrix. In that case, we can, in effect, ignore the missing values […] Such situations are designated as MCAR which is an acronym for Missing Completely at Random. […] In the smoking example we have supposed that men are more likely to refuse [to answer] than women.
If we go further and assume that there are no other biasing factors we are, in effect, assuming that ‘missingness’ is completely at random for men and women, separately. This would be an example of what is known as Missing at Random (MAR) […] which means that the missing mechanism depends on the observed variables but not on those that are missing. The final category is Missing Not at Random (MNAR) which is a residual category covering all other possibilities. This is difficult to deal with in practice unless one has an unusually complete knowledge of the missing mechanism.
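A short simulation makes the MAR case concrete. The numbers below are illustrative, not the book's: heights for men and women are simulated, men refuse to answer more often, and missingness depends only on the observed variable (sex). The naive complete-case mean is then biased, but conditioning on sex and reweighting by the known group proportions recovers the population mean.

```python
import random
import statistics

random.seed(1)
N = 50_000

# Simulated heights (cm): men ~ N(178, 7), women ~ N(165, 6) -- made-up values.
people = ([("M", random.gauss(178, 7)) for _ in range(N)]
          + [("F", random.gauss(165, 6)) for _ in range(N)])
true_mean = statistics.mean(h for _, h in people)

# MAR mechanism: men refuse with probability 0.5, women with probability 0.1.
# Missingness depends on sex (observed) but not on height (the missing value).
observed = [(s, h) for s, h in people
            if random.random() > (0.5 if s == "M" else 0.1)]

# Naive complete-case mean is biased downwards: too many men were dropped.
naive = statistics.mean(h for _, h in observed)

# Conditioning on sex and reweighting by the known 50/50 split recovers it.
mean_m = statistics.mean(h for s, h in observed if s == "M")
mean_f = statistics.mean(h for s, h in observed if s == "F")
adjusted = 0.5 * mean_m + 0.5 * mean_f

print(round(true_mean, 1), round(naive, 1), round(adjusted, 1))
```

Under MNAR, by contrast, refusal would depend on height itself even within each sex, and no adjustment based on the observed variables alone could remove the bias.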
Another term used in the theory of missing data is that of ignorability. The conditional distribution of y given x will, in general, depend on any parameters of the distribution of M [the variable we use to describe the mechanism governing the loss of observations] yet these are unlikely to be of any practical interest. It would be convenient if this distribution could be ignored for the purposes of inference about the parameters of the distribution of x. If this is the case the mechanism of loss is said to be ignorable. In practice it is acceptable to assume that the concept of ignorability is equivalent to that of MAR.”