I haven’t really blogged this book in anywhere near the amount of detail it deserves, even though my first post about the book did include a few quotes illustrating how many different topics it covers.
This book is technical, and even though I’m trying to make this post less technical by omitting the math, it may be a good idea to reread the first post about the book before reading this one, to refresh your knowledge of these things.
Quotes and comments below – most of the coverage here focuses on stuff covered in chapters 3 and 4 in the book.
“Tests of null hypotheses and information-theoretic approaches should not be used together; they are very different analysis paradigms. A very common mistake seen in the applied literature is to use AIC to rank the candidate models and then “test” to see whether the best model (the alternative hypothesis) is “significantly better” than the second-best model (the null hypothesis). This procedure is flawed, and we strongly recommend against it […] the primary emphasis should be on the size of the treatment effects and their precision; too often we find a statement regarding “significance,” while the treatment and control means are not even presented. Nearly all statisticians are calling for estimates of effect size and associated precision, rather than test statistics, P-values, and “significance.” [Borenstein & Hedges certainly did as well in their book (written much later), and this was not an issue I omitted to talk about in my coverage of their book…] […] Information-theoretic criteria such as AIC, AICc, and QAICc are not a “test” in any sense, and there are no associated concepts such as test power or P-values or α-levels. Statistical hypothesis testing represents a very different, and generally inferior, paradigm for the analysis of data in complex settings. It seems best to avoid use of the word “significant” in reporting research results under an information-theoretic paradigm. […] AIC allows a ranking of models and the identification of models that are nearly equally useful versus those that are clearly poor explanations for the data at hand […]. Hypothesis testing provides no general way to rank models, even for models that are nested. […] In general, we recommend strongly against the use of null hypothesis testing in model selection.”
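To make "ranking the candidate models" concrete: there is no test statistic or P-value anywhere in the procedure, just one score per model and the difference Δi between each model's score and the best one. The sketch below assumes each model has already been fitted, so only its maximized log-likelihood and its number of estimated parameters K are needed. The model names and numbers are hypothetical, and AICc (the small-sample variant mentioned in the quote) is written in its usual textbook form.

```python
def aicc(log_lik, k, n):
    """AICc = AIC + 2K(K + 1) / (n - K - 1), where AIC = -2 log L + 2K."""
    aic = -2.0 * log_lik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

def rank_models(models, n):
    """Return (name, AICc, delta_i) tuples ordered from best to worst model."""
    scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
    best = min(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [(name, score, score - best) for name, score in ranked]

# Hypothetical candidate set: name -> (maximized log-likelihood, number of parameters K)
candidates = {"g1": (-120.4, 3), "g2": (-118.9, 4), "g3": (-118.7, 6)}
ranking = rank_models(candidates, n=50)
for name, score, delta in ranking:
    print(f"{name}: AICc = {score:.2f}, delta = {delta:.2f}")
# prints g2 (delta 0.00), then g1 (delta 0.63), then g3 (delta 4.66)
```

The output is an ordering plus a measure of how far behind each model is. Nothing in it says whether g2 is "significantly better" than g1, and per the quote, nothing should.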
“The bootstrap is a type of Monte Carlo method used frequently in applied statistics. This computer-intensive approach is based on resampling of the observed data […] The fundamental idea of the model-based sampling theory approach to statistical inference is that the data arise as a sample from some conceptual probability distribution f. Uncertainties of our inferences can be measured if we can estimate f. The bootstrap method allows the computation of measures of our inference uncertainty by having a simple empirical estimate of f and sampling from this estimated distribution. In practical application, the empirical bootstrap means using some form of resampling with replacement from the actual data x to generate B (e.g., B = 1,000 or 10,000) bootstrap samples […] The set of B bootstrap samples is a proxy for a set of B independent real samples from f (in reality we have only one actual sample of data). Properties expected from replicate real samples are inferred from the bootstrap samples by analyzing each bootstrap sample exactly as we first analyzed the real data sample. From the set of results of sample size B we measure our inference uncertainties from sample to (conceptual) population […] For many applications it has been theoretically shown […] that the bootstrap can work well for large sample sizes (n), but it is not generally reliable for small n […], regardless of how many bootstrap samples B are used. […] Just as the analysis of a single data set can have many objectives, the bootstrap can be used to provide insight into a host of questions. For example, for each bootstrap sample one could compute and store the conditional variance–covariance matrix, goodness-of-fit values, the estimated variance inflation factor, the model selected, confidence interval width, and other quantities. Inference can be made concerning these quantities, based on summaries over the B bootstrap samples.”
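The resampling procedure described in the quote fits in a few lines. The sketch below is a minimal empirical bootstrap for the standard error of a sample mean, plus a crude percentile interval. The data are invented. With only n = 20 the quote's warning about small samples applies, so treat the output as an illustration of the mechanics only.

```python
import random
import statistics

def bootstrap(data, statistic, b=2000, seed=1):
    """Empirical bootstrap: draw B samples of size n with replacement from the
    observed data and apply the same statistic to each one."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(b)]

# Hypothetical sample of n = 20 measurements
data = [4.1, 5.3, 3.8, 6.2, 5.0, 4.7, 5.9, 4.4, 5.1, 6.8,
        3.9, 5.5, 4.9, 5.2, 6.0, 4.3, 5.7, 4.6, 5.4, 5.8]
reps = bootstrap(data, statistics.mean)
se = statistics.stdev(reps)  # bootstrap standard error of the mean
ordered = sorted(reps)
# Crude 95% percentile interval from the B bootstrap means
ci = (ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered)) - 1])
print(f"mean = {statistics.mean(data):.3f}, bootstrap SE = {se:.3f}, "
      f"95% interval = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The same loop could store anything else computed from each resample, such as which model was selected, which is the broader use the quote describes.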
“Information criteria attempt only to select the best model from the candidate models available; if a better model exists, but is not offered as a candidate, then the information-theoretic approach cannot be expected to identify this new model. Adjusted R² […] are useful as a measure of the proportion of the variation “explained,” [but] are not useful in model selection […] adjusted R² is poor in model selection; its usefulness should be restricted to description.”
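For reference, adjusted R² is usually computed as 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors, excluding the intercept. The sketch below shows it as a description of fit that is corrected for the number of predictors. In line with the quote, it is not meant as a model-selection rule. The R² values are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a regression with n observations and p predictors
    (not counting the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical fits: adding three predictors raises R^2 from 0.40 to 0.42
small = adjusted_r2(0.40, n=30, p=2)  # about 0.356
large = adjusted_r2(0.42, n=30, p=5)  # about 0.299
print(f"adjusted R^2: {small:.3f} (p = 2) vs {large:.3f} (p = 5)")
```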
“As we have struggled to understand the larger issues, it has become clear to us that inference based on only a single best model is often relatively poor for a wide variety of substantive reasons. Instead, we increasingly favor multimodel inference: procedures to allow formal statistical inference from all the models in the set. […] Such multimodel inference includes model averaging, incorporating model selection uncertainty into estimates of precision, confidence sets on models, and simple ways to assess the relative importance of variables.”
“If sample size is small, one must realize that relatively little information is probably contained in the data (unless the effect size is very substantial), and the data may provide few insights of much interest or use. Researchers routinely err by building models that are far too complex for the (often meager) data at hand. They do not realize how little structure can be reliably supported by small amounts of data that are typically “noisy.””
“Sometimes, the selected model [when applying an information criterion] contains a parameter that is constant over time, or areas, or age classes […]. This result should not imply that there is no variation in this parameter, rather that parsimony and its bias/variance tradeoff finds the actual variation in the parameter to be relatively small in relation to the information contained in the sample data. It “costs” too much in lost precision to add estimates of all of the individual θi. As the sample size increases, then at some point a model with estimates of the individual parameters would likely be favored. Just because a parsimonious model contains a parameter that is constant across strata does not mean that there is no variation in that process across the strata.”
“[In a significance testing context,] a significant test result does not relate directly to the issue of what approximating model is best to use for inference. One model selection strategy that has often been used in the past is to do likelihood ratio tests of each structural factor […] and then use a model with all the factors that were “significant” at, say, α = 0.05. However, there is no theory that would suggest that this strategy would lead to a model with good inferential properties (i.e., small bias, good precision, and achieved confidence interval coverage at the nominal level). […] The purpose of the analysis of empirical data is not to find the “true model”— not at all. Instead, we wish to find a best approximating model, based on the data, and then develop statistical inferences from this model. […] We search […] not for a “true model,” but rather for a parsimonious model giving an accurate approximation to the interpretable information in the data at hand. Data analysis involves the question, “What level of model complexity will the data support?” and both under- and overfitting are to be avoided. Larger data sets tend to support more complex models, and the selection of the size of the model represents a tradeoff between bias and variance.”
“The easy part of the information-theoretic approaches includes both the computational aspects and the clear understanding of these results […]. The hard part, and the one where training has been so poor, is the a priori thinking about the science of the matter before data analysis — even before data collection. It has been too easy to collect data on a large number of variables in the hope that a fast computer and sophisticated software will sort out the important things — the “significant” ones […]. Instead, a major effort should be mounted to understand the nature of the problem by critical examination of the literature, talking with others working on the general problem, and thinking deeply about alternative hypotheses. Rather than “test” dozens of trivial matters (is the correlation zero? is the effect of the lead treatment zero? are ravens pink?, Anderson et al. 2000), there must be a more concerted effort to provide evidence on meaningful questions that are important to a discipline. This is the critical point: the common failure to address important science questions in a fully competent fashion. […] “Let the computer find out” is a poor strategy for researchers who do not bother to think clearly about the problem of interest and its scientific setting. The sterile analysis of “just the numbers” will continue to be a poor strategy for progress in the sciences.
Researchers often resort to using a computer program that will examine all possible models and variables automatically. Here, the hope is that the computer will discover the important variables and relationships […] The primary mistake here is a common one: the failure to posit a small set of a priori models, each representing a plausible research hypothesis.”
“Model selection is most often thought of as a way to select just the best model, then inference is conditional on that model. However, information-theoretic approaches are more general than this simplistic concept of model selection. Given a set of models, specified independently of the sample data, we can make formal inferences based on the entire set of models. […] Part of multimodel inference includes ranking the fitted models from best to worst […] and then scaling to obtain the relative plausibility of each fitted model (gi) by a weight of evidence (wi) relative to the selected best model. Using the conditional sampling variance […] from each model and the Akaike weights […], unconditional inferences about precision can be made over the entire set of models. Model-averaged parameter estimates and estimates of unconditional sampling variances can be easily computed. Model selection uncertainty is a substantial subject in its own right, well beyond just the issue of determining the best model.”
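A minimal sketch of the multimodel quantities named in the quote. Akaike weights use the standard definition wi = exp(−Δi/2) / Σr exp(−Δr/2). The model-averaged estimate is Σ wi θ̂i. The unconditional standard error is taken here in the form commonly associated with Burnham and Anderson, Σ wi √(var(θ̂i | gi) + (θ̂i − θ̄)²); treat that formula as an assumption to check against the book's own equations. The estimates, standard errors and Δi values are hypothetical.

```python
import math

def akaike_weights(deltas):
    """w_i = exp(-delta_i / 2) / sum_r exp(-delta_r / 2)."""
    rel = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel)
    return [r / total for r in rel]

def model_average(estimates, std_errors, deltas):
    """Model-averaged estimate and an unconditional standard error that adds
    a between-model term (theta_i - theta_bar)^2 to each conditional variance."""
    w = akaike_weights(deltas)
    theta_bar = sum(wi * t for wi, t in zip(w, estimates))
    se_uncond = sum(
        wi * math.sqrt(se ** 2 + (t - theta_bar) ** 2)
        for wi, t, se in zip(w, estimates, std_errors)
    )
    return theta_bar, se_uncond, w

# Hypothetical: the same parameter estimated under three models g1, g2, g3
estimates = [1.20, 1.05, 1.40]
std_errors = [0.20, 0.25, 0.30]
deltas = [0.0, 0.63, 4.66]
theta_bar, se_u, w = model_average(estimates, std_errors, deltas)
print("weights:", [round(x, 3) for x in w])
print(f"model-averaged estimate = {theta_bar:.3f}, unconditional SE = {se_u:.3f}")
```

The unconditional SE can never be smaller than the weight-averaged conditional SE, because each square-root term is at least the model's own SE. This is the sense in which ignoring model selection understates uncertainty.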
“There are three general approaches to assessing model selection uncertainty: (1) theoretical studies, mostly using Monte Carlo simulation methods; (2) the bootstrap applied to a given set of data; and (3) utilizing the set of AIC differences (i.e., ∆i) and model weights wi from the set of models fit to data.”
“Statistical science should emphasize estimation of parameters and associated measures of estimator uncertainty. Given a correct model […], an MLE is reliable, and we can compute a reliable estimate of its sampling variance and a reliable confidence interval […]. If the model is selected entirely independently of the data at hand, and is a good approximating model, and if n is large, then the estimated sampling variance is essentially unbiased, and any appropriate confidence interval will essentially achieve its nominal coverage. This would be the case if we used only one model, decided on a priori, and it was a good model, g, of the data generated under truth, f. However, even when we do objective, data-based model selection (which we are advocating here), the [model] selection process is expected to introduce an added component of sampling uncertainty into any estimated parameter; hence classical theoretical sampling variances are too small: They are conditional on the model and do not reflect model selection uncertainty. One result is that conditional confidence intervals can be expected to have less than nominal coverage.”
“Data analysis is sometimes focused on the variables to include versus exclude in the selected model (e.g., important vs. unimportant). Variable selection is often the focus of model selection for linear or logistic regression models. Often, an investigator uses stepwise analysis to arrive at a final model, and from this a conclusion is drawn that the variables in this model are important, whereas the other variables are not important. While common, this is poor practice and, among other issues, fails to fully consider model selection uncertainty. […] Estimates of the relative importance of predictor variables xj can best be made by summing the Akaike weights across all the models in the set where variable j occurs. Thus, the relative importance of variable j is reflected in the sum w+ (j). The larger the w+ (j) the more important variable j is, relative to the other variables. Using the w+ (j), all the variables can be ranked in their importance. […] This idea extends to subsets of variables. For example, we can judge the importance of a pair of variables, as a pair, by the sum of the Akaike weights of all models that include the pair of variables. […] To summarize, in many contexts the AIC selected best model will include some variables and exclude others. Yet this inclusion or exclusion by itself does not distinguish differential evidence for the importance of a variable in the model. The model weights […] summed over all models that include a given variable provide a better weight of evidence for the importance of that variable in the context of the set of models considered.” [The reason why I’m not telling you how to calculate Akaike weights is that I don’t want to bother with math formulas in WordPress – but I guess all you need to know is that these are not hard to calculate. It should perhaps be added that one can also use bootstrapping methods to obtain relevant model weights to apply in a multimodel inference context.]
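For readers who prefer code to formulas, here is a sketch of the calculation. Akaike weights use the standard definition wi = exp(−Δi/2), normalized to sum to one. The importance w+(j) is the sum of the weights of the models that contain xj, and the same function handles a pair of variables, as the quote suggests. The model set and Δi values are hypothetical.

```python
import math

def akaike_weights(deltas):
    """Standard Akaike weights from AIC differences delta_i."""
    rel = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel)
    return [r / total for r in rel]

def importance(models, weights, variables):
    """Sum of Akaike weights over all models that contain every variable in `variables`.
    With a single variable this is w+(j); with a pair it is the pair's joint importance."""
    wanted = set(variables)
    return sum(w for model, w in zip(models, weights) if wanted <= model)

# Hypothetical candidate set: each model is the set of predictors it includes
models = [{"x1", "x2"}, {"x1"}, {"x2", "x3"}, {"x1", "x2", "x3"}]
weights = akaike_weights([0.0, 1.2, 3.5, 2.0])
for v in ("x1", "x2", "x3"):
    print(v, round(importance(models, weights, [v]), 3))  # x1 0.917, x2 0.737, x3 0.259
print("pair (x1, x2):", round(importance(models, weights, ["x1", "x2"]), 3))  # 0.654
```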
“If data analysis relies on model selection, then inferences should acknowledge model selection uncertainty. If the goal is to get the best estimates of a set of parameters in common to all models (this includes prediction), model averaging is recommended. If the models have definite, and differing, interpretations as regards understanding relationships among variables, and it is such understanding that is sought, then one wants to identify the best model and make inferences based on that model. […] The bootstrap provides direct, robust estimates of model selection probabilities πi, but we have no reason now to think that use of bootstrap estimates of model selection probabilities rather than use of the Akaike weights will lead to superior unconditional sampling variances or model-averaged parameter estimators. […] Be mindful of possible model redundancy. A carefully thought-out set of a priori models should eliminate model redundancy problems and is a central part of a sound strategy for obtaining reliable inferences. […] Results are sensitive to having demonstrably poor models in the set of models considered; thus it is very important to exclude models that are a priori poor. […] The importance of a small number (R) of candidate models, defined prior to detailed analysis of the data, cannot be overstated. […] One should have R much smaller than n. MMI [Multi-Model Inference] approaches become increasingly important in cases where there are many models to consider.”
“In general there is a substantial amount of model selection uncertainty in many practical problems […]. Such uncertainty about what model structure (and associated parameter values) is the K-L [Kullback–Leibler] best approximating model applies whether one uses hypothesis testing, information-theoretic criteria, dimension-consistent criteria, cross-validation, or various Bayesian methods. Often, there is a nonnegligible variance component for estimated parameters (this includes prediction) due to uncertainty about what model to use, and this component should be included in estimates of precision. […] we recommend assessing model selection uncertainty rather than ignoring the matter. […] It is […] not a sound idea to pick a single model and unquestioningly base extrapolated predictions on it when there is model uncertainty.”
This is a neat little book in the SpringerBriefs in Statistics series. The author is David J. Bartholomew, a former statistics professor at the LSE. I wrote a brief goodreads review, but I thought that I might as well also add a post about the book here. The book covers topics such as the EM algorithm, Gibbs sampling, the Metropolis–Hastings algorithm and the Rasch model, and it assumes you’re familiar with stuff like how to do ML estimation, among many other things. I had some passing familiarity with many of the topics he talks about in the book, but I’m sure I’d have benefited from knowing more about some of the specific topics covered. Because large parts of the book are basically unreadable by people without a stats background, I wasn’t sure how much of it made sense to cover here, but I decided to talk a bit about a few of the things which I believe don’t require you to know a whole lot about this area.
“Modern statistics is built on the idea of models—probability models in particular. [While I was rereading this part, I was reminded of this quote which I came across while finishing my most recent quotes post: “No scientist is as model minded as is the statistician; in no other branch of science is the word model as often and consciously used as in statistics.” Hans Freudenthal.] The standard approach to any new problem is to identify the sources of variation, to describe those sources by probability distributions and then to use the model thus created to estimate, predict or test hypotheses about the undetermined parts of that model. […] A statistical model involves the identification of those elements of our problem which are subject to uncontrolled variation and a specification of that variation in terms of probability distributions. Therein lies the strength of the statistical approach and the source of many misunderstandings. Paradoxically, misunderstandings arise both from the lack of an adequate model and from over reliance on a model. […] At one level is the failure to recognise that there are many aspects of a model which cannot be tested empirically. At a higher level is the failure to recognise that any model is, necessarily, an assumption in itself. The model is not the real world itself but a representation of that world as perceived by ourselves. This point is emphasised when, as may easily happen, two or more models make exactly the same predictions about the data. Even worse, two models may make predictions which are so close that no data we are ever likely to have can ever distinguish between them. […] All model-dependent inference is necessarily conditional on the model. This stricture needs, especially, to be borne in mind when using Bayesian methods. Such methods are totally model-dependent and thus all are vulnerable to this criticism.
The problem can apparently be circumvented, of course, by embedding the model in a larger model in which any uncertainties are, themselves, expressed in probability distributions. However, in doing this we are embarking on a potentially infinite regress which quickly gets lost in a fog of uncertainty.”
“Mixtures of distributions play a fundamental role in the study of unobserved variables […] The two important questions which arise in the analysis of mixtures concern how to identify whether or not a given distribution could be a mixture and, if so, to estimate the components. […] Mixtures arise in practice because of failure to recognise that samples are drawn from several populations. If, for example, we measure the heights of men and women without distinction the overall distribution will be a mixture. It is relevant to know this because women tend to be shorter than men. […] It is often not at all obvious whether a given distribution could be a mixture […] even a two-component mixture of normals has 5 unknown parameters. As further components are added the estimation problems become formidable. If there are many components, separation may be difficult or impossible […] [To add to the problem,] the form of the distribution is unaffected by the mixing [in the case of the mixing of normals]. Thus there is no way that we can recognise that mixing has taken place by inspecting the form of the resulting distribution alone. Any given normal distribution could have arisen naturally or be the result of normal mixing […] if f(x) is normal, there is no way of knowing whether it is the result of mixing and hence, if it is, what the mixing distribution might be.”
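The five parameters of a two-component normal mixture (a mixing proportion, two means and two standard deviations) can be estimated with the EM algorithm, one of the topics the book covers. The sketch below applies it to simulated, unlabeled "heights". The population values (means of 162 and 178 cm, SD 6 cm, equal mixing) are invented for illustration. The sketch is deliberately bare: starting values matter, nothing guards against a component collapsing onto a single point, and if the components are poorly separated the estimates become unreliable, which is the quote's point.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_normals(xs, p, mu1, mu2, s1, s2, iters=200):
    """EM for a two-component normal mixture; returns (p, mu1, mu2, s1, s2)."""
    n = len(xs)
    for _ in range(iters):
        # E-step: probability that each observation belongs to component 1
        r = []
        for x in xs:
            a = p * normal_pdf(x, mu1, s1)
            b = (1.0 - p) * normal_pdf(x, mu2, s2)
            r.append(a / (a + b))
        # M-step: weighted maximum-likelihood updates of all five parameters
        n1 = sum(r)
        n2 = n - n1
        p = n1 / n
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1)
        s2 = math.sqrt(sum((1.0 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2)
    return p, mu1, mu2, s1, s2

# Simulated, unlabeled heights: 500 'women' and 500 'men' (invented parameters)
rng = random.Random(42)
heights = [rng.gauss(162, 6) for _ in range(500)] + [rng.gauss(178, 6) for _ in range(500)]
p, mu1, mu2, s1, s2 = em_two_normals(heights, p=0.5, mu1=155, mu2=185, s1=10, s2=10)
print(f"p = {p:.2f}, means = ({mu1:.1f}, {mu2:.1f}), sds = ({s1:.1f}, {s2:.1f})")
```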
“Even if there is close agreement between a model and the data it does not follow that the model provides a true account of how the data arose. It may be that several models explain the data equally well. When this happens there is said to be a lack of identifiability. Failure to take full account of this fact, especially in the social sciences, has led to many over-confident claims about the nature of social reality. Lack of identifiability within a class of models may arise because different values of their parameters provide equally good fits. Or, more seriously, models with quite different characteristics may make identical predictions. […] If we start with a model we can predict, albeit uncertainly, what data it should generate. But if we are given a set of data we cannot necessarily infer that it was generated by a particular model. In some cases it may, of course, be possible to achieve identifiability by increasing the sample size but there are cases in which, no matter how large the sample size, no separation is possible. […] Identifiability matters can be considered under three headings. First there is lack of parameter identifiability which is the most common use of the term. This refers to the situation where there is more than one value of a parameter in a given model each of which gives an equally good account of the data. […] Secondly there is what we shall call lack of model identifiability which occurs when two or more models make exactly the same data predictions. […] The third type of identifiability is actually the combination of the foregoing types.
Mathematical statistics is not well-equipped to cope with situations where models are practically, but not precisely, indistinguishable because it typically deals with things which can only be expressed in unambiguously stated theorems. Of necessity, these make clear-cut distinctions which do not always correspond with practical realities. For example, there are theorems concerning such things as sufficiency and admissibility. According to such theorems, for example, a proposed statistic is either sufficient or not sufficient for some parameter. If it is sufficient it contains all the information, in a precisely defined sense, about that parameter. But in practice we may be much more interested in what we might call ‘near sufficiency’ in some more vaguely defined sense. Because we cannot give a precise mathematical definition to what we mean by this, the practical importance of the notion is easily overlooked. The same kind of fuzziness arises with what are called structural equation models (or structural relations models) which have played a very important role in the social sciences. […] we shall argue that structural equation models are almost always unidentifiable in the broader sense of which we are speaking here. […] [our results] constitute a formidable argument against the careless use of structural relations models. […] In brief, the valid use of a structural equations model requires us to lean very heavily upon assumptions about which we may not be very sure. It is undoubtedly true that if such a model provides a good fit to the data, then it provides a possible account of how the data might have arisen. It says nothing about what other models might provide an equally good, or even better fit. As a tool of inductive inference designed to tell us something about the social world, linear structural relations modelling has very little to offer.”
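Lack of parameter identifiability, the first heading in the quote, is easy to see in a toy model (hypothetical, not taken from the book) in which y depends on x only through the product of two parameters a and b. Every pair (a, b) with the same product gives exactly the same likelihood. The data can pin down a·b, but no amount of data separates a from b.

```python
import math

def log_lik(a, b, xs, ys, sigma=1.0):
    """Gaussian log-likelihood for the toy model y = a * b * x + noise."""
    total = 0.0
    for x, y in zip(xs, ys):
        resid = y - a * b * x
        total += -0.5 * math.log(2.0 * math.pi * sigma ** 2) - resid ** 2 / (2.0 * sigma ** 2)
    return total

xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [3.2, 5.8, 9.1, 12.3, 14.8]  # hypothetical observations
pairs = [(2.0, 3.0), (3.0, 2.0), (1.0, 6.0), (12.0, 0.5)]  # all have a * b = 6
values = [log_lik(a, b, xs, ys) for a, b in pairs]
for (a, b), ll in zip(pairs, values):
    print(f"a = {a}, b = {b}: log-likelihood = {ll:.6f}")
```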
“It is very common for data to be missing and this introduces a risk of bias if inferences are drawn from incomplete samples. However, we are not usually interested in the missing data themselves but in the population characteristics to whose estimation those values were intended to contribute. […] A very longstanding way of dealing with missing data is to fill in the gaps by some means or other and then carry out the standard analysis on the completed data set. This procedure is known as imputation. […] In its simplest form, each missing data point is replaced by a single value. Because there is, inevitably, uncertainty about what the imputed values should be, one can do better by substituting a range of plausible values and comparing the results in each case. This is known as multiple imputation. […] missing values may occur anywhere and in any number. They may occur haphazardly or in some pattern. In the latter case, the pattern may provide a clue to the mechanism underlying the loss of data and so suggest a method for dealing with it. The conditional distribution which we have supposed might be the basis of imputation depends, of course, on the mechanism behind the loss of data. From a practical point of view the detailed information necessary to determine this may not be readily obtainable or, even, necessary. Nevertheless, it is useful to clarify some of the issues by introducing the idea of a probability mechanism governing the loss of data. This will enable us to classify the problems which would have to be faced in a more comprehensive treatment. The simplest, if least realistic, approach is to assume that the chance of being missing is the same for all elements of the data matrix. In that case, we can, in effect, ignore the missing values […] Such situations are designated as MCAR which is an acronym for Missing Completely at Random. […] In the smoking example we have supposed that men are more likely to refuse [to answer] than women.
If we go further and assume that there are no other biasing factors we are, in effect, assuming that ‘missingness’ is completely at random for men and women, separately. This would be an example of what is known as Missing at Random (MAR) […] which means that the missing mechanism depends on the observed variables but not on those that are missing. The final category is Missing Not at Random (MNAR) which is a residual category covering all other possibilities. This is difficult to deal with in practice unless one has an unusually complete knowledge of the missing mechanism.
Another term used in the theory of missing data is that of ignorability. The conditional distribution of y given x will, in general, depend on any parameters of the distribution of M [the variable we use to describe the mechanism governing the loss of observations] yet these are unlikely to be of any practical interest. It would be convenient if this distribution could be ignored for the purposes of inference about the parameters of the distribution of x. If this is the case the mechanism of loss is said to be ignorable. In practice it is acceptable to assume that the concept of ignorability is equivalent to that of MAR.”
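The difference between MCAR and MAR can be illustrated by simulating the smoking example with invented rates. In the simulation, men smoke more and also refuse to answer more often. Refusal depends only on sex, which is always observed, so the answers are MAR but not MCAR. The complete-case estimate of the smoking rate is biased. Re-weighting the respondents' smoking rate within each sex by that sex's share of the full sample removes the bias. That adjustment is a simple weighting-class correction chosen here for illustration, not a method taken from the book.

```python
import random

def simulate(n=100_000, seed=7):
    """Rows of (is_male, smokes or None); smoking answers are missing at random given sex."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        male = rng.random() < 0.5
        smokes = rng.random() < (0.40 if male else 0.10)   # invented smoking rates
        refuses = rng.random() < (0.40 if male else 0.10)  # refusal depends on sex only
        rows.append((male, None if refuses else smokes))
    return rows

def complete_case_rate(rows):
    answers = [s for _, s in rows if s is not None]
    return sum(answers) / len(answers)

def mar_adjusted_rate(rows):
    """Respondent smoking rate within each sex, weighted by that sex's share of the full sample."""
    total = 0.0
    for group in (True, False):
        members = [s for male, s in rows if male == group]
        answers = [s for s in members if s is not None]
        total += (len(members) / len(rows)) * (sum(answers) / len(answers))
    return total

rows = simulate()
print(f"complete-case: {complete_case_rate(rows):.3f}, "
      f"MAR-adjusted: {mar_adjusted_rate(rows):.3f} (true rate 0.25)")
```

With these invented rates the complete-case estimate settles near 0.22, because men, who smoke more, are under-represented among respondents.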
(No, not that type of modelling! – I was rather thinking about the type below…)
Anyway, I assume not all readers are equally familiar with this stuff, which I’ve incidentally written about before e.g. here. Some of you will know all this stuff already and you do not need to read on (well, maybe you do – in order to realize that you do not..). Some of it is recap, some of it I don’t think I’ve written about before. Anyway.
i. So, a model is a representation of the world. It’s a simplified version of it, which helps us think about the matters at hand.
ii. Models always have a lot of assumptions. A perhaps surprising observation is that, from a certain point of view, models which might be categorized as more ‘simple’ (few explicit assumptions) can be said to make as many assumptions as do more ‘complex’ models (many explicit assumptions); it’s just that the underlying assumptions are different. To illustrate this, let’s have a look at two different models, model 1 and model 2. Model 1 is a model which states that ‘Y = aX’. Model 2 is a model which states that ‘Y = aX + bZ’.
Model 1 assumes b is equal to 0, so that Z is not a relevant variable to include, whereas model 2 assumes b is not zero – but both models make assumptions about this variable ‘Z’ (and the parameter ‘b’). Models will often differ along such lines, making different assumptions about variables and how they interact (incidentally, here we’re implicitly assuming in both models that X and Z are independent). A ‘simple’ model does make fewer (explicit) assumptions about the world than does a ‘complex’ model – but that question is different from the question of which restrictions the two models impose on the data. And if we think in binary terms and ask ourselves, ‘Are we making an assumption about this variable or this relationship?’, the answer will always be ‘yes’ either way. Does the variable Z contribute information relevant to Y? Does it interact with other variables in the model? Both the simple model and the complex model include assumptions about this stuff. At every branching point where the complex model departs from the simple one, you have one assumption in one model (‘the distinction between f and g matters’, ‘alpha is non-zero’) and another assumption in the other (‘the distinction between f and g doesn’t matter’, ‘alpha is zero’). You always make assumptions, it’s just that the assumptions are different. In simple models assumptions are often not spelled out, which is presumably part of why some of the assumptions made in such models are easy to overlook; it makes sense that they’re not spelled out, incidentally, because there are infinitely many ways to adjust a model.
It’s true that branching out does take place in some complex models in ways that do not occur in simple models, and once you’re more than one branching point away from the departure point where the two models first differ then the behaviour of the complex model may start to be determined by additional new assumptions where on the other hand the behaviour of the simple model might still rely on the same assumption that determined the behaviour at the first departure point – so the number of explicit assumptions will be different, but an assumption is made in either case at every junction.
As might be inferred from the comments above, usually ‘the simple model’ will be the one with the more restrictive assumptions, in terms of what the data is ‘allowed’ to do. Fewer assumptions usually means stronger assumptions. It’s a much stronger assumption that e.g. males and females are identical than that they are not; there are many ways they could be not identical but only one way in which they can be identical. The restrictiveness of a model does not equal the number of assumptions (explicitly) made. No, on a general note it is rather the case that more assumptions mean that your model becomes less restrictive, because additional assumptions allow for more stuff to vary – this is indeed a big part of why model-builders generally don’t just stick to very simple models; if you do that, you don’t get the details right. Adding more assumptions may allow you to make a more correct model that better explains the data. It is my experience (not that I have much of it, but..) that people who’re unfamiliar with modelling think of additional assumptions as somehow ‘problematic’ – ‘more stuff can go wrong if you add more assumptions, the more assumptions you have the more likely it is that one of them is violated’. The problem is that not making assumptions is not really an option; you’ll basically assume something no matter what you do. ‘That variable/distinction/connection is irrelevant’, which is often the default assumption, is also just that – an assumption. If you do modelling you don’t ever get to not make assumptions, they’re always there lurking in the background whether you like it or not.
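A small sketch of model 1 and model 2 from point ii makes the ‘b = 0 is also an assumption’ argument concrete. Both are fitted by least squares without an intercept, matching the formulas in the text. Model 1 is simply model 2 with b fixed at 0. The data are hypothetical and generated without noise from Y = 2X + 0.5Z. Because model 1 is not allowed to use Z, it has to push Z's contribution into a, and it fits worse. As a nested least-squares fit, model 1 can never have a smaller residual sum of squares than model 2.

```python
def fit_model1(xs, ys):
    """Least squares for Y = aX: b is not estimated, it is fixed at 0 by assumption."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return a, 0.0

def fit_model2(xs, zs, ys):
    """Least squares for Y = aX + bZ via the 2x2 normal equations."""
    sxx = sum(x * x for x in xs)
    szz = sum(z * z for z in zs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    sxy = sum(x * y for x, y in zip(xs, ys))
    szy = sum(z * y for z, y in zip(zs, ys))
    det = sxx * szz - sxz ** 2
    return (sxy * szz - sxz * szy) / det, (sxx * szy - sxz * sxy) / det

def rss(a, b, xs, zs, ys):
    """Residual sum of squares of Y = aX + bZ."""
    return sum((y - a * x - b * z) ** 2 for x, z, y in zip(xs, zs, ys))

# Hypothetical, noise-free data generated from Y = 2X + 0.5Z
xs = [1, 2, 3, 4, 5, 6]
zs = [2, 1, 0, 1, 2, 1]
ys = [2 * x + 0.5 * z for x, z in zip(xs, zs)]
m1 = fit_model1(xs, ys)
m2 = fit_model2(xs, zs, ys)
print("model 1 (a, b):", m1, "RSS:", rss(*m1, xs, zs, ys))
print("model 2 (a, b):", m2, "RSS:", rss(*m2, xs, zs, ys))
```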
iii. A big problem is that we don’t know which assumptions are correct before we’ve actually tested the models – indeed, we often make models mainly in order to figure out which assumptions are correct. (Sometimes we can’t even test the assumptions we’re making in a model, but let’s ignore this problem here…) A more complex model is not always more correct, nor does it always perform better. Sometimes it’ll actually do a worse job of explaining the variation in the data than a simple one would have done. When you add more variables to a model, you also add more uncertainty because of things like measurement error. Sometimes it’s worth it, because the new variable explains a lot of the variation in the data. Sometimes it’s not – sometimes the noise you add matters far more than the additional information the variable contributes about how the data behaves.
There are various ways to try to figure out whether the noise added by an additional variable is too great for it to be a good idea to include the variable in a model, but they’re not perfect and there are always tradeoffs. There are many different methods to estimate which model performs better, and the different methods apply different criteria – so you can easily get into a situation where the choice of which variables to include in your ‘best model’ depends on e.g. which information criterion you choose to apply.
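A minimal sketch of how two criteria can disagree, assuming least-squares models with Gaussian errors, for which (dropping an additive constant shared by all models fitted to the same data) AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n), with k the number of estimated parameters. The RSS values and parameter counts below are made-up numbers for illustration, not output from any real model:

```python
import math

def aic(rss, n, k):
    """AIC for a Gaussian least-squares model, up to a constant shared by all models."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """BIC for the same model: the penalty per parameter is ln(n) instead of 2."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical candidates fitted to the same n = 100 observations:
# the larger model fits a bit better (lower RSS) but uses two extra parameters.
n = 100
models = {"small": (50.0, 2), "large": (46.0, 4)}  # name -> (RSS, k)

best_aic = min(models, key=lambda name: aic(models[name][0], n, models[name][1]))
best_bic = min(models, key=lambda name: bic(models[name][0], n, models[name][1]))
print("AIC prefers:", best_aic)  # large
print("BIC prefers:", best_bic)  # small
```

Because BIC charges ln(100) ≈ 4.6 per parameter rather than 2, it ends up preferring the smaller model here while AIC prefers the larger one; neither answer is ‘the’ correct one.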
Anyway the key point is this: You can’t just add everything (all possible variables you could imagine play a role) and assume you’ll be able to explain everything that way – adding another variable may indeed sometimes be a very bad idea.
iv. If you test a lot of hypotheses simultaneously, each of which has some positive probability of being evaluated as correct even when it is false, then the more variables you add to your model, the more likely it becomes that at least one of those hypotheses will be wrongly evaluated as correct (relevant link), unless you somehow adjust the probability of a given hypothesis being evaluated as correct as you add more hypotheses along the way. This is another reason adding more variables to a model can sometimes be problematic. There are ways around this particular problem, but if they are not used, which they often are not, then you need to be careful.
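To put rough numbers on this, a small sketch assuming m independent tests, each run at level α = 0.05, with every null hypothesis actually true. The Bonferroni correction (dividing α by m) is my choice of adjustment for illustration; it is only one of several ways around the problem:

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests
    of true null hypotheses, each run at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / m

for m in (1, 5, 20):
    unadjusted = familywise_error_rate(0.05, m)
    adjusted = familywise_error_rate(bonferroni_alpha(0.05, m), m)
    print(f"m={m:2d}  unadjusted={unadjusted:.3f}  bonferroni={adjusted:.3f}")
```

With 20 tests and no adjustment, the chance of at least one spurious ‘finding’ is roughly 64%.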
v. Adding more variables is not always preferable, but then what about throwing more data at the problem by increasing the sample size? Surely adding more data to the sample should increase your confidence in the model results, right? Well… No – bigger is actually not always better. This is related to the concept of consistency in statistics. “A consistent estimator is one for which, when the estimate is considered as a random variable indexed by the number n of items in the data set, as n increases the estimates converge to the value that the estimator is designed to estimate,” as the wiki article puts it. As you might imagine, consistency is one of the key properties we rely on in statistical modelling – it really is; we care a lot about consistency, and all else equal you should always prefer a consistent estimator to an inconsistent one (it should be noted, however, that all else is not always equal; a consistent estimator may have larger variance than an inconsistent estimator in a finite sample, which means that we may actually prefer the latter to the former in specific situations). But the thing is, not all estimators are consistent. There are always some critical assumptions which need to be satisfied for an estimator to be consistent, and in a bad model these requirements will not be met. If you have a bad model, for example if you’ve incorrectly specified the relationships between the variables or included the wrong variables in your model, then increasing the sample size will do nothing to help you – additional data will not somehow magically make the estimates more reliable ‘because of asymptotics’. In fact, if your model’s performance is very sensitive to the sample size to which you apply it, that may well indicate a problem with the model, i.e. that the model is misspecified (see e.g. this).
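A small simulation of the point, with a hypothetical data-generating process of my own choosing: y depends on x and on a second variable z that is correlated with x. The ‘bad model’ regresses y on x alone and omits z. As n grows its slope settles down near 1.8 rather than the true value of 1 (omitted-variable bias), while the correctly specified regression converges to the true coefficients:

```python
import random

def simulate(n, seed=0):
    """Hypothetical data-generating process: y = 1.0*x + 1.0*z + e,
    where z = 0.8*x + u is correlated with x; x, u, e are standard normal."""
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = 0.8 * x + rng.gauss(0, 1)
        xs.append(x)
        zs.append(z)
        ys.append(1.0 * x + 1.0 * z + rng.gauss(0, 1))
    return xs, zs, ys

def centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def slope_short(xs, ys):
    """OLS slope from regressing y on x alone: the misspecified model omitting z."""
    xc, yc = centered(xs), centered(ys)
    return dot(xc, yc) / dot(xc, xc)

def slopes_full(xs, zs, ys):
    """OLS coefficients on x and z from the correctly specified model,
    solving the 2x2 normal equations on centered data (Cramer's rule)."""
    xc, zc, yc = centered(xs), centered(zs), centered(ys)
    sxx, szz, sxz = dot(xc, xc), dot(zc, zc), dot(xc, zc)
    sxy, szy = dot(xc, yc), dot(zc, yc)
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det

if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        xs, zs, ys = simulate(n)
        bx, bz = slopes_full(xs, zs, ys)
        print(f"n={n:>7}: short-model slope {slope_short(xs, ys):.3f}, "
              f"full model ({bx:.3f}, {bz:.3f})")
```

More data makes the short model’s estimate more precise, but precisely wrong.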
vi. Not all model assumptions are equal – some assumptions will usually be much more critical than others. As already mentioned, consistency of estimators is very important, and here it is important to note that not all violations of model assumptions lead to inconsistent estimators. One example of a violation which does not is a violation of the homoskedasticity assumption (see also this) in regression analysis. Here you can actually find yourself in a situation where you deliberately apply a model you know violates one of your assumptions about how the data behaves, yet this is not a problem at all, because you can deal with the problem separately (e.g. by using heteroskedasticity-robust standard errors), so that the violation is of no practical importance. As already mentioned in the beginning, most models will be simplified versions of the stuff that goes on in the real world, so you should expect to see some ‘violations’ here and there – the key questions to ask are then: is the violation important, and what consequences does it have for the estimates we’ve obtained? If you do not ask yourself such questions when evaluating a model, you may easily end up quibbling about details which don’t really matter. And remember that not all the assumptions made in a model are spelled out, and that some of the important ones may have been overlooked.
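A sketch of the heteroskedasticity case, using simulated data whose error variance grows with x (the setup and the numbers are assumptions for illustration). The OLS slope still lands close to the true value of 0.5, because heteroskedasticity does not make OLS inconsistent; what goes wrong is the classical standard error, which is why the White (HC0) heteroskedasticity-robust standard error is computed alongside it:

```python
import math
import random

def simulate(n, seed=2):
    """Hypothetical heteroskedastic data: y = 2 + 0.5*x + e, where the
    standard deviation of e grows with x (sd = 0.2 + x), x ~ Uniform(0, 4)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 4) for _ in range(n)]
    ys = [2 + 0.5 * x + rng.gauss(0, 0.2 + x) for x in xs]
    return xs, ys

def ols_slope_with_ses(xs, ys):
    """Simple-regression OLS slope with a classical SE (which assumes constant
    error variance) and a White/HC0 heteroskedasticity-robust SE."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    s2 = sum(r * r for r in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0: Var(b1) = sum((x - mx)^2 * r^2) / sxx^2
    se_robust = math.sqrt(sum((x - mx) ** 2 * r * r for x, r in zip(xs, resid))) / sxx
    return b1, se_classical, se_robust

if __name__ == "__main__":
    xs, ys = simulate(50_000)
    b1, se_c, se_r = ols_slope_with_ses(xs, ys)
    print(f"slope={b1:.3f}  classical SE={se_c:.4f}  robust SE={se_r:.4f}")
```

In this particular setup the classical standard error comes out somewhat smaller than the robust one; the size and even the direction of that gap depend on how the error variance relates to x.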
vii. Which causal inferences should we draw from the model? Correlation != causation. To some extent, the question of whether the statistical link is causal relates to whether we’ve picked the right variables and the right way to relate them to each other. But as I’ve remarked upon before, some model types are better suited for establishing causal links than others – there are good ways and bad ways to get at the heart of the matter (one application here, I believe I’ve linked to this before). Different fields have often developed different approaches, see e.g. this, this and this. Correlation on its own will probably tell you next to nothing about anything you might be interested in; as I believe my stats prof put it last semester, ‘we don’t care about correlation, correlation means nothing’. Randomization schemes with treatment groups and control groups are great. If we can’t use those, we can still try to build models that get around the problems. Those models make assumptions, but so do the models you’re comparing them with, and in order to properly evaluate them you need to be explicit about the assumptions made by the competing models as well.
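A toy simulation of why randomization helps, under assumptions I’ve made up for illustration: an unobserved trait c raises the outcome, the treatment has no causal effect at all, and in the observational version people with high c are more likely to select into treatment. The naive treated-minus-control difference then shows a large ‘effect’ (around 1.1 here), while random assignment makes it disappear:

```python
import random

def difference_in_means(ys, ds):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    treated = [y for y, d in zip(ys, ds) if d]
    control = [y for y, d in zip(ys, ds) if not d]
    return sum(treated) / len(treated) - sum(control) / len(control)

def simulate(n, randomized, seed=3):
    """Hypothetical setup: trait c raises y; treatment d has NO effect on y."""
    rng = random.Random(seed)
    ys, ds = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        if randomized:
            d = rng.random() < 0.5          # coin flip, independent of c
        else:
            d = c + rng.gauss(0, 1) > 0     # self-selection on c
        y = 1.0 * c + rng.gauss(0, 1)       # true treatment effect is zero
        ys.append(y)
        ds.append(d)
    return ys, ds

if __name__ == "__main__":
    for randomized in (False, True):
        ys, ds = simulate(100_000, randomized)
        label = "randomized" if randomized else "observational"
        print(f"{label:>13}: naive difference {difference_in_means(ys, ds):.3f}")
```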
It takes way more time to cover this stuff in detail here than I’m willing to spend on it, but here are a few relevant links to stuff I’m working on/with at the moment:
iii. Kolmogorov–Smirnov test.
iv. Chow test.
vi. Education and Health: Evaluating Theories and Evidence, by Cutler & Lleras-Muney.
vii. Education, Health and Mortality: Evidence from a Social Experiment, by Meghir, Palme & Simeonova.
i. Econometric methods for causal evaluation of education policies and practices: a non-technical guide. This one is ‘work-related’; in one of my courses I’m writing a paper, and this working paper is one of the (many) sources I’m planning on using. Unfortunately most of the papers I work with are not freely available online, which is part of why I haven’t linked to them here on the blog.
I should note that there are no equations in this paper, so you should focus on the words ‘a non-technical guide’ rather than the words ‘econometric methods’ in the title – I think this is a very readable paper for the non-expert as well. I should of course also note that I have worked with most of these methods in a lot more detail, and that without the math it’s very hard to understand the details and really know what’s going on e.g. when applying such methods – or related methods such as IV methods on panel data, a topic which was covered in another class just a few weeks ago but which is not covered in this paper.
This is a place to start if you want to know something about applied econometric methods, particularly if you want to know how they’re used in the field of educational economics, and especially if you don’t have a strong background in stats or math. It should be noted that some of the methods covered see widespread use in other areas of economics as well; IV is widely used, and the difference-in-differences estimator has seen a lot of applications in health economics.
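For readers who haven’t seen it, the basic 2×2 difference-in-differences estimator can be sketched in a few lines: the change over time in the control group stands in for the change the treated group would have experienced without the treatment (the parallel-trends assumption). The records below are made-up numbers, not data from the paper:

```python
from statistics import mean

def diff_in_diff(records):
    """2x2 difference-in-differences from (group, period, outcome) records,
    where group is "treated"/"control" and period is "before"/"after"."""
    def cell(group, period):
        return mean(y for g, p, y in records if g == group and p == period)
    treated_change = cell("treated", "after") - cell("treated", "before")
    control_change = cell("control", "after") - cell("control", "before")
    # Under parallel trends, the control group's change is the counterfactual
    # change for the treated group, so the remainder is the treatment effect.
    return treated_change - control_change

records = [
    ("control", "before", 10), ("control", "before", 12),
    ("control", "after", 13), ("control", "after", 15),
    ("treated", "before", 20), ("treated", "before", 22),
    ("treated", "after", 26), ("treated", "after", 28),
]
print(diff_in_diff(records))  # (27 - 21) - (14 - 11) = 3
```

Note that the groups start at very different levels; only the difference in their changes matters.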
ii. Regulating the Way to Obesity: Unintended Consequences of Limiting Sugary Drink Sizes. The law of unintended consequences strikes again.
You could argue with some of the assumptions made here (e.g. that prices (/oz) remain constant) but I’m not sure the findings are that sensitive to that assumption, and without an explicit model of the pricing mechanism at work it’s mostly guesswork anyway.
iii. A discussion about the neurobiology of memory. Razib Khan posted a short part of the video recently, so I decided to watch it today. A few relevant wikipedia links: Memory, Dead reckoning, Hebbian theory, Caenorhabditis elegans. I’m skeptical, but I agree with one commenter who put it this way: “I know darn well I’m too ignorant to decide whether Randy is possibly right, or almost certainly wrong — yet I found this interesting all the way through.” I also agree with another commenter who mentioned that it’d have been useful for Gallistel to go into details about the differences between short term and long term memory and how these differences relate to the problem at hand.
“An extensive body of prior research indicates an association between emotion and moral judgment. In the present study, we characterized the predictive power of specific aspects of emotional processing (e.g., empathic concern versus personal distress) for different kinds of moral responders (e.g., utilitarian versus non-utilitarian). Across three large independent participant samples, using three distinct pairs of moral scenarios, we observed a highly specific and consistent pattern of effects. First, moral judgment was uniquely associated with a measure of empathy but unrelated to any of the demographic or cultural variables tested, including age, gender, education, as well as differences in “moral knowledge” and religiosity. Second, within the complex domain of empathy, utilitarian judgment was consistently predicted only by empathic concern, an emotional component of empathic responding. In particular, participants who consistently delivered utilitarian responses for both personal and impersonal dilemmas showed significantly reduced empathic concern, relative to participants who delivered non-utilitarian responses for one or both dilemmas. By contrast, participants who consistently delivered non-utilitarian responses on both dilemmas did not score especially high on empathic concern or any other aspect of empathic responding.”
In case you were wondering, the difference hasn’t got anything to do with a difference in the ability to ‘see things from the other guy’s point of view’: “the current study demonstrates that utilitarian responders may be as capable at perspective taking as non-utilitarian responders. As such, utilitarian moral judgment appears to be specifically associated with a diminished affective reactivity to the emotions of others (empathic concern) that is independent of one’s ability for perspective taking”.
On a small sidenote, I’m not really sure I get the authors at all – one of the questions they ask in the paper’s last part is whether ‘utilitarians are simply antisocial?’ This is such a stupid way to frame this I don’t even know how to begin to respond; I mean, utilitarians make better decisions that save more lives, and that’s consistent with them being antisocial? I should think the ‘social’ thing to do would be to save as many lives as possible. Dead people aren’t very social, and when your actions cause more people to die they also decrease the scope for future social interaction.
v. Lastly, some Khan Academy videos:
(This one may be very hard to understand if you haven’t covered this stuff before, but I figured I might as well post it here. If you don’t know e.g. what myosin and actin are, you probably won’t get much out of this video. If you don’t watch it, this part of what’s covered is probably the most important part to take away from it.)
It’s been a long time since I checked out the Brit Cruise information theory playlist, and I was happy to learn that he’s updated it and added some more stuff. I like the way he combines historical stuff with a ‘how does it actually work, and how did people realize that’s how it works’ approach – learning how people figured out stuff is to me sometimes just as fascinating as learning what they figured out:
(Relevant wikipedia links: Leyden jar, Electrostatic generator, Semaphore line. Cruise’s play with the cat and the amber may look funny, but there’s a point to it: “The Greek word for amber is ηλεκτρον (“elektron”) and is the origin of the word “electricity”.” – from the first link).
i. Aedes Albopictus.
“The Tiger mosquito or forest day mosquito, Aedes albopictus (Stegomyia albopicta), from the mosquito (Culicidae) family, is characterized by its black and white striped legs, and small black and white striped body. It is native to the tropical and subtropical areas of Southeast Asia; however, in the past couple of decades this species has invaded many countries throughout the world through the transport of goods and increasing international travel. This mosquito has become a significant pest in many communities because it closely associates with humans (rather than living in wetlands), and typically flies and feeds in the daytime in addition to at dusk and dawn. The insect is called a tiger mosquito because its striped appearance is similar to a tiger. Aedes albopictus is an epidemiologically important vector for the transmission of many viral pathogens, including the West Nile virus, Yellow fever virus, St. Louis encephalitis, dengue fever, and Chikungunya fever, as well as several filarial nematodes such as Dirofilaria immitis. […]
Aedes albopictus also bites other mammals besides humans and they also bite birds. They are always on the search for a host and are both persistent and cautious when it comes to their blood meal and host location. Their blood meal is often broken off short without enough blood ingested for the development of their eggs. This is why Asian tiger mosquitoes bite multiple hosts during their development cycle of the egg, making them particularly efficient at transmitting diseases. The mannerism of biting diverse host species enables the Asian tiger mosquito to be a potential bridge vector for certain pathogens, for example, the West Nile virus that can jump species boundaries. […]
The Asian tiger mosquito originally came from Southeast Asia. In 1966, parts of Asia and the island worlds of India and the Pacific Ocean were denoted as the area of circulation for the Asian tiger mosquito. Since then, it has spread to Europe, the Americas, the Caribbean, Africa and the Middle East. Aedes albopictus is one of the 100 world’s worst invasive species according to the Global Invasive Species Database. […]
In Europe, the Asian tiger mosquito apparently covers an extensive new niche. This means that there are no native, long-established species that conflict with the dispersal of Aedes albopictus. […]
The Asian tiger mosquito was responsible for the Chikungunya epidemic on the French Island La Réunion in 2005–2006. By September 2006, there were an estimated 266,000 people infected with the virus, and 248 fatalities on the island. The Asian tiger mosquito was also the transmitter of the virus in the first and only outbreak of Chikungunya fever on the European continent. […]
Aedes albopictus has proven to be very difficult to suppress or to control due to their remarkable ability to adapt to various environments, their close contact with humans, and their reproductive biology.”
In case you were wondering, the word Aedes comes from the Greek word for “unpleasant”. So, yeah…
ii. Orbital resonance.
“In celestial mechanics, an orbital resonance occurs when two orbiting bodies exert a regular, periodic gravitational influence on each other, usually due to their orbital periods being related by a ratio of two small integers. The physics principle behind orbital resonance is similar in concept to pushing a child on a swing, where the orbit and the swing both have a natural frequency, and the other body doing the “pushing” will act in periodic repetition to have a cumulative effect on the motion. Orbital resonances greatly enhance the mutual gravitational influence of the bodies, i.e., their ability to alter or constrain each other’s orbits. In most cases, this results in an unstable interaction, in which the bodies exchange momentum and shift orbits until the resonance no longer exists. Under some circumstances, a resonant system can be stable and self-correcting, so that the bodies remain in resonance. Examples are the 1:2:4 resonance of Jupiter‘s moons Ganymede, Europa and Io, and the 2:3 resonance between Pluto and Neptune. Unstable resonances with Saturn‘s inner moons give rise to gaps in the rings of Saturn. The special case of 1:1 resonance (between bodies with similar orbital radii) causes large Solar System bodies to eject most other bodies sharing their orbits; this is part of the much more extensive process of clearing the neighbourhood, an effect that is used in the current definition of a planet.”
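A quick arithmetic check of the small-integer ratios in the quote, using rounded orbital periods I’ve supplied myself (they are not from the quote): Io ≈ 1.769 days, Europa ≈ 3.551 days, Ganymede ≈ 7.155 days; Neptune ≈ 164.8 years, Pluto ≈ 247.9 years. Fraction.limit_denominator is just a convenient way to find the nearest simple fraction, not how astronomers actually identify resonances:

```python
from fractions import Fraction

# Rounded orbital periods (supplied for this example, not taken from the quote).
PERIODS = {
    "Io": 1.769,          # days
    "Europa": 3.551,      # days
    "Ganymede": 7.155,    # days
    "Neptune": 164.8,     # years
    "Pluto": 247.9,       # years
}

def nearest_small_ratio(ratio, max_denominator=5):
    """Closest fraction with denominator <= max_denominator to a period ratio."""
    return Fraction(ratio).limit_denominator(max_denominator)

for outer, inner in (("Europa", "Io"), ("Ganymede", "Europa"), ("Pluto", "Neptune")):
    r = PERIODS[outer] / PERIODS[inner]
    print(f"{outer}/{inner}: period ratio {r:.3f}, nearest simple ratio {nearest_small_ratio(r)}")
```

The Io:Europa:Ganymede periods come out close to 1:2:4, and Pluto’s period is close to 3/2 of Neptune’s, i.e. Pluto completes two orbits for every three of Neptune’s (the 2:3 resonance).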
iii. Some ‘work-blog related links’: Local regression, Quasi-experiment, Nonparametric regression, Regression discontinuity design, Kaplan–Meier estimator, Law of total expectation, Slutsky’s theorem, Difference in differences, Panel analysis.
v. Hill sphere.
“An astronomical body‘s Hill sphere is the region in which it dominates the attraction of satellites. To be retained by a planet, a moon must have an orbit that lies within the planet’s Hill sphere. That moon would, in turn, have a Hill sphere of its own. Any object within that distance would tend to become a satellite of the moon, rather than of the planet itself.
In more precise terms, the Hill sphere approximates the gravitational sphere of influence of a smaller body in the face of perturbations from a more massive body. It was defined by the American astronomer George William Hill, based upon the work of the French astronomer Édouard Roche. For this reason, it is also known as the Roche sphere (not to be confused with the Roche limit). The Hill sphere extends between the Lagrangian points L1 and L2, which lie along the line of centers of the two bodies. The region of influence of the second body is shortest in that direction, and so it acts as the limiting factor for the size of the Hill sphere. Beyond that distance, a third object in orbit around the second (e.g. Jupiter) would spend at least part of its orbit outside the Hill sphere, and would be progressively perturbed by the tidal forces of the central body (e.g. the Sun), eventually ending up orbiting the latter. […]
The Hill sphere is only an approximation, and other forces (such as radiation pressure or the Yarkovsky effect) can eventually perturb an object out of the sphere. This third object should also be of small enough mass that it introduces no additional complications through its own gravity. Detailed numerical calculations show that orbits at or just within the Hill sphere are not stable in the long term; it appears that stable satellite orbits exist only inside 1/2 to 1/3 of the Hill radius.”
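The quote doesn’t give the formula, but the commonly cited approximation is r_H ≈ a(1 − e)·∛(m / 3M), for a body of mass m orbiting a much more massive body of mass M with semi-major axis a and eccentricity e. A sketch applying it to the Earth–Sun system, with rounded constants I’ve supplied:

```python
# Rounded physical constants (assumed values; distances in km, masses in kg).
AU_KM = 1.496e8
EARTH_MASS_KG = 5.972e24
SUN_MASS_KG = 1.989e30
EARTH_ECCENTRICITY = 0.0167
MOON_DISTANCE_KM = 384_400

def hill_radius(a, e, m, big_m):
    """Approximate Hill radius r_H = a * (1 - e) * (m / (3 * M)) ** (1/3).
    The result is in the same units as the semi-major axis a."""
    return a * (1 - e) * (m / (3 * big_m)) ** (1 / 3)

r_h = hill_radius(AU_KM, EARTH_ECCENTRICITY, EARTH_MASS_KG, SUN_MASS_KG)
print(f"Earth's Hill radius: about {r_h:,.0f} km")
print(f"Moon's orbit as a fraction of it: {MOON_DISTANCE_KM / r_h:.2f}")
```

The result is roughly 1.5 million km, and the Moon’s orbit lies at about a quarter of that distance, i.e. within the 1/2 to 1/3 of the Hill radius the quote mentions for stable satellite orbits.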
I found myself looking up quite a few other astronomy-related articles when I was reading Formation and Evolution of Exoplanets (technically the link is to the 2010 version whereas I was reading the 2008 version, but it doesn’t look as if a whole lot of stuff’s been changed and I can’t find a link to the 2008 version). I haven’t mentioned the book here because I basically gave up reading it midway into the second chapter. The book didn’t try to hide that I probably wasn’t in the intended target group but I decided to give it a try anyway: “This book is intended to suit a readership with a wide range of previous knowledge of planetary science, astrophysics, and scientific programming. Expertise in these fields should not be required to grasp the key concepts presented in the forthcoming chapters, although a reasonable grasp of basic physics is probably essential.” I figured I could grasp the key concepts even though I’d lose out on a lot of details, but the math started getting ugly quite fast, and as I have plenty of ugly math to avoid as it is I decided to give the book a miss (though I did read the first 50 pages or so).
vi. Grover Cleveland (featured).
“Stephen Grover Cleveland (March 18, 1837 – June 24, 1908) was the 22nd and 24th President of the United States. Cleveland is the only president to serve two non-consecutive terms (1885–1889 and 1893–1897) and therefore is the only individual to be counted twice in the numbering of the presidents. He was the winner of the popular vote for president three times—in 1884, 1888, and 1892—and was the only Democrat elected to the presidency in the era of Republican political domination that lasted from 1861 to 1913.
Cleveland was the leader of the pro-business Bourbon Democrats who opposed high tariffs, Free Silver, inflation, imperialism and subsidies to business, farmers or veterans. His battles for political reform and fiscal conservatism made him an icon for American conservatives of the era. Cleveland won praise for his honesty, independence, integrity, and commitment to the principles of classical liberalism. Cleveland relentlessly fought political corruption, patronage, and bossism. Indeed, as a reformer his prestige was so strong that the reform wing of the Republican Party, called “Mugwumps“, largely bolted the GOP ticket and swung to his support in 1884. […]
Cleveland took strong positions and was heavily criticized. His intervention in the Pullman Strike of 1894 to keep the railroads moving angered labor unions nationwide and angered the party in Illinois; his support of the gold standard and opposition to Free Silver alienated the agrarian wing of the Democratic Party. Furthermore, critics complained that he had little imagination and seemed overwhelmed by the nation’s economic disasters—depressions and strikes—in his second term. Even so, his reputation for honesty and good character survived the troubles of his second term. […]
Cleveland’s term as mayor was spent fighting the entrenched interests of the party machines. Among the acts that established his reputation was a veto of the street-cleaning bill passed by the Common Council. The street-cleaning contract was open for bids, and the Council selected the highest bidder, rather than the lowest, because of the political connections of the bidder. While this sort of bipartisan graft had previously been tolerated in Buffalo, Mayor Cleveland would have none of it, and replied with a stinging veto message: “I regard it as the culmination of a most bare-faced, impudent, and shameless scheme to betray the interests of the people, and to worse than squander the public money”. The Council reversed themselves and awarded the contract to the lowest bidder. For this, and several other acts to safeguard the public funds, Cleveland’s reputation as an honest politician began to spread beyond Erie County. […] [As a president…] Cleveland used the veto far more often than any president up to that time. […]
In a 1905 article in The Ladies Home Journal, Cleveland weighed in on the women’s suffrage movement, writing that “sensible and responsible women do not want to vote. The relative positions to be assumed by men and women in the working out of our civilization were assigned long ago by a higher intelligence.””
Here’s what his second cabinet looked like – this was what a presidential cabinet looked like 120 years ago (and just in case you were in doubt: Cleveland is the old white man in the picture…):
vii. Boeing B-52 Stratofortress (‘good article’).
“The Boeing B-52 Stratofortress is a long-range, subsonic, jet-powered strategic bomber. The B-52 was designed and built by Boeing, which has continued to provide support and upgrades. It has been operated by the United States Air Force (USAF) since the 1950s. The bomber carries up to 70,000 pounds (32,000 kg) of weapons.
Beginning with the successful contract bid in June 1946, the B-52 design evolved from a straight-wing aircraft powered by six turboprop engines to the final prototype YB-52 with eight turbojet engines and swept wings. The B-52 took its maiden flight in April 1952. Built to carry nuclear weapons for Cold War-era deterrence missions, the B-52 Stratofortress replaced the Convair B-36. Although a veteran of several wars, the Stratofortress has dropped only conventional munitions in combat. Its Stratofortress name is rarely used outside of official contexts; it has been referred to by Air Force personnel as the BUFF (Big Ugly Fat/Flying Fucker/Fellow). […]
Superior performance at high subsonic speeds and relatively low operating costs have kept the B-52 in service despite the advent of later aircraft, including the cancelled Mach 3 North American XB-70 Valkyrie, the variable-geometry Rockwell B-1B Lancer, and the stealthy Northrop Grumman B-2 Spirit. The B-52 marked its 50th anniversary of continuous service with its original operator in 2005 and after being upgraded between 2013 and 2015 it will serve into the 2040s. […]
B-52 strikes were an important part of Operation Desert Storm. With about 1,620 sorties flown, B-52s delivered 40% of the weapons dropped by coalition forces while suffering only one non-combat aircraft loss, with several receiving minor damage from enemy action. […]
The USAF continues to rely on the B-52 because it remains an effective and economical heavy bomber, particularly in the type of missions that have been conducted since the end of the Cold War against nations that have limited air defense capabilities. The B-52 has the capacity to “loiter” for extended periods over (or even well outside) the battlefield, and deliver precision standoff and direct fire munitions. It has been a valuable asset in supporting ground operations during conflicts such as Operation Iraqi Freedom. The B-52 had the highest mission capable rate of the three types of heavy bombers operated by the USAF in 2001. The B-1 averaged a 53.7% ready rate and the Northrop Grumman B-2 Spirit achieved 30.3%, while the B-52 averaged 80.5% during the 2000–2001 period. The B-52’s $72,000 cost per hour of flight is more than the $63,000 for the B-1B but almost half of the $135,000 of the B-2.”
I’ll just repeat that: $72,000/hour of flight. And the B-2 is at $135,000/hour. War is expensive.
I’ve not had lectures for the last two weeks, but tomorrow the new semester starts.
Like last semester I’ll try to ‘work-blog’ some stuff along the way – hopefully I’ll do it more often than I did, but it’s hard to say if that’s realistic at this point.
I bought the only book I’m required to acquire this semester earlier today:
…and having had a brief look at it I’m already starting to wonder if it was even a good idea to take that course. I’ve been told it’s a very useful course, but I have a nagging suspicion that it may also be quite hard. Here are some of the reasons:
I don’t think it’s particularly likely that I’ll cover stuff from that particular course in work-blogs, for perhaps obvious reasons. One problem is the math; WordPress doesn’t handle math very well. Another problem is that most readers would be unlikely to benefit much from such posts unless I were to spend a lot more time on them than I’d like to. But it’s not my only course this semester. We’ll see how it goes.
“…it’s just a matter of estimating the hazard functions…”
Or something like that. The instructor did actually say the words in the post title, but I believe his voice sort of trailed off as he finished the sentence. All the stuff above is from today’s lecture notes. The quote is from the last part of the lecture, after he’d gone through that stuff.
In the last slide, it should “of course” be ‘Oaxaca Blinder decomposition’, rather than ‘Oaxaca-Bilder’.
What we’re covering right now in class is not something I’ll cover here in detail – it’s very technical stuff. A few excerpts from today’s lecture notes:
Stuff like this is why I actually get a bit annoyed by people who state that their impression is that economics is a relatively ‘soft’ science, and ask questions like ‘the math you guys make use of isn’t all that hard, is it?’ (I’ve been asked this question a few times in the past.) It’s actually true that a lot of it isn’t – we spend a lot of time calculating derivatives, finding the signs of those derivatives and doing similar stuff. And economics is a reasonably heterogeneous field, so surely there’s a lot of variation – for example, in Denmark business graduates often call themselves economists too, even though a business graduate’s background, in terms of what they’ve learned during their education, would most often be quite different from e.g. my own.
What I’ll just say here is that the statistics stuff generally is not easy (if you think it is, you’ve spent way too little time on that stuff*). And yeah, the above excerpt is from what I consider my ‘easy course’ this semester – most of it is not like that, but some of it sure is.
Incidentally, before people start talking about physics envy (mostly related to macro, IMO (and remember again the field heterogeneity; many, perhaps a majority of, economists don’t specialize in that stuff and don’t really know all that much about it…)), I should just comment in advance that the complexity economists deal with when they work with statistics – which is also economics – is the same kind of complexity that’s dealt with in all other subject areas where people need to analyze data to reach conclusions about what the data can tell us. Much of the complexity is in the data – it relates to the fact that the real world is complex, and if we want to model it right and get results that make sense, we need to think very hard about which tools to use and how we use them. The economists who decide to work with that kind of stuff (more than they absolutely have to in order to get their degrees, that is) are taught how to analyze data the right way, and that what counts as the right way may depend on what kind of data you’re working with and the questions you want to answer. This also involves learning what an Epanechnikov kernel is and what it implies that the error terms of a model are m-dependent.
(*…or (Plamus?) way too much time…)
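For the curious: the Epanechnikov kernel is K(u) = ¾(1 − u²) for |u| ≤ 1 and 0 otherwise, one common choice of weighting function in kernel density estimation and local regression. A minimal density-estimation sketch; the sample and the bandwidth h are made up for illustration, and in practice the bandwidth choice usually matters a lot more than the choice of kernel:

```python
def epanechnikov(u):
    """Epanechnikov kernel: 3/4 * (1 - u**2) on [-1, 1], zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(data, h, x):
    """Kernel density estimate at x with bandwidth h:
    f_hat(x) = (1 / (n * h)) * sum_i K((x - x_i) / h)."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)

sample = [1.2, 1.9, 2.1, 2.4, 3.8]   # made-up observations
h = 0.8                              # hand-picked bandwidth, for illustration only

for x in (1.0, 2.0, 3.0, 4.0):
    print(f"f_hat({x}) = {kde(sample, h, x):.3f}")
```

Because each kernel integrates to one, the estimate is itself a proper density; unlike a Gaussian kernel, it is exactly zero more than h away from every observation.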
i. Proportional hazards models. (work-related)
“Proportional hazards models are a class of survival models in statistics. Survival models relate the time that passes before some event occurs to one or more covariates that may be associated with that quantity. In a proportional hazards model, the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. For example, taking a drug may halve one’s hazard rate for a stroke occurring, or, changing the material from which a manufactured component is constructed may double its hazard rate for failure. Other types of survival models such as accelerated failure time models do not exhibit proportional hazards. These models could describe a situation such as a drug that reduces a subject’s immediate risk of having a stroke, but where there is no reduction in the hazard rate after one year for subjects who do not have a stroke in the first year of analysis.”
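A minimal sketch of how a proportional hazards model is commonly estimated: Cox’s partial likelihood, maximized with Newton–Raphson for a single covariate. The simulated data are assumptions for illustration (exponential event times, a 0/1 covariate that doubles the hazard, i.e. a true coefficient of ln 2, and independent random censoring). The code also assumes there are no tied event times; dedicated survival-analysis software handles ties, multiple covariates and much more:

```python
import math
import random

def simulate(n, beta, seed=1):
    """Hypothetical data: event times with hazard h(t | x) = 0.1 * exp(beta * x)
    for a 0/1 covariate x, censored by independent exponential censoring times.
    Returns (time, event_observed, x) triples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(0, 1)
        t_event = rng.expovariate(0.1 * math.exp(beta * x))
        t_censor = rng.expovariate(0.05)
        data.append((min(t_event, t_censor), t_event <= t_censor, x))
    return data

def cox_fit(data, max_iter=25, tol=1e-10):
    """Maximize the Cox partial log-likelihood for one covariate by Newton-Raphson.
    Assumes no tied times. Returns (beta_hat, approximate standard error)."""
    # Walking from the latest time to the earliest, the running sums cover
    # exactly the risk set {j : t_j >= t_i} at each observation time t_i.
    ordered = sorted(data, key=lambda obs: obs[0], reverse=True)
    beta = 0.0
    for _ in range(max_iter):
        s0 = s1 = s2 = 0.0          # risk-set sums of w, w*x, w*x^2
        score = info = 0.0          # first derivative and minus second derivative
        for _time, event, x in ordered:
            w = math.exp(beta * x)
            s0 += w
            s1 += w * x
            s2 += w * x * x
            if event:
                mean_x = s1 / s0
                score += x - mean_x
                info += s2 / s0 - mean_x ** 2
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(info)

if __name__ == "__main__":
    beta_hat, se = cox_fit(simulate(5000, math.log(2)))
    print(f"log hazard ratio {beta_hat:.3f} (SE {se:.3f}), "
          f"hazard ratio {math.exp(beta_hat):.2f}; true values {math.log(2):.3f} and 2")
```

The baseline hazard (0.1 here) never enters the partial likelihood, which is what lets the Cox model estimate the multiplicative effect without specifying it.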
ii. Radioisotope thermoelectric generator.
“A radioisotope thermoelectric generator (RTG, RITEG) is an electrical generator that obtains its power from radioactive decay. In such a device, the heat released by the decay of a suitable radioactive material is converted into electricity by the Seebeck effect using an array of thermocouples.
RTGs have been used as power sources in satellites, space probes and unmanned remote facilities, such as a series of lighthouses built by the former Soviet Union inside the Arctic Circle. RTGs are usually the most desirable power source for robotic or unmaintained situations needing a few hundred watts (or less) of power for durations too long for fuel cells, batteries, or generators to provide economically, and in places where solar cells are not practical. Safe use of RTGs requires containment of the radioisotopes long after the productive life of the unit. […]
In addition to spacecraft, the Soviet Union constructed many unmanned lighthouses and navigation beacons powered by RTGs. Powered by strontium-90 (⁹⁰Sr), they are very reliable and provide a steady source of power. Critics argue that they could cause environmental and security problems as leakage or theft of the radioactive material could pass unnoticed for years, particularly as the locations of some of these lighthouses are no longer known due to poor record keeping. In one instance, the radioactive compartments were opened by a thief. In another case, three woodsmen in Georgia came across two ceramic RTG heat sources that had been stripped of their shielding. Two of the three were later hospitalized with severe radiation burns after carrying the sources on their backs. The units were eventually recovered and isolated.
There are approximately 1,000 such RTGs in Russia. All of them have long exhausted their 10-year engineered life spans. They are likely no longer functional, and may be in need of dismantling. Some of them have become the prey of metal hunters, who strip the RTGs’ metal casings, regardless of the risk of radioactive contamination.”
iii. List of unusual deaths. A lot of awesome stuff here. A few examples from the article:
- 1814: London Beer Flood, seven people were killed (some drowned, some died from injuries, and one succumbed to alcohol poisoning) when 323,000 imperial gallons (388,000 US gal; 1,468,000 L) of beer in the Meux and Company Brewery burst out of its vats and gushed into the streets.
- 1912: Franz Reichelt, tailor, fell to his death off the first deck of the Eiffel Tower while testing his invention, the overcoat parachute. It was his first ever attempt with the parachute.
- 1940: Marcus Garvey died due to two strokes after reading a negative premature obituary of himself.
- 1974: Basil Brown, a 48-year-old health food advocate from Croydon, drank himself to death with carrot juice.
- 2007: Jennifer Strange, a 28-year-old woman from Sacramento, California, died of water intoxication while trying to win a Nintendo Wii console in a KDND 107.9 “The End” radio station’s “Hold Your Wee for a Wii” contest, which involved drinking large quantities of water without urinating.
iv. Limnic eruption.
“A limnic eruption, also referred to as a lake overturn, is a rare type of natural disaster in which dissolved carbon dioxide (CO2) suddenly erupts from deep lake water, suffocating wildlife, livestock and humans. Such an eruption may also cause tsunamis in the lake as the rising CO2 displaces water. Scientists believe landslides, volcanic activity, or explosions can trigger such an eruption. Lakes in which such activity occurs may be known as limnically active lakes or exploding lakes.”
v. HeLa. The woman died more than 60 years ago, but some of the descendants of the cancer cells that killed her survive to this day:
“A HeLa cell /ˈhiːlɑː/, also Hela or hela cell, is a cell type in an immortal cell line used in scientific research. It is the oldest and most commonly used human cell line. The line was derived from cervical cancer cells taken on February 8, 1951 from Henrietta Lacks, a patient who eventually died of her cancer on October 4, 1951. The cell line was found to be remarkably durable and prolific as illustrated by its contamination of many other cell lines used in research. […]
HeLa cells, like other cell lines, are termed “immortal” in that they can divide an unlimited number of times in a laboratory cell culture plate as long as fundamental cell survival conditions are met (i.e. being maintained and sustained in a suitable environment). There are many strains of HeLa cells as they continue to evolve in cell cultures, but all HeLa cells are descended from the same tumor cells removed from Mrs. Lacks. It has been estimated that the total number of HeLa cells that have been propagated in cell culture far exceeds the total number of cells that were in Henrietta Lacks’s body. […]
HeLa cells were used by Jonas Salk to test the first polio vaccine in the 1950s. Since that time, HeLa cells have been used for “research into cancer, AIDS, the effects of radiation and toxic substances, gene mapping, and many other scientific pursuits”. According to author Rebecca Skloot, by 2009, “more than 60,000 scientific articles had been published about research done on HeLa, and that number was increasing steadily at a rate of more than 300 papers each month.””
“The result of over 50 years of experiments in the Soviet Union and Russia, the breeding project was set up in 1959 by Soviet scientist Dmitri Belyaev. It continues today at The Institute of Cytology and Genetics at Novosibirsk, under the supervision of Lyudmila Trut. […]
Belyaev believed that the key factor selected for in the domestication of dogs was not size or reproduction, but behavior; specifically, amenability to domestication, or tameability. He selected for low flight distance, that is, the distance one can approach the animal before it runs away. Selecting this behavior mimics the natural selection that must have occurred in the ancestral past of dogs. More than any other quality, Belyaev believed, tameability must have determined how well an animal would adapt to life among humans. Since behavior is rooted in biology, selecting for tameness and against aggression means selecting for physiological changes in the systems that govern the body’s hormones and neurochemicals. Belyaev decided to test his theory by domesticating foxes; in particular, the silver fox, a dark color form of the red fox. He placed a population of them in the same process of domestication, and he decided to submit this population to strong selection pressure for inherent tameness.
The result is that Russian scientists now have a number of domesticated foxes that are fundamentally different in temperament and behavior from their wild forebears. Some important changes in physiology and morphology are now visible, such as mottled or spotted colored fur. Many scientists believe that these changes related to selection for tameness are caused by lower adrenaline production in the new breed, causing physiological changes in very few generations and thus yielding genetic combinations not present in the original species. This indicates that selection for tameness (i.e. low flight distance) produces changes that are also influential on the emergence of other “dog-like” traits, such as raised tail and coming into heat every six months rather than annually.”
vi. Attalus I (featured).
“Attalus I (Greek: Ἄτταλος), surnamed Soter (Greek: Σωτὴρ, “Savior”; 269 BC – 197 BC) ruled Pergamon, an Ionian Greek polis (what is now Bergama, Turkey), first as dynast, later as king, from 241 BC to 197 BC. He was the second cousin and the adoptive son of Eumenes I, whom he succeeded, and was the first of the Attalid dynasty to assume the title of king in 238 BC. He was the son of Attalus and his wife Antiochis.
Attalus won an important victory over the Galatians, newly arrived Celtic tribes from Thrace, who had been, for more than a generation, plundering and exacting tribute throughout most of Asia Minor without any serious check. This victory, celebrated by the triumphal monument at Pergamon (famous for its Dying Gaul) and the liberation from the Gallic “terror” which it represented, earned for Attalus the name of “Soter”, and the title of “king”. A courageous and capable general and loyal ally of Rome, he played a significant role in the first and second Macedonian Wars, waged against Philip V of Macedon. He conducted numerous naval operations, harassing Macedonian interests throughout the Aegean, winning honors, collecting spoils, and gaining for Pergamon possession of the Greek islands of Aegina during the first war, and Andros during the second, twice narrowly escaping capture at the hands of Philip.
Attalus was a protector of the Greek cities of Anatolia and viewed himself as the champion of Greeks against barbarians. During his reign he established Pergamon as a considerable power in the Greek East. He died in 197 BC, shortly before the end of the second war, at the age of 72, having suffered an apparent stroke while addressing a Boeotian war council some months before.”
vii. East African Campaign (World War I).
“The East African Campaign was a series of battles and guerrilla actions which started in German East Africa and ultimately affected portions of Mozambique, Northern Rhodesia, British East Africa, Uganda, and the Belgian Congo. The campaign was effectively ended in November 1917. However, the Germans entered Portuguese East Africa and continued the campaign living off Portuguese supplies.
The strategy of the German colonial forces, led by Lieutenant Colonel (later Generalmajor) Paul Emil von Lettow-Vorbeck, was to drain and divert forces from the Western Front to Africa. His strategy failed to achieve these results after 1916, as mainly Indian and South African forces, which were prevented by colonial policy from deploying to Europe, conducted the rest of the campaign. […]
In this campaign, disease killed or incapacitated 30 men for every man killed in battle on the British side.”
viii. European bison (Wisent). I had never heard about those. Here’s what they look like:
“The European bison (Bison bonasus), also known as wisent (/ˈviːzənt/ or /ˈwiːzənt/) or the European wood bison, is a Eurasian species of bison. It is the heaviest surviving wild land animal in Europe; a typical European bison is about 2.1 to 3.5 m (7 to 10 ft) long, not counting a tail of 30 to 60 cm (12 to 24 in) long, and 1.6 to 2 m (5 to 7 ft) tall. Weight typically can range from 300 to 920 kg (660 to 2,000 lb), with an occasional big bull to 1,000 kg (2,200 lb) or more. On average, it is slightly lighter in body mass and yet taller at the shoulder than the American bison (Bison bison). Compared to the American species, the Wisent has shorter hair on the neck, head and forequarters, but longer tail and horns.
European bison were hunted to extinction in the wild, with the last wild animals being shot in the Białowieża Forest in Eastern Poland in 1919 and in the Western Caucasus in 1927, but have since been reintroduced from captivity into several countries in Europe, all descendants of the Białowieża or lowland European bison. They are now forest-dwelling. They have few predators (besides humans), with only scattered reports from the 19th century of wolf and bear predation. […]
Historically, the lowland European bison’s range encompassed all lowlands of Europe, extending from the Massif Central to the Volga River and the Caucasus. It may have once lived in the Asiatic part of what is now the Russian Federation. Its range decreased as human populations expanded cutting down forests. The first population to be extirpated was that of Gaul in the 8th century AD. The European bison became extinct in southern Sweden in the 11th century, and southern England in the 12th. The species survived in the Ardennes and the Vosges until the 15th century. In the early middle ages, the wisent apparently still occurred in the forest steppes east of the Ural, in the Altay Mountains and seems to have reached Lake Baikal in the east. The northern boundary in the Holocene was probably around 60°N in Finland.
European bison survived in a few natural forests in Europe but its numbers dwindled. The last European bison in Transylvania died in 1790. In Poland, European bison in the Białowieża Forest were legally the property of the Polish kings until the Third partition of Poland. Wild European bison herds also existed in the forest until the mid-17th century. Polish kings took measures to protect the bison. King Sigismund II Augustus instituted the death penalty for poaching a European bison in Białowieża in the mid-16th century. In the early 19th century, Russian czars retained old Polish laws protecting the European bison herd in Białowieża. Despite these measures and others, the European bison population continued to decline over the following century, with only Białowieża and Northern Caucasus populations surviving into the 20th century.
During World War I, occupying German troops killed 600 of the European bison in the Białowieża Forest for sport, meat, hides, and horns. A German scientist informed army officers that the European bison were facing imminent extinction, but at the very end of the war, retreating German soldiers shot all but 9 animals. The last wild European bison in Poland was killed in 1919, and the last wild European bison in the world was killed by poachers in 1927 in the western Caucasus. By that year fewer than 50 remained, all in zoos.”
This is mostly to make clear that even though a low posting frequency often means that I’m not feeling as well as I sometimes do, that is not the reason for the low posting frequency this past week. I’m simply too busy to blog much or to do stuff that’s blog-worthy. Didn’t really have a weekend this week at all.
Some random stuff/links:
2. How to checkmate with King + 2 bishops vs. King:
3. Ever wondered what a Vickrey auction is and what the optimal bidding strategy in such an auction is? No? Now you know.
4. How long can people hold their breath under water? (and many other things. The answer of course is: ‘It depends…’)
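For the Vickrey auction link: a minimal sketch of a sealed-bid second-price auction, where the highest bidder wins but pays the second-highest bid, plus a brute-force check on a small grid of integer values that bidding your true value is weakly dominant (it never does worse than any other bid). The function names and the tie rules are my own simplifying assumptions, not anything from the linked page: in `vickrey_auction` equal bids keep their insertion order, and in `bidder_payoff` a tie is simply treated as a loss.

```python
from typing import Dict, Iterable, Tuple

def vickrey_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Sealed-bid second-price auction: highest bidder wins, pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # sorted() is stable, so equal bids keep insertion order (hypothetical tie rule).
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price

def bidder_payoff(value: float, bid: float, highest_rival_bid: float) -> float:
    """Payoff for a bidder with private value `value`; ties are assumed to be lost."""
    return value - highest_rival_bid if bid > highest_rival_bid else 0.0

def truthful_is_weakly_dominant(grid: Iterable[float]) -> bool:
    """Check that bidding your true value is never worse than any other bid on the grid."""
    points = list(grid)
    return all(
        bidder_payoff(v, v, m) >= bidder_payoff(v, b, m)
        for v in points for m in points for b in points
    )

print(vickrey_auction({"Alice": 10.0, "Bob": 7.0, "Carol": 12.0}))  # ('Carol', 10.0)
print(truthful_is_weakly_dominant(float(k) for k in range(21)))     # True
```

The intuition the check confirms: your bid only decides whether you win, never what you pay, so overbidding can only add wins at a loss and underbidding can only throw away profitable wins.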
Or a sample that’s arguably closer than yesterday’s to the kind of stuff I’m actually working with. The pics are from my textbook.
In a couple of months I’ll probably say that stuff like this looks worse than it is. Some of it is quite a bit simpler than it looks, but in general I don’t feel that way right now. Even though we made some progress today, there’s still a long way to go.
Stopped working half an hour ago, basically because I couldn’t think straight anymore, not because I didn’t want to keep working. On my way to bed. We’re in time trouble and I probably won’t do anything but work and sleep until Friday (not that I’ve been doing all that much else so far); anyway, don’t expect any updates until Friday evening or sometime Saturday.
I’ve kept the links somewhat general in order not to give any hints to fellow students finding this blogpost via Google (none of them relate to the breakthroughs mentioned below), but these links are a good sample of the kind of stuff I’ve been working with today: 1, 2, 3 (notice how big that file is. We frequently look up stuff here), 4, 5. I’ve chosen links with some degree of formalization, though most of them of course don’t go into all that much detail. Our curriculum in this course consists of a few hundred pages like those.
I’ve just parted ways with my study group (until tomorrow morning) after approximately 12 hours of (almost) completely uninterrupted work. Hopefully we’ve just made two major breakthroughs. We work with (think about, manipulate, program with…) equations such as those in the links (and the related concepts) all the time, and we’ve been doing it for days on end already.
This exam is very hard and I’m very tired. The tiredness is not because of lack of sleep; that’s not an issue (yet). It’s because thinking is hard. Also, it’s depressing working with this stuff because I’m pretty sure that for a guy with an IQ of 150-160, most of this stuff is simply a walk in the park. Right now I kinda feel like the stupid kid in primary school.
1. Trajan. Roman Emperor from 98 AD to 117 AD. This is what the Roman Empire looked like at the end of his reign:
You can file this one under: ‘Yet more stuff I should have learned something about when I was younger.’ Before I started at the university, I learned a lot of the stuff the various schools I was enrolled in had to offer, but I didn’t learn much outside school. I really regret now that I wasted so much time back then. I still waste a lot of time, btw. (old habits die hard), but it’s better than it used to be. No, it’s not that I consider all time not spent collecting knowledge like this wasted, no way; it’s just that I don’t have all that many better things to do with my time when I’m not doing the stuff I have to do, like studying for my exams, so my tradeoffs don’t look quite like those of a more ordinary person, who might have, say, a lot of what might be termed ‘social obligations’. I think of reading stuff like this as somehow more virtuous than reading TV Tropes or kibitzing a game of chess between two GMs, and most certainly more virtuous than watching an episode of House, which I also happen to do every now and then.
Robin Lane Fox did include Trajan’s reign in his book, but it’s been a while since I read it, and there wasn’t a lot about Trajan in there anyway. Here’s one sentence, perhaps not exactly showing Trajan in the best possible light: “Between May 107 and November 109 Trajan celebrated his conquest of Dacia (modern Romania) with more than twenty weeks of blood sports, showing more than 5,500 pairs of gladiators and killing over 11,000 animals.” It should probably also be noted, though, that such ‘blood sports’ were quite popular among the populace back then. (How much did I actually quote from that book here on the blog back when I read it? I now think my coverage of the book was somewhat lacking; perhaps I should have included more. Well, it’s not too late. If I get ’round to it, maybe…)
2. Ants. File under: ‘These guys are pretty amazing’. The estimated number of ant species (22,000) is more than four times the number of mammal species (5,400), and more than 12,500 ant species have already been classified. They’ve been around for more than 100 million years:
“Ants evolved from a lineage within the vespoid wasps. Phylogenetic analysis suggests that ants arose in the mid-Cretaceous period about 110 to 130 million years ago. After the rise of flowering plants about 100 million years ago they diversified and assumed ecological dominance around 60 million years ago.”
According to one of the sources cited in the article:
“Ants are arguably the greatest success story in the history of terrestrial metazoa. On average, ants monopolize 15–20% of the terrestrial animal biomass, and in tropical regions where ants are especially abundant, they monopolize 25% or more.”
4. Autoregressive model. ‘The type of stuff people like me work with on a near-daily basis’. [‘Economics? That’s a bit like philosophy, right?’ – I got that comment not long ago out in the Real World. In some ways it kinda is, sort of, or at least the two fields have some elements in common within relevant subfields; but if you actually ask a question like that, the answer will always be ‘No’.]
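Since the autoregressive model is the kind of thing I work with: here is a minimal AR(1) sketch, x_t = c + φ·x_{t−1} + ε_t with Gaussian noise ε_t ~ N(0, σ²). It simulates a series and then estimates c and φ by regressing x_t on x_{t−1} with ordinary least squares. The parameter values (c = 1, φ = 0.7, σ = 1), the seed and the function names are hypothetical choices for illustration. Stationarity requires |φ| < 1, and the code checks for that.

```python
import random
from typing import List, Tuple

def simulate_ar1(n: int, c: float, phi: float, sigma: float, seed: int = 0) -> List[float]:
    """Simulate x_t = c + phi * x_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    if abs(phi) >= 1:
        raise ValueError("|phi| < 1 is needed for a stationary AR(1)")
    rng = random.Random(seed)
    x = c / (1 - phi)  # start at the stationary mean to avoid a burn-in period
    xs = []
    for _ in range(n):
        x = c + phi * x + rng.gauss(0.0, sigma)
        xs.append(x)
    return xs

def fit_ar1_ols(xs: List[float]) -> Tuple[float, float]:
    """Estimate (c, phi) by OLS regression of x_t on a constant and x_{t-1}."""
    lagged, current = xs[:-1], xs[1:]
    m = len(lagged)
    mean_lag = sum(lagged) / m
    mean_cur = sum(current) / m
    cov = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    var = sum((a - mean_lag) ** 2 for a in lagged)
    phi_hat = cov / var
    c_hat = mean_cur - phi_hat * mean_lag
    return c_hat, phi_hat

series = simulate_ar1(n=5000, c=1.0, phi=0.7, sigma=1.0, seed=42)
c_hat, phi_hat = fit_ar1_ols(series)
print(f"c_hat={c_hat:.3f}, phi_hat={phi_hat:.3f}")  # should land near c = 1.0, phi = 0.7
```

With 5,000 observations the standard error of φ̂ is roughly √((1 − φ²)/n) ≈ 0.01, so the estimate should sit close to the true 0.7. In real work you would reach for a dedicated time-series library rather than hand-rolled OLS.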
5. International Space Station. A featured article. Some stats:
Mass: 369,914 kg
Length: 51 m
Width: 109 m
“The cost of the station has been estimated by ESA as €100 billion over 30 years, and, although estimates range from 35 to 160 billion US dollars, the ISS is believed to be the most expensive object ever constructed.”
The link in the article states that: “The European share, at around 8 billion Euros spread over the whole programme, amounts to just one Euro spent by every European every year…”
One of the great benefits of experimental research is that, in principle, we can repeat the experiment and generate a fresh set of data. While this is impossible for many questions in social science, at a minimum one would hope that we could replicate our original results using the same dataset. As many students in Gov 2001 can tell you, however, social science often fails to clear even that low bar.
Of course, even this type of replication is impossible if someone else has changed the dataset since the original analysis was conducted. But that would never happen, right?