(No, not that type of modelling! – I was rather thinking about the type below…)
Anyway, I assume not all readers are equally familiar with this stuff, which I’ve incidentally written about before, e.g. here. Some of you will know all this stuff already and you do not need to read on (well, maybe you do, in order to realize that you do not…). Some of it is recap, some of it I don’t think I’ve written about before. Anyway.
i. So, a model is a representation of the world. It’s a simplified version of it, which helps us think about the matters at hand.
ii. Models always have a lot of assumptions. A perhaps surprising observation is that, from a certain point of view, models which might be categorized as more ‘simple’ (few explicit assumptions) can be said to make as many assumptions as more ‘complex’ models (many explicit assumptions); it’s just that the underlying assumptions are different. To illustrate this, let’s have a look at two different models, model 1 and model 2. Model 1 is a model which states that ‘Y = aX’. Model 2 is a model which states that ‘Y = aX + bZ’.
Model 1 assumes b is equal to 0, so that Z is not a relevant variable to include, whereas model 2 assumes b is not zero – but both models make assumptions about the variable Z (and the parameter b). Models will often differ along such lines, making different assumptions about variables and how they interact (incidentally, here we’re implicitly assuming in both models that X and Z are independent). A ‘simple’ model does make fewer (explicit) assumptions about the world than a ‘complex’ model does – but that question is different from the question of which restrictions the two models impose on the data. If we think in binary terms and ask ourselves, ‘Are we making an assumption about this variable or this relationship?’, the answer will always be ‘yes’ either way. Does the variable Z contribute information relevant to Y? Does it interact with other variables in the model? Both the simple model and the complex model include assumptions about this stuff. At every branching point where the complex model departs from the simple one, you have one assumption in one model (e.g. ‘the distinction between f and g matters’, ‘alpha is non-zero’) and the opposite assumption in the other (‘the distinction between f and g doesn’t matter’, ‘alpha is zero’). You always make assumptions; it’s just that the assumptions are different. In simple models the assumptions are often not spelled out, which is presumably part of why some of them are easy to overlook. Incidentally, it makes sense that they’re not spelled out, because there’s an infinite number of ways to adjust a model.
It’s true that some complex models branch out in ways that simple models do not. Once you’re more than one branching point away from where the two models first differ, the behaviour of the complex model may start to be determined by additional new assumptions, whereas the behaviour of the simple model may still rely on the same assumption that determined its behaviour at the first departure point. So the number of explicit assumptions will differ, but an assumption is made in either case at every junction.
As might be inferred from the comments above, ‘the simple model’ will usually be the one with the more restrictive assumptions, in terms of what the data are ‘allowed’ to do. Fewer assumptions usually means stronger assumptions. Assuming that e.g. males and females are identical is a much stronger assumption than the alternative that they are not; there are many ways in which they could differ but only one way in which they can be identical. The restrictiveness of a model does not equal the number of assumptions (explicitly) made. On the contrary, more assumptions generally make your model less restrictive, because additional assumptions allow more things to vary – this is indeed a big part of why model-builders generally don’t just stick to very simple models; if you do that, you don’t get the details right. Adding more assumptions may allow you to build a more correct model that better explains the data. It is my experience (not that I have much of it, but…) that people who’re unfamiliar with modelling think of additional assumptions as somehow ‘problematic’: ‘more stuff can go wrong if you add more assumptions; the more assumptions you have, the more likely it is that one of them is violated’. The problem is that not making assumptions is not really an option; you’ll basically assume something no matter what you do. ‘That variable/distinction/connection is irrelevant’, which is often the default assumption, is also just that – an assumption. If you do modelling you never get to not make assumptions; they’re always there lurking in the background whether you like it or not.
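To make the restrictiveness point concrete, here is a minimal Python sketch using made-up numbers. Model 1 (Y = aX) is simply model 2 (Y = aX + bZ) with b forced to 0. Both are fitted by least squares without an intercept, solved by hand so only the standard library is needed; the function names and the data are hypothetical illustrations, not anything from a particular package.

```python
# Sketch: model 1 (Y = aX) is model 2 (Y = aX + bZ) with the restriction b = 0.

def fit_model1(x, y):
    """Least-squares estimate of a in Y = aX (b implicitly fixed at 0)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def fit_model2(x, z, y):
    """Least-squares estimates of (a, b) in Y = aX + bZ via the 2x2 normal equations."""
    sxx = sum(xi * xi for xi in x)
    szz = sum(zi * zi for zi in z)
    sxz = sum(xi * zi for xi, zi in zip(x, z))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = sxx * szz - sxz ** 2          # non-zero as long as X and Z are not collinear
    a = (szz * sxy - sxz * szy) / det   # Cramer's rule
    b = (sxx * szy - sxz * sxy) / det
    return a, b

def rss(y, fitted):
    """Residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
y = [2.4, 3.1, 7.9, 8.2, 11.6, 11.3]

a1 = fit_model1(x, y)
a2, b2 = fit_model2(x, z, y)
rss1 = rss(y, [a1 * xi for xi in x])
rss2 = rss(y, [a2 * xi + b2 * zi for xi, zi in zip(x, z)])

print(f"model 1: a={a1:.3f}, RSS={rss1:.3f}")
print(f"model 2: a={a2:.3f}, b={b2:.3f}, RSS={rss2:.3f}")
```

Model 2 can always reproduce model 1 by setting b = 0, so its in-sample RSS can never be higher. That is exactly why in-sample fit on its own says little about which model is the better one.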
iii. A big problem is that we don’t know a priori which assumptions are correct before we’ve actually tested the models – indeed, we often build models mainly in order to figure out which assumptions are correct. (Sometimes we can’t even test the assumptions we’re making in a model, but let’s ignore this problem here…) A more complex model is not always more correct, nor does it always perform better. Sometimes it’ll actually do a worse job at explaining the variation in the data than a simple one would have done. When you add more variables to a model, you also add more uncertainty, because of things like measurement error. Sometimes it’s worth it, because the new variable explains a lot of the variation in the data. Sometimes it’s not – sometimes the noise you add matters far more than the additional information the variable contributes about how the data behave.
There are various ways to try to figure out if the amount of noise added by an additional variable is too high for it to be a good idea to include the variable in a model, but they’re not perfect and you always face tradeoffs. There are many different methods to estimate which model performs better, and the different methods apply different criteria – so you can easily get into a situation where the choice of which variables to include in your ‘best model’ depends on e.g. which information criterion you choose to apply.
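A small worked illustration of how two standard information criteria, AIC and BIC, can disagree. The numbers are hypothetical: 100 observations, and adding one variable lowers the residual sum of squares from 100.0 to 97.0. For a Gaussian linear model both criteria can be written in terms of the RSS, up to an additive constant that is the same for all models (the error variance parameter is also common to both models, so it is left out of k).

```python
import math

def gaussian_aic(rss, n, k):
    """AIC for a Gaussian linear model with k coefficients, up to a shared additive constant."""
    return n * math.log(rss / n) + 2 * k

def gaussian_bic(rss, n, k):
    """BIC for the same model; the penalty per parameter is ln(n) instead of 2."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical comparison: (RSS, number of coefficients) for each candidate model.
n = 100
candidates = {"model 1 (k=1)": (100.0, 1), "model 2 (k=2)": (97.0, 2)}

aic = {name: gaussian_aic(r, n, k) for name, (r, k) in candidates.items()}
bic = {name: gaussian_bic(r, n, k) for name, (r, k) in candidates.items()}

best_aic = min(aic, key=aic.get)   # lower is better for both criteria
best_bic = min(bic, key=bic.get)
print("AIC prefers:", best_aic)    # model 2 (k=2)
print("BIC prefers:", best_bic)    # model 1 (k=1)
```

The fit improvement here is worth about 3 units on the log-likelihood scale, which beats AIC’s penalty of 2 for the extra parameter but not BIC’s penalty of ln(100) ≈ 4.6, so the ‘best model’ depends on which criterion you picked.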
Anyway, the key point is this: You can’t just add everything (all the variables you could imagine playing a role) and assume you’ll be able to explain everything that way – adding another variable may indeed sometimes be a very bad idea.
iv. If you test a lot of hypotheses simultaneously, each of which has some positive probability of being evaluated as correct even when it is false, then as you add more variables to your model it becomes more and more likely that at least one of those hypotheses will be wrongly evaluated as correct (relevant link), unless you somehow adjust the probability of a given hypothesis being evaluated as correct as you add more hypotheses along the way. This is another reason adding more variables to a model can sometimes be problematic. There are ways around this particular problem (multiple-comparison corrections), but if they are not used, which they often are not, then you need to be careful.
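The arithmetic behind this is short enough to sketch in Python. Assuming m independent tests of hypotheses that are all actually false (true nulls), each run at significance level alpha, the probability of at least one false positive is 1 − (1 − alpha)^m. The Bonferroni correction shown here, testing each hypothesis at alpha/m, is one standard adjustment among several; the function names are just illustrative.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) among m independent tests of true null hypotheses."""
    return 1 - (1 - alpha) ** m

def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the familywise error rate at or below alpha."""
    return alpha / m

alpha = 0.05
for m in (1, 5, 20, 100):
    unadjusted = familywise_error_rate(alpha, m)
    adjusted = familywise_error_rate(bonferroni_threshold(alpha, m), m)
    print(f"m={m:3d}: unadjusted FWER={unadjusted:.3f}, Bonferroni FWER={adjusted:.3f}")
```

With 20 hypotheses tested at the usual 5% level, the chance of at least one spurious ‘finding’ is already about 64%.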
v. Adding more variables is not always preferable, but then what about throwing more data at the problem by increasing the sample size? Surely adding more data to the sample should increase your confidence in the model results, right? Well… No – bigger is actually not always better. This is related to the concept of consistency in statistics. “A consistent estimator is one for which, when the estimate is considered as a random variable indexed by the number n of items in the data set, as n increases the estimates converge to the value that the estimator is designed to estimate,” as the wiki article puts it. You can imagine that consistency is one of the key properties we require of the estimators in statistical models – it really is; we care a lot about consistency, and all else equal you should always prefer a consistent estimator to an inconsistent one (it should be noted, however, that all else is not always equal; a consistent estimator may have larger variance than an inconsistent estimator in a finite sample, which means that we may actually sometimes prefer the latter in specific situations). But the thing is, not all estimators are consistent. There are always some critical assumptions which need to be satisfied for an estimator to be consistent, and in a bad model these requirements will not be met. If you have a bad model, for example if you’ve incorrectly specified the relationships between the variables or included the wrong variables in your model, then increasing the sample size will do nothing to help you – additional data will not somehow magically make the estimates more reliable ‘because of asymptotics’. In fact, if your model’s performance is very sensitive to the sample size to which you apply it, that may well indicate a problem with the model, i.e. that the model is misspecified (see e.g. this).
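A minimal simulation of this, under an assumed data-generating process: Y = 1.0·X + 1.0·Z + noise, where Z is correlated with X (Z = 0.5·X + noise), which deliberately breaks the ‘X and Z are independent’ assumption from point ii. Model 1 omits Z, so its estimate of a converges to 1 + 0.5 = 1.5 rather than the true value 1.0 no matter how large n gets (classic omitted variable bias), while model 2 converges to 1.0. The helper names and parameter values are hypothetical.

```python
import random

def slope_no_intercept(x, y):
    """OLS estimate of a in model 1, Y = aX."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def two_regressor_ols(x, z, y):
    """OLS estimates (a, b) in model 2, Y = aX + bZ, via the 2x2 normal equations."""
    sxx = sum(xi * xi for xi in x)
    szz = sum(zi * zi for zi in z)
    sxz = sum(xi * zi for xi, zi in zip(x, z))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det

def simulate(n, a=1.0, b=1.0, seed=0):
    """Hypothetical process: Z is correlated with X, and both affect Y."""
    rng = random.Random(seed)
    x, z, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        zi = 0.5 * xi + rng.gauss(0, 1)
        x.append(xi)
        z.append(zi)
        y.append(a * xi + b * zi + rng.gauss(0, 1))
    return x, z, y

results = {}
for n in (100, 10_000, 200_000):
    x, z, y = simulate(n)
    a_short = slope_no_intercept(x, y)       # misspecified: omits Z
    a_long, _ = two_regressor_ols(x, z, y)   # correctly specified
    results[n] = (a_short, a_long)
    print(f"n={n:>7}: model 1 a_hat={a_short:.3f}, model 2 a_hat={a_long:.3f}")
```

As n grows, model 1’s estimate settles down very precisely, but at the wrong value; more data only makes you more confident in the bias.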
vi. Not all model assumptions are equal – some assumptions will usually be much more critical than others. As already mentioned, consistency of estimators is very important, and here it is important to note that not all violations of model assumptions lead to inconsistent estimators. One example is the homoskedasticity assumption (see also this) in regression analysis: heteroskedasticity does not by itself make the OLS estimates inconsistent. Here you can actually find yourself in a situation where you deliberately apply a model knowing that one of your assumptions about how the data behave is violated, yet this is not a problem at all, because you can correct for the violation separately (e.g. by using heteroskedasticity-robust standard errors), so it is of no practical importance. As mentioned in the beginning, most models are simplified versions of what goes on in the real world, so you should expect to see some ‘violations’ here and there – the key questions to ask are then: is the violation important, and what consequences does it have for the estimates we’ve obtained? If you do not ask yourself such questions when evaluating a model, you may easily end up quibbling about details that don’t really matter. And remember that not all the assumptions made in a model are spelled out, and that some of the important ones may have been overlooked.
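Here is a rough sketch of the homoskedasticity case, with an assumed setup: y = 1 + 2x + e, where the spread of e grows with |x|. The OLS slope still lands close to 2 (it stays consistent); what goes wrong is the classical standard error formula, which is why White’s HC0 heteroskedasticity-robust standard error is computed alongside it. The helper name and parameter values are hypothetical; in practice a regression package would do this for you.

```python
import math
import random

def ols_with_se(x, y):
    """Simple regression y = alpha + beta*x.
    Returns (beta_hat, classical SE, HC0 / White robust SE) for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha = y_bar - beta * x_bar
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    # Classical SE assumes one common error variance for all observations.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 lets each observation have its own error variance (sandwich formula).
    se_hc0 = math.sqrt(sum(((xi - x_bar) * e) ** 2 for xi, e in zip(x, resid)) / sxx ** 2)
    return beta, se_classical, se_hc0

rng = random.Random(1)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical heteroskedastic errors: standard deviation 0.5 + |x|.
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5 + abs(xi)) for xi in x]

beta_hat, se_classical, se_hc0 = ols_with_se(x, y)
print(f"beta_hat={beta_hat:.3f}, classical SE={se_classical:.4f}, robust SE={se_hc0:.4f}")
```

In this setup the classical formula understates the slope’s uncertainty; swapping in the robust standard error deals with the violation without touching the point estimate.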
vii. Which causal inferences can we draw from the model? Correlation != causation. To some extent, whether the statistical link is causal depends on whether we’ve picked the right variables and the right way to relate them to each other. But as I’ve remarked upon before, some model types are better suited for establishing causal links than others – there are good ways and bad ways to get at the heart of the matter (one application here; I believe I’ve linked to this before). Different fields have often developed different approaches, see e.g. this, this and this. Correlation on its own will probably tell you next to nothing about anything you might be interested in; as I believe my stats prof put it last semester, ‘we don’t care about correlation, correlation means nothing’. Randomization schemes with treatment groups and control groups are great. If we can’t do those, we can still try to build models that get around the problems. Those models make assumptions, but so do the models you’re comparing them with, and in order to evaluate them properly you need to be explicit about the assumptions made by the competing models as well.
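Why randomization is so valuable can be shown with a toy simulation. Assumed setup (entirely hypothetical): an unobserved trait, call it ability, raises the outcome, and without randomization people with higher ability are also more likely to take the treatment. The true treatment effect is 1.0. A simple difference in means is badly biased in the self-selected data but recovers the effect when treatment is assigned by a coin flip.

```python
import random

def difference_in_means(treated, outcome):
    """Mean outcome among the treated minus mean outcome among the untreated."""
    t = [yi for di, yi in zip(treated, outcome) if di]
    c = [yi for di, yi in zip(treated, outcome) if not di]
    return sum(t) / len(t) - sum(c) / len(c)

def simulate(n, randomized, effect=1.0, seed=0):
    """Hypothetical setup: unobserved ability raises the outcome and, without
    randomization, also makes people more likely to select into treatment."""
    rng = random.Random(seed)
    d, y = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        if randomized:
            treated = rng.random() < 0.5                 # coin flip, independent of ability
        else:
            treated = ability + rng.gauss(0, 1) > 0      # self-selection on ability
        outcome = effect * treated + 2.0 * ability + rng.gauss(0, 1)
        d.append(treated)
        y.append(outcome)
    return d, y

naive = difference_in_means(*simulate(100_000, randomized=False))
experimental = difference_in_means(*simulate(100_000, randomized=True))
print(f"true effect = 1.0, observational estimate = {naive:.2f}, "
      f"randomized estimate = {experimental:.2f}")
```

The observational comparison attributes the ability gap between the groups to the treatment; the randomized one doesn’t, because the coin flip makes the two groups alike in ability on average.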
It takes way more time to cover this stuff in detail here than I’m willing to spend on it, but here are a few relevant links to stuff I’m working on/with at the moment:
iii. Kolmogorov–Smirnov test.
iv. Chow test.
vi. Education and Health: Evaluating Theories and Evidence, by Cutler & Lleras-Muney.
vii. Education, Health and Mortality: Evidence from a Social Experiment, by Meghir, Palme & Simeonova.
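For readers who haven’t met the Chow test linked above: it asks whether one regression line fits two groups (or two time periods) as well as two separate lines do. A minimal sketch for a simple regression with intercept and slope (k = 2 parameters), using hypothetical data; under the null of no break the statistic follows an F(k, n1 + n2 − 2k) distribution, and the p-value would come from F tables or a statistics library, which this sketch leaves out.

```python
def fit_line(x, y):
    """OLS fit of y = alpha + beta*x; returns (alpha, beta, rss)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha = y_bar - beta * x_bar
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return alpha, beta, rss

def chow_f(x1, y1, x2, y2):
    """Chow test F statistic for a structural break between two groups (k = 2)."""
    k = 2  # intercept and slope
    _, _, rss1 = fit_line(x1, y1)
    _, _, rss2 = fit_line(x2, y2)
    _, _, rss_pooled = fit_line(x1 + x2, y1 + y2)   # one line for both groups
    n1, n2 = len(x1), len(x2)
    numerator = (rss_pooled - (rss1 + rss2)) / k
    denominator = (rss1 + rss2) / (n1 + n2 - 2 * k)
    return numerator / denominator

# Hypothetical data: same intercept, but the slope differs between the groups.
x = [float(i) for i in range(1, 11)]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(10)]
y_group1 = [1.0 + 1.0 * xi + e for xi, e in zip(x, noise)]
y_group2 = [1.0 + 3.0 * xi + e for xi, e in zip(x, noise)]

f_break = chow_f(x, y_group1, x, y_group2)
print(f"Chow F statistic: {f_break:.1f}")
```

A large F means the pooled line fits much worse than the separate ones, i.e. evidence that the relationship differs between the groups.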
i. Econometric methods for causal evaluation of education policies and practices: a non-technical guide. This one is ‘work-related’; in one of my courses I’m writing a paper and this working paper is one of the (many) sources I’m planning on using. Most of the papers I work with are unfortunately not freely available online, which is part of why I haven’t linked to them here on the blog.
I should note that there are no equations in this paper, so you should focus on the words ‘a non-technical guide’ rather than the words ‘econometric methods’ in the title – I think this is a very readable paper for the non-expert as well. I should of course also note that I have worked with most of these methods in a lot more detail, and that without the math it’s very hard to understand the details and really know what’s going on e.g. when applying such methods – or related methods such as IV methods on panel data, a topic which was covered in another class just a few weeks ago but which is not covered in this paper.
This is a place to start if you want to know something about applied econometric methods, particularly if you want to know how they’re used in the field of educational economics, and especially if you don’t have a strong background in stats or math. It should be noted that some of the methods covered see widespread use in other areas of economics as well; IV is widely used, and the difference-in-differences estimator has seen a lot of applications in health economics.
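The basic two-group, two-period difference-in-differences estimator is simple enough to write out. A minimal sketch with hypothetical group means: the estimate is the change over time for the treated group minus the change for the control group. It only has a causal interpretation under the key (untestable in this simple form) assumption of parallel trends, i.e. that without the treatment both groups would have changed by the same amount.

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """2x2 difference-in-differences: the treated group's change minus the control group's change.
    The control group's change stands in for what would have happened to the treated group
    without treatment (the parallel trends assumption)."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical average outcomes before and after a policy change.
estimate = diff_in_diff(treat_pre=10.0, treat_post=14.0, control_pre=9.0, control_post=11.0)
print(estimate)  # 2.0
```

The treated group improved by 4 and the control group by 2, so the estimated effect is the 2-unit difference; a regression with group, period and interaction dummies gives the same number and makes it easy to add controls and standard errors.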
ii. Regulating the Way to Obesity: Unintended Consequences of Limiting Sugary Drink Sizes. The law of unintended consequences strikes again.
You could argue with some of the assumptions made here (e.g. that prices per ounce remain constant), but I’m not sure the findings are that sensitive to that assumption, and without an explicit model of the pricing mechanism at work it’s mostly guesswork anyway.
iii. A discussion about the neurobiology of memory. Razib Khan posted a short part of the video recently, so I decided to watch it today. A few relevant wikipedia links: Memory, Dead reckoning, Hebbian theory, Caenorhabditis elegans. I’m skeptical, but I agree with one commenter who put it this way: “I know darn well I’m too ignorant to decide whether Randy is possibly right, or almost certainly wrong — yet I found this interesting all the way through.” I also agree with another commenter who mentioned that it’d have been useful for Gallistel to go into details about the differences between short term and long term memory and how these differences relate to the problem at hand.
“An extensive body of prior research indicates an association between emotion and moral judgment. In the present study, we characterized the predictive power of specific aspects of emotional processing (e.g., empathic concern versus personal distress) for different kinds of moral responders (e.g., utilitarian versus non-utilitarian). Across three large independent participant samples, using three distinct pairs of moral scenarios, we observed a highly specific and consistent pattern of effects. First, moral judgment was uniquely associated with a measure of empathy but unrelated to any of the demographic or cultural variables tested, including age, gender, education, as well as differences in “moral knowledge” and religiosity. Second, within the complex domain of empathy, utilitarian judgment was consistently predicted only by empathic concern, an emotional component of empathic responding. In particular, participants who consistently delivered utilitarian responses for both personal and impersonal dilemmas showed significantly reduced empathic concern, relative to participants who delivered non-utilitarian responses for one or both dilemmas. By contrast, participants who consistently delivered non-utilitarian responses on both dilemmas did not score especially high on empathic concern or any other aspect of empathic responding.”
In case you were wondering, the difference hasn’t got anything to do with a difference in the ability to ‘see things from the other guy’s point of view’: “the current study demonstrates that utilitarian responders may be as capable at perspective taking as non-utilitarian responders. As such, utilitarian moral judgment appears to be specifically associated with a diminished affective reactivity to the emotions of others (empathic concern) that is independent of one’s ability for perspective taking”.
On a small sidenote, I’m not really sure I get the authors at all – one of the questions they ask in the paper’s last part is whether ‘utilitarians are simply antisocial?’ This is such a stupid way to frame this I don’t even know how to begin to respond; I mean, utilitarians make better decisions that save more lives, and that’s consistent with them being antisocial? I should think the ‘social’ thing to do would be to save as many lives as possible. Dead people aren’t very social, and when your actions cause more people to die they also decrease the scope for future social interaction.
v. Lastly, some Khan Academy videos:
(This one may be very hard to understand if you haven’t covered this stuff before, but I figured I might as well post it here. If you don’t know e.g. what myosin and actin are, you probably won’t get much out of this video. If you don’t watch it, this part of what’s covered is probably the most important part to take away from it.)
It’s been a long time since I checked out the Brit Cruise information theory playlist, and I was happy to learn that he’s updated it and added some more stuff. I like the way he combines historical stuff with a ‘how does it actually work, and how did people realize that’s how it works’ approach – learning how people figured out stuff is to me sometimes just as fascinating as learning what they figured out:
(Relevant wikipedia links: Leyden jar, Electrostatic generator, Semaphore line. Cruise’s play with the cat and the amber may look funny, but there’s a point to it: “The Greek word for amber is ηλεκτρον (“elektron”) and is the origin of the word “electricity”.” – from the first link).
I haven’t really work-blogged anything substantial this semester so far and I’ve felt a bit guilty about that. Today on my way home from lectures I decided that one thing I could do, which wouldn’t take a lot of work on my part, was to just upload my notes taken during a lecture.
The stuff uploaded below is an hour and a half (2 lectures, each lasting 45 minutes) of my life, roughly. It wasn’t the complete lecture, as the lecturer also briefly went through an example of how to do the specific maximum likelihood estimation and how to perform the Smith-Blundell procedure on a data set in a statistical program called Stata. On the other hand it’s more than 2 hours of my life, because I also had to prepare for the lecture…
I know that people who’re not super familiar with mathematical models generally tend to assume that ‘the level of complexity’ dealt with in mathematical expressions is somehow positively correlated with (‘and thus causally linked to…’) the ‘amount of algebra’ (‘long equations with lots of terms are more complicated and involve more advanced math than short equations with few terms’). In general that’s not how it works. The stuff covered during the lecture was corner solution response models with neglected heterogeneity and endogenous variables; it may look simple as there’s a lot of ‘a+b type stuff’, but you need to think hard to get things right, and even simple-looking steps may cause problems when you’re preparing for exams in a course like this. Non-linear models with unobserved variables aren’t what you start out with when you learn statistics, but on the other hand this was hardly the most technical lecture I’ve had, so I figured it sort of made sense to upload this; I added quite a few comments to the equations written on the blackboard, which should make stuff easier to follow.
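To give a flavour of what a corner solution model looks like in code, here is a minimal sketch of the log-likelihood of the standard Tobit model censored at zero: a latent y* = x′β + u with u ~ N(0, σ²), and observed y = max(0, y*). Observations at the corner contribute the probability P(y* ≤ 0); the others contribute the normal density. This is only the likelihood piece. As I understand it, the Smith-Blundell procedure mentioned above roughly works by first regressing the endogenous variable on the instruments, then adding the first-stage residuals as an extra regressor in a Tobit like this one, and a t-test on that residual’s coefficient serves as an exogeneity test. The function names are hypothetical, and actually estimating β and σ would mean handing this function to a numerical optimizer.

```python
import math

def norm_logpdf(u):
    """Log density of the standard normal distribution."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * u * u

def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of the type I Tobit model censored at zero:
    y* = x'beta + u, u ~ N(0, sigma^2), observed y = max(0, y*).
    X is a list of regressor rows (include a 1 for the intercept)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi <= 0:
            # Corner observation: P(y* <= 0) = Phi(-xb/sigma) = 0.5*erfc(xb/(sigma*sqrt(2)))
            ll += math.log(0.5 * math.erfc(xb / (sigma * math.sqrt(2))))
        else:
            # Interior observation: normal density of the residual, scaled by 1/sigma
            ll += norm_logpdf((yi - xb) / sigma) - math.log(sigma)
    return ll

# Hypothetical data: intercept plus one regressor, with some observations at the corner.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, -0.3]]
y = [0.8, 0.0, 2.9, 0.0]
print(tobit_loglik([0.2, 1.2], 0.7, X, y))
```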
Anyway I figured at least one or two of you might find it interesting to ‘have a look inside the classroom’ (you can click the images to view them in a higher resolution):
I’ve not had lectures for the last two weeks, but tomorrow the new semester starts.
Like last semester I’ll try to ‘work-blog’ some stuff along the way – hopefully I’ll do it more often than I did, but it’s hard to say if that’s realistic at this point.
Earlier today I bought the only book I’m required to buy this semester:
…and having had a brief look at it I’m already starting to wonder if it was even a good idea to take that course. I’ve been told it’s a very useful course, but I have a nagging suspicion that it may also be quite hard. Here are some of the reasons (click to view in a higher resolution):
I don’t think it’s particularly likely that I’ll cover stuff from that particular course in work-blogs, for perhaps obvious reasons. One problem is the math; WordPress doesn’t handle math very well. Another problem is that most readers would be unlikely to benefit much from such posts unless I were to spend a lot more time on them than I’d like to. But it’s not my only course this semester. We’ll see how it goes.
“…it’s just a matter of estimating the hazard functions…”
Or something like that. The instructor did actually say the words in the post title, but I believe his voice sort of trailed off as he finished the sentence. All the stuff above is from today’s lecture notes, click to enlarge. The quote is from the last part of the lecture, after he’d gone through that stuff.
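For what ‘estimating the hazard functions’ means in the simplest possible case, here is a minimal nonparametric sketch in discrete time (the lecture itself presumably dealt with more elaborate models). The hazard at time t is the number of spells ending at t divided by the number still at risk at t, and censored spells (still ongoing when observation stops) count as at risk up to their last observed period but never as exits. The survival curve follows as the running product of (1 − h(t)), which is the Kaplan-Meier estimator. The spell data below are hypothetical.

```python
from collections import Counter

def discrete_hazard(durations, events):
    """Empirical discrete-time hazard: h(t) = (# exits at t) / (# still at risk at t).
    durations: last observed period of each spell; events: True = exit, False = censored."""
    exits = Counter(t for t, e in zip(durations, events) if e)
    hazard = {}
    for t in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= t)
        hazard[t] = exits[t] / at_risk
    return hazard

def survival_from_hazard(hazard):
    """Kaplan-Meier style survival curve: S(t) = product over s <= t of (1 - h(s))."""
    s, curve = 1.0, {}
    for t in sorted(hazard):
        s *= 1 - hazard[t]
        curve[t] = s
    return curve

# Hypothetical unemployment spells in months; False marks a censored spell.
durations = [1, 2, 2, 3, 3, 3, 4, 5]
events = [True, True, False, True, True, False, True, False]

h = discrete_hazard(durations, events)
S = survival_from_hazard(h)
for t in sorted(h):
    print(f"month {t}: hazard={h[t]:.3f}, survival={S[t]:.3f}")
```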
In the last slide, it should “of course” be ‘Oaxaca Blinder decomposition’, rather than ‘Oaxaca-Bilder’.
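Since the Oaxaca-Blinder decomposition came up: it splits the gap in mean outcomes between two groups into a part explained by differences in average characteristics and an ‘unexplained’ part due to differences in coefficients. A minimal sketch of the twofold version, using group B’s coefficients as the reference (other reference choices exist and give different splits). It relies on OLS with an intercept passing through the group means, so each group’s mean outcome equals mean_x · beta. All numbers are hypothetical (an intercept plus years of schooling).

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def oaxaca_blinder(mean_x_a, mean_x_b, beta_a, beta_b):
    """Twofold Oaxaca-Blinder decomposition with group B's coefficients as reference.
    Vectors include a leading 1 for the intercept. Returns (gap, explained, unexplained)."""
    gap = dot(mean_x_a, beta_a) - dot(mean_x_b, beta_b)
    explained = dot([a - b for a, b in zip(mean_x_a, mean_x_b)], beta_b)     # endowments
    unexplained = dot(mean_x_a, [a - b for a, b in zip(beta_a, beta_b)])     # coefficients
    return gap, explained, unexplained

# Hypothetical estimates: [intercept, years of schooling].
gap, explained, unexplained = oaxaca_blinder(
    mean_x_a=[1.0, 14.0], mean_x_b=[1.0, 12.0],
    beta_a=[1.0, 0.10], beta_b=[0.8, 0.09],
)
print(f"gap={gap:.2f}, explained={explained:.2f}, unexplained={unexplained:.2f}")
```

Here the 0.52 gap splits into 0.18 from group A having more schooling and 0.34 from group A getting different returns; the two parts always add up to the total gap.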
What we’re covering right now in class is not something I’ll cover here in detail – it’s very technical stuff. A few excerpts from today’s lecture notes (click to view full size):
Stuff like this is why I actually get a bit annoyed by people who state that their impression is that economics is a relatively ‘soft’ science, and ask questions like ‘the math you guys make use of isn’t all that hard, is it?’ (I’ve been asked this question a few times in the past.) It’s actually true that a lot of it isn’t – we spend a lot of time calculating derivatives and finding the signs of those derivatives and similar stuff. And economics is a reasonably heterogeneous field, so surely there’s a lot of variation – for example, in Denmark business graduates often call themselves economists too, even though a business graduate’s background, in terms of what they’ve learned during their education, would most often be quite different from e.g. my own.
What I’ll just say here is that the statistics stuff generally is not easy (if you think it is, you’ve spent way too little time on that stuff*). And yeah, the above excerpt is from what I consider my ‘easy course’ this semester – most of it is not like that, but some of it sure is.
Incidentally, I should comment in advance here, before people start talking about physics envy (mostly related to macro, IMO (and remember again the field heterogeneity; many, perhaps a majority of, economists don’t specialize in that stuff and don’t really know all that much about it…)), that the complexity economists deal with when they work with statistics – which is also economics – is the same kind of complexity that’s dealt with in all other subject areas where people need to analyze data to reach conclusions about what the data can tell us. Much of the complexity is in the data – it relates to the fact that the real world is complex, and if we want to model it right and get results that make sense, we need to think very hard about which tools to use and how we use them. The economists who decide to work with that kind of stuff (more than they absolutely have to in order to get their degrees, that is) are taught how to analyze data and do it the right way, and that what counts as the right way may depend on what kind of data you’re working with and the questions you want to answer. This also involves learning what an Epanechnikov kernel is and what it implies when the error terms of a model are m-dependent.
(*…or (Plamus?) way too much time…)
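For the curious, the Epanechnikov kernel mentioned above is K(u) = 0.75·(1 − u²) for |u| ≤ 1 and 0 otherwise, and one common place it shows up is kernel density estimation, where the density at a point x is estimated as (1/(n·h))·Σ K((x − x_i)/h) for a bandwidth h. A minimal sketch with a hypothetical sample and an arbitrarily chosen bandwidth (choosing h well is a whole topic of its own):

```python
def epanechnikov(u):
    """Epanechnikov kernel: K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def kde(data, x, bandwidth):
    """Kernel density estimate at x: (1 / (n*h)) * sum of K((x - x_i) / h)."""
    n = len(data)
    return sum(epanechnikov((x - xi) / bandwidth) for xi in data) / (n * bandwidth)

data = [1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 4.8]   # hypothetical sample
grid = [i / 10 for i in range(0, 61)]        # evaluation points from 0.0 to 6.0
density = [kde(data, g, bandwidth=0.8) for g in grid]
for g, d in zip(grid[::10], density[::10]):
    print(f"x={g:.1f}: estimated density={d:.3f}")
```

Each observation contributes a small parabola-shaped bump of width 2h, and the estimate is the average of those bumps; since the kernel integrates to one, so does the estimated density.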
Or a sample that’s arguably closer than yesterday’s to the kind of stuff I’m actually working with. The pics are from my textbook. Click to view in higher res.
In a couple of months, I’ll probably say that (‘stuff like this’) looks worse than it is. Some of it is quite a bit simpler than it looks, but in general I don’t feel that way right now. Even though we made some progress today there’s still a long way to go.
Stopped working half an hour ago, basically because I couldn’t think straight anymore, not because I wouldn’t like to keep working. On my way to bed. We’re in time trouble and I probably won’t do anything but work and sleep until Friday (not that I’ve been doing all that much else so far); anyway, don’t expect any updates until Friday evening or some time Saturday.
One of the great benefits of experimental research is that, in principle, we can repeat the experiment and generate a fresh set of data. While this is impossible for many questions in social science, at a minimum one would hope that we could replicate our original results using the same dataset. As many students in Gov 2001 can tell you, however, social science often fails to clear even that low bar.
Of course, even this type of replication is impossible if someone else has changed the dataset since the original analysis was conducted. But that would never happen, right?