(link). Some people would say that you should formulate the hypothesis before you start gathering data – and that’s what I’ll do now.
I guess this post is mostly for people like Plamus, but other people are very welcome to read along as well. I’ll start out with some introductory remarks. I have an account on a chess website – playchess.com. It’s a neat site, I like it. They’ve recently introduced a new feature: a so-called ‘tactics trainer’. The tactics trainer works by means of tactics sessions. Each tactics session features a number of chess problems you need to solve under a time constraint. You’ll never run out of problems; each time you’ve solved one problem (or answered incorrectly) a new one will pop up. Each session lasts about 6 minutes – some problems can be solved in a second or two, others might take more than a minute. The outcome of a session will depend upon the number of problems solved correctly, the ‘toughness’ of the problems solved or not solved, and probably various other factors as well. Once you’ve finished a session, you’ll get statistics on the number of correctly and incorrectly solved problems, the average time spent on each problem and the corresponding tactics performance rating. The performance rating will impact your combined tactics rating, which is a result of all previous sessions (it’s like a standard Elo rating system with frequent updating).
But why is the tactics trainer worth blogging about? Well, here’s the thing: Solving tactics problems is hard and it’s a cognitively demanding task. It takes brain power, and if your brain isn’t working 100 % you’ll do worse than if it were. I have often thought about how to model the effects of blood glucose variations on cognitive performance. I’ve thought about it because I know that blood glucose variation impacts my performance in various areas – it’s obviously the case, in extreme cases it’s extremely obvious. But what about the non-extreme cases? Blood glucose fluctuates a lot over the course of a day, and it’s not unlikely that such fluctuations also impact performance. But can those effects be quantified? So far it’s been difficult for me to figure out how one would go about doing that – one approach I’ve contemplated in the past was to use IQ tests to measure performance as a function of blood glucose, but that idea was basically dead in the water in terms of getting the kind of results I’d like – an IQ test takes a lot of time, it’s not always easy to compare scores across tests, and you can’t do the same test over and over because of the way the test is designed: the validity of the results will be impacted if you repeat the test. Another problem is that the blood glucose level wouldn’t even be exogenous – being in a state of deep concentration for a long time under stressful circumstances impacts blood glucose. What would be much better would be a shorter version of the test – a relatively short test where a high level of concentration is required to perform well and where even small differences in performance as a result of blood glucose fluctuations can be measured and quantified. Remember the tactics trainer I was talking about? Yeah…
It seemed to me that using the tactics trainer sessions to gauge ‘mental ability’ as a function of blood glucose actually makes a lot of sense; it’s possible to run a lot of sessions over time, so n can potentially become large enough to actually make room for some non-silly results. There are always new and different problems available, and the comparability issue across tests disappears completely. Blood glucose values can be taken as exogenous because the sessions last only a very short amount of time. Performances are precisely measured.
I should make it clear from the start that the effect of blood glucose on performance is non-linear. Extremely low values impact performance, as do extremely high values – so in theory some kind of semi-inverse-u-shaped pattern should probably be expected. The actual relationship would not look very much like an inverse u, both because the scales are asymmetric in terms of symptoms per mmol/l of deviation from the desired level – a blood glucose somewhere between 4 and 10 mmol/l is often considered ‘desirable’, but whereas a value of 0 will mean that you’re dead, a value of 14 will for many diabetics probably often not give any symptoms at all – and because the left hand side is truncated (as mentioned) whereas in practice the right hand side is not for well-treated patients.
I will make a simplifying assumption here that will save me a lot of work and arguably will not be all that problematic when interpreting the results. I’ll disregard the non-linearities in the data by removing all data problems related to performance effects to the left of the lower bound of the ‘desirable level’, and by assuming that the ‘true’ non-linear relationship between performance and blood glucose on the right hand side of the distribution can be approximated by a linear function without this causing too many problems. The way to deal with the “data problems related to performance effects to the left of the lower bound of the ‘desirable level’” will be to exclude from the sample all observations with a measured blood glucose below 4.0 mmol/l. My motivation for removing the lowest values is that when my blood glucose is that low, it will always become obvious to me within a very short amount of time that there’s a significant performance effect. I know those effects very well, and I know that it’s a bad idea to delay treatment – blood glucose levels below that can quickly turn into a medical emergency. When thinking about performance effects here, it seems to me to make a lot of sense to implicitly employ a two-state model framework and then use separate models to analyze what’s going on in the two states. State one is quite simple: that’s the hypoglycemia scenario mentioned. To ‘model’ this state is easy: the effects are almost universally real and significant, to an extent where even measuring them in the manner described here becomes borderline dangerous. State two: euglycemia or hyperglycemia. In this state, performance is likely to be at least approximately linearly decreasing in the blood glucose level.
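The exclusion rule above is simple enough to sketch in a few lines of code. This is just an illustrative sketch with made-up numbers – the variable and function names are my own inventions, not anything from the actual project:

```python
# Sketch of the exclusion rule: drop all observations with measured
# blood glucose below the lower bound of the 'desirable' range, so that
# only state two (euglycemia/hyperglycemia) remains in the sample.

LOWER_BOUND = 4.0  # mmol/l, lower end of the 'desirable' range

def filter_sessions(sessions):
    """Keep only sessions measured at or above the hypoglycemia cut-off.

    `sessions` is a list of (blood_glucose_mmol_l, performance_rating) pairs.
    """
    return [(bg, rating) for bg, rating in sessions if bg >= LOWER_BOUND]

# Invented example data: one hypoglycemic observation, three kept ones.
raw = [(3.2, 1450), (5.6, 1610), (9.8, 1580), (14.1, 1520)]
kept = filter_sessions(raw)
# The 3.2 mmol/l observation is excluded; three observations remain.
```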
I’m mostly interested in performance effects which are not obvious to me and so that makes state two the more interesting state to consider; it’s also a lot more interesting because state one is relatively (though not that…) rare, whereas state two is the default state in which I spend most of my time. Regarding using a linear approximation to model the relationship in state two rather than the ‘true’ non-linear function: This may be problematic, but I know myself well enough to know that I don’t want to bother with non-linear models when I look at this stuff later; it’s a poor and underspecified model to begin with. The kind of question I’m asking here is far more along the lines of: ‘does it even make sense to assume that your cognitive profile is affected by blood glucose variation?’ than it is a question along the lines of: ‘how will a 2,6 mmol/l difference impact your likelihood of getting an A when taking an exam in course X?’
When it comes to the specifics of the data gathering process, I’ll do it this way: Unless I have symptoms of hypoglycemia – in which case I’ll not do the session in question, but rather treat the hypoglycemia – I’ll only measure the blood glucose after I’ve finished the session. If the blood glucose is below 4.0 mmol/l the results will not be included in the sample. For all other observations, I will list the performance rating of the tactics session and the blood glucose level.
I intend to test the hypothesis that there is a significant and negative effect on performance of the blood glucose level measured (higher blood glucose level -> lower performance rating).
If I get around to it, it might also be interesting to see if there are threshold effects at play. One threshold to consider might be a blood glucose level of 15.0 mmol/l. The precise cut-off is semi-arbitrary, but not completely; this is close to the point where you start to be able to measure beginning ketonuria, and it’s probably also around this point where symptoms start to (maybe) appear. I write ‘maybe’ because the symptoms of high blood glucose are far less reliable than the symptoms of low blood glucose, which is also why I’m interested in the related performance effects; when I have symptoms I know I’m not ‘at my best’, but diabetics are often not ‘at their best’ without getting any signals from the body to that effect. A threshold effect also makes sense to include because it’s far from likely that a linear model will catch all the stuff that’s going on here.
As a starting point, my stopping rule will be that I’ll stop collecting data once I have 300 observations. This is completely arbitrary, but you should always have a stopping rule. I take in the neighbourhood of 8 blood tests a day, and some of them aren’t taken when I sit at my computer doing chess tactics exercises. If half of them are, however, I will have 300 observations in 2,5 months, i.e. around New Year (this is close to my exams, so I’ll surely not want to do a lot of non-work statistical modelling at that point – so it will be kept simple). Maybe it will be worth considering doing more than one session per blood test, in which case the data can be gathered a lot faster than that, but then problems related to blood glucose exogeneity may start to pop up. I haven’t done multiple sessions back to back before, so I don’t know if such an approach will impact the performance rating; it might, and if it seems to do that I’ll probably disregard such ‘shortcuts’.
Potentially I might improve my tactics abilities during the survey period (in this specific setting that would be a bad thing, because the parameters would then no longer be constant over time), but unless such an effect is very noticeable early on I’ll proceed as if my skill does not improve during the survey period. I’ll write down the starting tactics rating (which is sort of ‘an average of recent past performances’) as well as the tactics rating at the end of the project, and compare the difference between the two with the estimated standard deviation of the observations to at least get an idea of whether there’s a potentially big problem here; I don’t know if I’ll really care if a big problem turns up, but I should at least pretend to care about this ‘risk’ of getting better over time (and as an added bonus this is also a simple way to try to establish if doing tactics exercises helps you improve your tactics abilities significantly). The reason why I assume the ‘improvement over time’ effect to be minor here is mostly that I’m actually a reasonably strong player by now, so the learning curve is presumably a lot flatter than it was in the past, meaning that exercises like these should not be expected to have that big an effect on my performance.
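That sanity check amounts to comparing the rating drift over the period with the spread of the session performances. A minimal sketch, with all numbers invented (the 2-standard-deviation flag is my own arbitrary choice, not anything from the text):

```python
# Compare the change in the combined tactics rating over the survey
# period with the standard deviation of the per-session performances.
# If the drift is small relative to the spread, treating skill as
# constant over the period seems defensible.
from statistics import stdev

start_rating, end_rating = 1580, 1605                    # invented start/end ratings
session_ratings = [1540, 1610, 1575, 1620, 1560, 1595]   # invented per-session ratings

drift = end_rating - start_rating
sd = stdev(session_ratings)
flag_big_problem = abs(drift) > 2 * sd  # arbitrary flag threshold
```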
Yes, I did consider including other variables in the model (number of unsolved problems, time spent per problem), but a) they don’t add much additional information, b) they’re strongly correlated with the rating variable (so I would not be comfortable including them in the same model as the rating variable), and c) the more data I need to write down, the more this will feel like work, and I don’t want it to feel like work. So there’ll also be no controls included; this is all just a ‘fun (not quick) and dirty’ project to have running for a while. I’ll release the (limited) data afterwards and let people play around with it if they like.
Ideas and suggestions (which do not involve me doing a lot of extra work), as well as questions, are of course most welcome.
Incidentally, if you want to know if you’re good at figuring out how smart people are based on how they look, here’s another small-scale project you may be interested in (I have nothing to do with it as such, but I know the guy behind it).