“This book is written to provide […] a useful balance of theoretical treatment, description of empirical analyses and breadth of content for use in undergraduate modules in health economics for economics students, and for students taking a health economics module as part of their postgraduate training. Although we are writing from a UK perspective, we have attempted to make the book as relevant internationally as possible by drawing on examples, case studies and boxed highlights, not just from the UK, but from a wide range of countries”
I’m currently reading this book. The coverage has been somewhat disappointing, as it’s mostly an undergraduate text and has so far mainly covered concepts and ideas I’m already familiar with – but it’s not terrible, just okay-ish. I have added some observations from the first half of the book below.
“Health economics is the application of economic theory, models and empirical techniques to the analysis of decision making by people, health care providers and governments with respect to health and health care. […] Health economics has evolved into a highly specialised field, drawing on related disciplines including epidemiology, statistics, psychology, sociology, operations research and mathematics […] health economics is not shorthand for health care economics. […] Health economics studies not only the provision of health care, but also how this impacts on patients’ health. Other means by which health can be improved are also of interest, as are the determinants of ill-health. Health economics studies not only how health care affects population health, but also the effects of education, housing, unemployment and lifestyles.”
“Economic analyses have been used to explain the rise in obesity. […] The studies show that reasons for the rise in obesity include: *Technological innovation in food production and transportation that has reduced the cost of food preparation […] *Agricultural innovation and falling food prices that has led to an expansion in food supply […] *A decline in physical activity, both at home and at work […] *An increase in the number of fast-food outlets, resulting in changes to the relative prices of meals […]. *A reduction in the prevalence of smoking, which leads to increases in weight (Chou et al., 2004).”
“[T]he evidence is that ageing is in reality a relatively small factor in rising health care costs. The popular view is known as the ‘expansion of morbidity’ hypothesis. Gruenberg (1977) suggested that the decline in mortality that has led to an increase in the number of older people is because fewer people die from illnesses that they have, rather than because disease incidence and prevalence are lower. Lower mortality is therefore accompanied by greater morbidity and disability. However, Fries (1980) suggested an alternative hypothesis, ‘compression of morbidity’. Lower mortality rates are due to better health amongst the population, so people not only live longer, they are in better health when old. […] Zweifel et al. (1999) examined the hypothesis that the main determinant of high health care costs amongst older people is not the time since they were born, but the time until they die. Their results, confirmed by many subsequent studies, is that proximity to death does indeed explain higher health care costs better than age per se. Seshamani and Gray (2004) estimated that in the UK this is a factor up to 15 years before death, and annual costs increase tenfold during the last 5 years of life. The consensus is that ageing per se contributes little to the continuing rise in health expenditures that all countries face. Much more important drivers are improved quality of care, access to care, and more expensive new technology.”
“The difference between AC [average cost] and MC [marginal cost] is very important in applied health economics. Very often data are available on the average cost of health care services but not on their marginal cost. However, using average costs as if they were marginal costs may mislead. For example, hospital costs will be reduced by schemes that allow some patients to be treated in the community rather than being admitted. Given data on total costs of inpatient stays, it is possible to calculate an average cost per patient. It is tempting to conclude that avoiding an admission will reduce costs by that amount. However, the average includes patients with different levels of illness severity, and the more severe the illness the more costly they will be to treat. Less severely ill patients are most likely to be suitable for treatment in the community, so MC will be lower than AC. Such schemes will therefore produce a lower cost reduction than the estimate of AC suggests.
A problem with multi-product cost functions is that it is not possible to define meaningfully what the AC of a particular product is. If different products share some inputs, the costs of those inputs cannot be solely attributed to any one of them. […] In practice, when multi-product organisations such as hospitals calculate costs for particular products, they use accounting rules to share out the costs of all inputs and calculate average not marginal costs.”
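The AC/MC point above can be made concrete with a toy calculation of my own (all numbers are hypothetical, chosen only to illustrate the mechanism):

```python
# Hypothetical illustration: average cost overstates the savings from
# diverting the least severe inpatients to community treatment.

# Inpatient costs by severity (made-up figures): many mild, fewer severe cases.
costs = [1000] * 60 + [3000] * 30 + [8000] * 10  # 100 admissions in total

average_cost = sum(costs) / len(costs)  # AC across all admissions

# Community-treatment schemes divert the mildest cases, whose marginal
# cost is well below the average.
diverted = [1000] * 10              # ten mild cases avoided
actual_saving = sum(diverted)       # true cost reduction
naive_estimate = average_cost * 10  # AC-based estimate of the saving

print(average_cost)    # 2300.0
print(actual_saving)   # 10000
print(naive_estimate)  # 23000.0
```

The AC-based estimate is more than twice the actual saving here, which is exactly the kind of overstatement the authors warn about.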
“Studies of economies of scale in the health sector do not give a consistent and generalisable picture. […] studies of scope economies [also] do not show any consistent and generalisable picture. […] The impact of hospital ownership type on a range of key outcomes is generally ambiguous, with different studies yielding conflicting results. […] The association between hospital ownership and patient outcomes is unclear. The evidence is mixed and inconclusive regarding the impact of hospital ownership on access to care, morbidity, mortality, and adverse events.”
“Public goods are goods that are consumed jointly by all consumers. The strict economics definition of a public good is that they have two characteristics. The first is non-rivalry. This means that the consumption of a good or service by one person does not prevent anyone else from consuming it. Non-rival goods therefore have large marginal external benefits, which make them socially very desirable but privately unprofitable to provide. Examples of nonrival goods are street lighting and pavements. The second is non-excludability. This means that it is not possible to provide a good or service to one person without letting others also consume it. […] This may lead to a free-rider problem, in which people are unwilling to pay for goods and services that are of value to them. […] Note the distinction between public goods, which are goods and services that are non-rival and non-excludable, and publicly provided goods, which are goods or services that are provided by the government for any reason. […] Most health care products and services are not public goods because they are both rival and excludable. […] However, some health care, particularly public health programmes, does have public good properties.”
“[H]ealth care is typically consumed under conditions of uncertainty with respect to the timing of health care expenditure […] and the amount of expenditure on health care that is required […] The usual solution to such problems is insurance. […] Adverse selection exists when exactly the wrong people, from the point of view of the insurance provider, choose to buy insurance: those with high risks. […] Those who are most likely to buy health insurance are those who have a relatively high probability of becoming ill and maybe also incur greater costs than the average when they are ill. […] Adverse selection arises because of the asymmetry of information between insured and insurer. […] Two approaches are adopted to prevent adverse selection. The first is experience rating, where the insurance provider sets a different insurance premium for different risk groups. Those who apply for health insurance might be asked to undergo a medical examination and
to disclose any relevant facts concerning their risk status. […] There are two problems with this approach. First, the cost of acquiring the appropriate information may be high. […] Secondly, it might encourage insurance providers to ‘cherry pick’ people, only choosing to provide insurance to the low risk. This may mean that high-risk people are unable to obtain health insurance at all. […] The second approach is to make health insurance compulsory. […] The problem with this is that low-risk people effectively subsidise the health insurance payments of those with higher risks, which may be regarded […] as inequitable.”
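A minimal sketch of the difference between experience rating and a pooled (e.g. compulsory) scheme, using made-up risk groups, probabilities and losses:

```python
# Hypothetical risk groups: annual probability of illness and expected
# treatment cost if ill (all figures invented for illustration).
groups = {
    "low":  {"n": 800, "p": 0.05, "loss": 2000},
    "high": {"n": 200, "p": 0.40, "loss": 5000},
}

# Experience rating: each group pays its own actuarially fair premium
# (ignoring loading costs, i.e. the insurer's administration and profit).
experience_premiums = {g: v["p"] * v["loss"] for g, v in groups.items()}

# Community rating under compulsory insurance: everyone pays the pooled
# expected cost, so low risks subsidise high risks.
total_expected = sum(v["n"] * v["p"] * v["loss"] for v in groups.values())
total_people = sum(v["n"] for v in groups.values())
community_premium = total_expected / total_people

print(experience_premiums)  # {'low': 100.0, 'high': 2000.0}
print(community_premium)    # 480.0
```

The gap between the low-risk experience-rated premium (100) and the pooled premium (480) is the cross-subsidy that the book notes may be regarded as inequitable.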
“Health insurance changes the economic incentives facing both the consumers and the providers of health care. One manifestation of these changes is the existence of moral hazard. This is a phenomenon common to all forms of insurance. The suggestion is that when people are insured against risks and their consequences, they are less careful about minimising them. […] Moral hazard arises when it is possible to alter the probability of the insured event, […] or the size of the insured loss […] The extent of the problem depends on the price elasticity of demand […] Three main mechanisms can be used to reduce moral hazard. The first is co-insurance. Many insurance policies require that when an event occurs the insured shares the insured loss […] with the insurer. The co-insurance rate is the percentage of the insured loss that is paid by the insured. The co-payment is the amount that they pay. […] The second is deductibles. A deductible is an amount of money the insured pays when a claim is made irrespective of co-insurance. The insurer will not pay the insured loss unless the deductible is paid by the insured. […] The third is no-claims bonuses. These are payments made by insurers to discourage claims. They usually take the form of reduced insurance premiums in the next period. […] No-claims bonuses typically discourage insurance claims where the payout by the insurer is small.”
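The arithmetic of deductibles and co-insurance is simple enough to sketch; the deductible and co-insurance rate below are hypothetical, not figures from the book:

```python
# Toy sketch of the cost-sharing mechanisms described above: with a
# deductible and a co-insurance rate, the insured bears part of each
# insured loss, which blunts moral hazard especially for small claims.

def out_of_pocket(loss, deductible=500, coinsurance_rate=0.20):
    """Amount paid by the insured for a claim of size `loss`."""
    if loss <= deductible:
        return loss  # claim below the deductible: insurer pays nothing
    # Insured pays the deductible plus a share of the remainder.
    return deductible + coinsurance_rate * (loss - deductible)

def insurer_pays(loss, **kwargs):
    return loss - out_of_pocket(loss, **kwargs)

print(out_of_pocket(300))   # 300 -- not worth claiming at all
print(out_of_pocket(5000))  # 500 + 0.2 * 4500 = 1400.0
print(insurer_pays(5000))   # 3600.0
```

Small losses fall entirely on the insured, so they have no reason to claim (and, in theory, more reason to avoid the loss), which is the intended incentive effect.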
“The method of reimbursement relates to the way in which health care providers are paid for the services they provide. It is useful to distinguish between reimbursement methods, because they can affect the quantity and quality of health care. […] Retrospective reimbursement at full cost means that hospitals receive payment in full for all health care expenditures incurred in some pre-specified period of time. Reimbursement is retrospective in the sense that not only are hospitals paid after they have provided treatment, but also in that the size of the payment is determined after treatment is provided. […] Which model is used depends on whether hospitals are reimbursed for actual costs incurred, or on a fee-for-service (FFS) basis. […] Since hospital income [in these models] depends on the actual costs incurred (actual costs model) or on the volume of services provided (FFS model) there are few incentives to minimise costs. […] Prospective reimbursement implies that payments are agreed in advance and are not directly related to the actual costs incurred. […] incentives to reduce costs are greater, but payers may need to monitor the quality of care provided and access to services. If the hospital receives the same income regardless of quality, there is a financial incentive to provide low-quality care […] The problem from the point of view of the third-party payer is how best to monitor the activities of health care providers, and how to encourage them to act in a mutually beneficial way. This problem might be reduced if health care providers and third-party payers are linked in some way so that they share common goals. […] Integration between third-party payers and health care providers is a key feature of managed care.”
One of the prospective reimbursement models applied today may be of particular interest to Danes, as the DRG system is a big part of the financial model of the Danish health care system – so I’ve added a few details about this type of system below:
“An example of prospectively set costs per case is the diagnostic-related groups (DRG) pricing scheme introduced into the Medicare system in the USA in 1984, and subsequently used in a number of other countries […] Under this scheme, DRG payments are based on average costs per case in each diagnostic group derived from a sample of hospitals. […] Predicted effects of the DRG pricing scheme are cost shifting, patient shifting and DRG creep. Cost shifting and patient shifting are ways of circumventing the cost-minimising effects of DRG pricing by shifting patients or some of the services provided to patients out of the DRG pricing scheme and into other parts of the system not covered by DRG pricing. For example, instead of being provided on an inpatient basis, treatment might be provided on an outpatient basis where it is reimbursed retrospectively. DRG creep arises when hospitals classify cases into DRGs that carry a higher payment, indicating that they are more complicated than they really are. This might arise, for instance, when cases have multiple diagnoses.”
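The incentive behind DRG creep is easy to see in a toy example of my own (the tariffs and costs below are invented, not actual DRG rates):

```python
# Toy illustration: under DRG pricing a hospital is paid a fixed tariff
# per case, so its margin per case is tariff minus actual cost.
# Classifying a case into a higher-paying DRG ("DRG creep") raises
# revenue without changing the cost of treatment.

drg_tariffs = {
    "pneumonia":                    3000,  # hypothetical tariff
    "pneumonia_with_complications": 5200,  # hypothetical tariff
}

def margin(drg, actual_cost):
    """Hospital's surplus (or loss) on a case billed under `drg`."""
    return drg_tariffs[drg] - actual_cost

actual_cost = 3400  # what treatment really costs for this patient
print(margin("pneumonia", actual_cost))                     # -400
print(margin("pneumonia_with_complications", actual_cost))  # 1800
```

A case with multiple diagnoses that can plausibly be coded either way turns a 400 loss into an 1800 surplus, which is the upcoding incentive the quote describes.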
Some stuff from the chapters dealing with the UK:
“we now know that reducing the HbA1c too far and fast in some patients can be harmful. This is a particularly important issue, where primary care is paid through the Quality Outcomes Framework (QoF), a general practice “pay for performance” programme. A major item within QoF is the proportion of patients below HbA1c criteria: such reporting is not linked to rates of hypoglycaemia, ambulance call outs or hospitalisation, i.e., a practice could receive a high payment through achieving the QoF target, but with a high hospitalisation/ambulance callout rate.”
“nationwide audit data for England 2009–2010 showed that […] targets for HbA1c (≤7.5%/58.5 mmol/mol), blood pressure (BP) (<140/80 mmHg) and total cholesterol (<4.0 mmol/l) were achieved in only 67 %, 69 % and 41 % of people with T2D.”
One thing that is perhaps worth noting here before moving any further is that the fact that you have actual data on this stuff is in itself indicative of an at least reasonable standard of care, compared to many places; in a lot of countries such data simply aren’t collected, and it seems highly unlikely to me that the default assumption should be that things are going great in places where the data are missing. Denmark also, incidentally, has a similar audit system, the results of which I’ve discussed in some detail before here on the blog.
“Our local audit data shows that approximately 85–90 % of patients with diabetes are managed by GPs and practice nurses in Coventry and Warwickshire. Only a small proportion of newly diagnosed patients with T2D (typically around 5–10 %) who attend the DESMOND (Diabetes Education and Self-Management for Ongoing and Newly Diagnosed) education programme come into contact with some aspect of the specialist services. […] Payment by results (PBR) has […] actively, albeit indirectly, disincentivised primary care to seek opinion from specialist services. […] Large volumes of data are collected by various services ranging between primary care, local laboratory facilities, ambulance services, hospital clinics (of varying specialties), retinal screening services and several allied healthcare professionals. However, the majority of these systems are not unified and therefore result in duplication of data collection and lack of data utilisation beyond the purpose of collection. This can result in missed opportunities, delayed communication, inability to use electronic solutions (prompts, alerts, algorithms etc.), inefficient use of resources and patient fatigue (repeated testing but no apparent benefit). Thus, in the majority of the regions in England, the delivery of diabetes care is disjointed and lacks integration. Each service collects and utilises data for their own “narrow” purpose, which could be used in a holistic way […] Potential consequences of the introduction of multiple service providers are fragmentation of care, reductions in continuity of care and propagation of a reluctance to refer on to a more specialist service. […] There are calls for more integration and less fragmentation in health-care, yet so far, the major integration projects in England have revealed negligible, if any, benefits [25, 32].
[…] to provide high quality care and reduce the cost burden of diabetes, any integrated diabetes care models must prioritise prevention and early aggressive intervention over downstream interventions (secondary and tertiary prevention).”
“It is estimated that 99 % of diabetes care is self-management […] people with diabetes spend approximately only 3 h a year with healthcare professionals (versus 8757 h of self-management)” [this is a funny way of looking at things, which I’d never really considered before.]
“In a traditional model of diabetes care the rigid divide between primary and specialist care is exacerbated by the provision of funding. For example the tariff system used in England, to pay for activity in specialist care, can create incentives for one part of the system to “hold on” to patients who might be better treated elsewhere. This system was originally introduced to incentivise providers to increase elective activity and reduce waiting times. Whilst it has been effective for improving access to planned care, it is not so well suited to achieving the continuity of care needed to facilitate integrated care.”
“Currently in the UK there is a mismatch between what the healthcare policies require and what the workforce is actually being trained for. […] For true integrated care in diabetes and the other long term condition specialties to work, the education and training needs for both general practitioners and hospital specialists need to be more closely aligned.”
The chapter on Germany (Baden-Württemberg):
“An analysis of the Robert Koch-Institute (RKI) from 2012 shows that more than 50 % of German people over 65 years suffer from at least one chronic disease, approximately 50 % suffer from two to four chronic diseases, and over a quarter suffer from five or more diseases. […] Currently the public sector covers the majority (77 %) of health expenditures in Germany […] An estimated number of 56.3 million people are living with diabetes in Europe. […] The mean age of the T2DM-cohort [from Kinzigtal, Germany] in 2013 was 71.2 years and 53.5 % were women. In 2013 the top 5 co-morbidities of patients with T2DM were essential hypertension (78.3 %), dyslipidaemia (50.5 %), disorders of refraction and accommodation (38.2 %), back pain (33.8 %) and obesity (33.3 %). […] T2DM in Kinzigtal was associated with mean expenditure of 5,935.70 € per person in 2013 (not necessarily only for diabetes care) including 40 % from inpatient stays, 24 % from drug prescriptions, 19 % from physician remuneration in ambulatory care and the rest from remedies and adjuvants (e.g., insulin pen systems, wheelchairs, physiotherapy, etc.), work incapacity or rehabilitation.”
“Zhang et al. […] reported that globally, 12 % of health expenditures […] per person were spent on diabetes in 2010. The expenditure varies by region, age group, gender, and country’s income level.”
“Over the years many approaches [have been] introduced to improve the quality and continuity of care for chronic diseases. […] the Dutch minister of health approved, in 2007, the introduction of bundled-care (known in the Netherlands as a ‘chain-of-care’) approach for integrated chronic care, with special attention to diabetes. […] With a bundled payment approach – or episode-based payment – multiple providers are reimbursed a single sum of money for all services related to an episode of care (e.g., hospitalisation, including a period of post-acute care). This is in contrast to a reimbursement for each individual service (fee-for-service), and it is expected that this will reduce the volume of services provided and consequently lead to a reduction in spending. Since in a fee-for-service system the reimbursement is directly related to the volume of services provided, there is little incentive to reduce unnecessary care. The bundled payment approach promotes [in theory… – US] a more efficient use of services […] As far as efficiency […] is concerned, after 3 years of evaluation, several changes in care processes have been observed, including task substitution from GPs to practice nurses and increased coordination of care [31, 36], thus improving process costs. However, Elissen et al. concluded that the evidence relating to changes in process and outcome indicators, remains open to doubt, and only modest improvements were shown in most indicators. […] Overall, while the Dutch approach to integrated care, using a bundled payment system with a mixed payer approach, has created a limited improvement in integration, there is no evidence that the approach has reduced morbidity and premature mortality: and it has come at an increased cost.”
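The volume-incentive difference between fee-for-service and bundled payment can be sketched in a few lines; the fees, bundle price and service volumes below are all made up for illustration:

```python
# Hypothetical comparison of fee-for-service and bundled (episode-based)
# payment for a year of diabetes care.

ffs_fees = {"gp_visit": 40, "lab_test": 25, "specialist_visit": 120}

def ffs_revenue(volumes):
    # Under fee-for-service each extra service generates extra revenue,
    # so there is little incentive to reduce volume.
    return sum(ffs_fees[s] * n for s, n in volumes.items())

BUNDLE_PRICE = 450  # single sum per patient-episode, volume-independent

volumes = {"gp_visit": 6, "lab_test": 8, "specialist_visit": 2}
print(ffs_revenue(volumes))  # 6*40 + 8*25 + 2*120 = 680
print(BUNDLE_PRICE)          # fixed regardless of service volume
```

Under the bundle, any service the providers can safely drop or substitute (e.g. a practice nurse instead of a GP visit) is retained margin rather than lost revenue – which is the efficiency mechanism the Dutch scheme was betting on.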
“In 2013 Sweden spent the equivalent of 4,904 USD per capita on health [OECD average: 3,453 USD], with 84 % of the expenditure coming from public sources [OECD average: 73 %]. […] Similarly high proportions [of public spending] can be found in the Netherlands (88 %), Norway (85 %) and Denmark (84 %). […] Sweden’s quality registers, for tracking the quality of care that patients receive and the corresponding outcomes for several conditions, are among the most developed across the OECD. Yet, the coordination of care for patients with complex needs is less good. Only one in six patients had contact with a physician or specialist nurse after discharge from hospital for stroke, again with substantial variation across counties. Fewer than half of patients with type 1 diabetes […] have their blood pressure adequately controlled, with a considerable variation (from 26 % to 68 %) across counties. […] at 260 admissions per 100,000 people aged over 80, avoidable hospital admissions for uncontrolled diabetes in Sweden’s elderly population are the sixth highest in the OECD, and about 1.5 times higher than in Denmark.”
“Waiting times [in Sweden] have long been a cause of dissatisfaction. In an OECD ranking of 2011, Sweden was rated second worst. […] Sweden introduced a health-care guarantee in 2005 [guaranteeing fast access in some specific contexts]. […] Most patients who appeal under the health-care guarantee and [are] prioritised in the “queue” ha[ve] acute conditions rather than medical problems as a consequence of an underlying chronic disease. Patients waiting for a hip replacement or a cataract surgery are cured after surgery and no life-long follow-up is needed. When such patients are prioritised, the long-term care for patients with chronic diseases is “crowded out,” lowering their priority and risking worse outcomes. The health-care guarantee can therefore lead to longer intervals between checkups, with difficulties in accessing health care if their pre-existing condition has deteriorated.”
“Within each region / county council the care of patients with diabetes is divided. Patients with type 1 diabetes get their care at specialist clinics in hospitals and the majority of patients with type 2 diabetes in primary care. Patients with type 2 diabetes who have severe complications are referred to the Diabetes Clinics at the hospital. Approximately 10 % of all patients with type 2 continue their care at the hospital clinics. They are almost always on insulin in high doses often in combination with oral agents but despite massive medication many of these patients have difficulties to achieve metabolic balance. Patients with advanced complications such as foot ulcers, macroangiopathic manifestations and treatment with dialysis are also treated at the hospitals.”
Do keep in mind here that even if only 10% of type 2 patients are treated in a hospital setting, type 2 patients may still make up perhaps half or more of the diabetes patients treated in a hospital setting; type 2 prevalence is much, much higher than type 1 prevalence. Also, in view of such treatment and referral patterns the default assumption when doing comparative subgroup analyses should always be that the outcomes of type 2 patients treated in a hospital setting should be expected to be much worse than the outcomes of type 2 patients treated in general practice; they’re in much poorer health than the diabetics treated in general practice, or they wouldn’t be treated in a hospital setting in the first place. A related point is that regardless of how great the hospitals are at treating the type 2 patients (maybe in some contexts there isn’t actually much of a difference in outcomes between these patients and type 2 patients treated in general practice, even though you’d expect there to be one?), that option will usually not be scalable. Also, it’s to be expected that these patients are more expensive than the default type 2 patient treated by his GP [and they definitely are: “Only if severe complications arise [in the context of a type 2 patient] is the care shifted to specialised clinics in hospitals. […] these patients have the most expensive care due to costly treatment of for example foot ulcers and renal insufficiency”]; again, they’re sicker and need more comprehensive care. They would need it even if they did not get it in a hospital setting, and there are costs associated with under-treatment as well.
“About 90 % of the children [with diabetes in Sweden] are classified as having Type 1 diabetes based on positive autoantibodies and a few percent receive a diagnosis of “Maturity Onset Diabetes of the Young” (MODY). Type 2 diabetes among children is very rare in Sweden.”
Lastly, some observations from the final chapter:
“The paradox that we are dealing with is that in spite of health professionals wanting the best for their patients on a patient by patient basis, the way that individuals and institutions are organised and paid, directly influences the clinical decisions that are made. […] Naturally, optimising personal care and the provider/purchaser-commissioner budget may be aligned, but this is where diabetes poses substantial problems from a health system point of view: The majority of adverse diabetes outcomes […] are many years in the future, so a system based on this year’s budget will often not prioritise the future […] Even for these adverse “diabetes” outcomes, other clinical factors contribute to the end result. […] attribution to diabetes may not be so obvious to those seeking ways to minimise expenditure.”
[I incidentally tried to get this point across in a recent discussion on SSC, but I’m not actually sure the point was understood, presumably because I did not explain it sufficiently clearly or go into enough detail. On a related note, it is my general impression that many people who would like to cut down on the sort of implicit public subsidization of unhealthy behaviours that most developed economies to some extent engage in these days do not understand well enough the problems that e.g. attribution issues, and the question of how to optimize ‘post-diagnosis care’ (even if what you want to optimize is the cost minimization function…), cause in specific contexts. As I hope my comments in that thread indicate, I don’t think these sorts of issues can be ignored or dealt with in some very simple manner – and I’m tempted to say that if you think they can, you don’t know enough about these topics. I say that as one of those people who would like people who engage in risky behaviours to pay a larger (health) risk premium than they currently do].
[Continued from above, …problems from a health system point of view:]
“Payment for ambulatory diabetes care, which is essentially the preventative part of diabetes care, usually sits in a different budget to the inpatient budget where the big expenses are. […] good evidence for reducing hospitalisation through diabetes integrated care is limited […] There is ample evidence [11, 12] where clinicians own, and profit from, other services (e.g., laboratory, radiology), that referral rates are increased, often inappropriately […] Under the English NHS, the converse exists, where GPs, either holding health budgets, or receiving payments for maintaining health budgets, reduce their referrals to more specialist care. While this may be appropriate in many cases, it may result in delays and avoidance of referrals, even when specialist care is likely to be of benefit. [this would be the under-treatment I was talking about above…] […] There is a mantra that fragmentation of care and reductions in continuity of care are likely to harm the quality of care, but hard evidence is difficult to obtain.”
“The problems outlined above, suggest that any health system that fails to take account of the need to integrate the payment system from both an immediate and long term perspective, must be at greater risk of their diabetes integration attempts failing and/or being unsustainable. […] There are clearly a number of common factors and several that differ between successful and less successful models. […] Success in these models is usually described in terms of hospitalisation (including, e.g., DKA, amputation, cardiovascular disease events, hypoglycaemia, eye disease, renal disease, all cause), metabolic outcomes (e.g., HbA1c), health costs and access to complex care. Some have described patient related outcomes, quality of life and other staff satisfaction, but the methodology and biases have often not been open to scrutiny. There are some methodological issues that suggest that many of those with positive results may be illusory and reflect the pre-existing landscape and/or wider changes, particular to that locality. […] The reported “success” of intermediate diabetes clinics run by English General Practitioners with a Special Interest led to extension of the model to other areas. This was finally tested in a randomised controlled trial […] and shown to be a more costly model with no real benefit for patients or the system. Similarly in East Cambs and Fenland, the 1 year results suggested major reductions in hospitalisation and costs in practices participating fully in the integrated care initiative, compared with those who “engaged” later. However, once the trends in neighbouring areas and among those without diabetes were accounted for, it became clear that the benefits originally reported were actually due to wider hospitalisation reductions, not just in those with diabetes. Studies of hospitalisation /hospital costs that do not compare with rates in the non-diabetic population need to be interpreted with caution.”
“Kaiser Permanente is often described as a great diabetes success story in the USA due to its higher than peer levels of, e.g., HbA1c testing. However, in the 2015 HEDIS data, levels of testing, metabolic control achieved and complication rates show quality metrics lower than the English NHS, in spite of the problems with the latter. Furthermore, HbA1c rates above 9 % remain at approximately 20 %, in Southern California or 19 % in Northern California, a level much higher than that in the UK […] Similarly, the Super Six model […] has been lauded as a success, as a result of reductions in patients with, e.g., amputations. However, these complications were in the bottom quartile of performance for these outcomes in England and hence improvement would be expected with the additional diabetes resources invested into the area. Amputation rates remain higher than the national average […] Studies showing improvement from a low baseline do not necessarily provide a best practice model, but perhaps a change from a system that required improvement. […] Several projects report improvements in HbA1c […] improvements in HbA1c, without reports of hypoglycaemia rates and weight gain, may be associated with worse outcomes as suggested from the ACCORD trial.”
The book provides a good overview of studies and clinical trials which have attempted to improve the coordination of diabetes treatment in specific areas. The book covers research from all over the world – the UK, the US, Hong Kong, South Africa, Germany, the Netherlands, Sweden, Australia. The language of the publication is quite good, considering the number of contributors who are not native English speakers. At least a basic understanding of medical statistics is probably required to properly read and understand this book in full.
The book is quite good if you want to understand how people have tried to improve (mainly type 2) diabetes treatment ‘from an organizational point of view’: the main focus here is not on new treatment options, but on how to optimize care delivery and make the various care providers involved work better together, in a way that improves outcomes for patients (at an acceptable cost?) – which is to a large extent an organizational problem. But it’s probably also quite a nice book if you simply want to know more about how diabetes treatment systems differ across countries; the contributors don’t assume that readers know how e.g. the Swedish approach to diabetes care differs from that of e.g. Pennsylvania, so many chapters contain interesting details on how specific countries/health care providers handle specific aspects of e.g. care delivery or finance.
What people mean by ‘integrated care’ varies a bit depending on whom you ask (patients and service providers may emphasize different dimensions when thinking about these topics), as should also be clear from the quotes below; however, I thought it might be a good idea to start the post with the quote above, so that people who have no idea what ‘integrated diabetes care’ is don’t start out reading the post completely in the dark. In short, a big problem in health service delivery contexts is that care provision is often fragmented and uncoordinated, for many reasons. Ideally you might like doctors working in general practice to collaborate smoothly and efficiently with hospital staff and the various other specialists involved in diabetes care (…and perhaps also with social services and mental health care providers…), but that kind of coordination often doesn’t happen, leading to what may well be sub-optimal care provision. Collaboration and a ‘desirable’ (whatever that might mean) level of coordination between service providers doesn’t happen automatically; it takes money, effort and a lot of other things (which the book covers in some detail…) to make it happen – and so it often doesn’t happen; at the very least there’s a lot of room for improvement even in places where things work comparatively well. Some quotes from the book on these topics:
“it is clear that in general, wherever you are in the world, service delivery is now fragmented. Such fragmentation is a manifestation of organisational and financial barriers, which divide providers at the boundaries of primary and secondary care, physical and mental health care, and between health and social care. Diverse specific organisational and professional cultures, and differences in terms of governance and accountability also contribute to this fragmentation. […] Many of these deficiencies are caused by organisational problems (barriers, silo thinking, accountability for budgets) and are often to the detriment of all of those involved: patients, providers and funders – in extreme cases – leading to lose-lose-lose-situations […] There is some evidence that integrated care does improve the quality of patient care and leads to improved health or patient satisfaction [10, 11], but evidence of economic benefits remains an issue for further research. Failure to improve integration and coordination of services along a “care continuum” can result in suboptimal outcomes (health and cost), such as potentially preventable hospitalisation, avoidable death, medication errors and adverse drug events [3, 12, 13].”
“Integrated care is often described as a continuum [10, 24], actually depicting the degree of integration. This degree can range from linkage, to coordination and integration, or from segregation (absence of any cooperation) to full integration, in which the integrated organisation is responsible for the full continuum of care […] this classification of integration degree can be expanded by introducing a second dimension, i.e., the user needs. User need should be defined by criteria like stability and severity of condition, duration of illness (chronic condition), service needed and capacity for self-direction (autonomy). Accordingly, a low level of need will not require a fully integrated system [10, 24] […] Kaiser Permanente is a good example of what has been described as a “fully integrated system”. […] A key element of Kaiser Permanente’s approach to chronic care is the categorisation of their chronically ill patients into three groups based on their degree of need.”
It may be a useful simplification to think along the lines of: ‘Higher degree of need = a higher level of integration becomes desirable/necessary. Disease complexity is closely related to degree of need.’ Some related observations from the book:
“Diabetes is a condition in which longstanding hyperglycaemia damages arteries (causing macrovascular, e.g., ischaemic heart, peripheral and cerebrovascular disease, and microvascular disease, e.g., retinopathy, nephropathy), peripheral nerves (causing neuropathy), and other structures such as skin (causing cheiroarthropathy) and the lens (causing cataracts). Different degrees of macrovascular, neuropathic and cutaneous complications lead to the “diabetic foot.” A proportion of patients, particularly with type 2 diabetes, have metabolic syndrome including central adiposity, dyslipidaemia, hypertension and non-alcoholic fatty liver disease. Glucose management can have severe side effects, particularly hypoglycaemia and weight gain. Under-treatment is not only associated with long term complications but infections, vascular events and increased hospitalisation. Absence of treatment in type 1 diabetes can rapidly lead to diabetic keto-acidosis and death. Diabetes doubles the risk for depression, and on the other hand, depression may increase the risk for hyperglycaemia and finally for complications of diabetes. Essentially, diabetes affects every part of the body once complications set in, and the crux of diabetes management is to normalise (as much as possible) the blood glucose and manage any associated risk factors, thereby preventing complications and maintaining the highest quality of life. […] glucose management requires minute by minute, day by day management addressing the complexity of diabetes, including clinical and behavioural issues. While other conditions also have the patient as therapist, diabetes requires a fully empowered patient with all of the skills, knowledge and motivation every hour of the waking day. A patient that is fully engaged in self-management, and has support systems, is empowered to manage their diabetes and will likely experience better outcomes compared with those who do not have access to this support.
[…] in diabetes, the boundaries between primary care and secondary care are blurred. Diabetes specialist services, although secondary care, can provide primary care, and there are GPs, diabetes educators, and other ancillary providers who can provide a level of specialist care.”
In short, diabetes is a complex disease – it’s one of those diseases where a significant degree of care integration is likely to be necessary in order to achieve even close to optimal outcomes. A little more on these topics:
“The unique challenge to providers is to satisfy two specific demands in diabetes care. The first is to anticipate and recognize the onset of complications through comprehensive diabetes care, which demands meticulous attention to a large number of process-of-care measures at each visit. The second, arguably greater challenge for providers is to forestall the development of complications through effective diabetes care, which demands mastery over many different skills in a variety of distinct fields in order to achieve performance goals covering multiple facets of management. Individually and collectively, these dual challenges constitute a virtually unsustainable burden for providers. That is because (a) completing all the mandated process measures for comprehensive care requires far more time than is traditionally available in a single patient visit; and (b) most providers do not themselves possess skills in all the ancillary disciplines essential for effective care […] Diabetes presents patients with similarly unique dual challenges in mastering diabetes self-management with self-awareness, self-empowerment and self-confidence. Comprehensive Diabetes Self-Management demands the acquisition of a variety of skills in order to fulfil a multitude of tasks in many different areas of daily life. Effective Diabetes Self-Management, on the other hand, demands constant vigilance, consistent discipline and persistent attention over a lifetime, without respite, to nutritional self-discipline, monitoring blood glucose levels, and adherence to anti-diabetic medication use. Together, they constitute a burden that most patients find difficult to sustain even with expert assistance, and all-but-impossible without it.”
“Care coordination achieves critical importance for diabetes, in particular, because of the need for management at many different levels and locations. At the most basic level, the symptomatic management of acute hypo- and hyperglycaemia often devolves to the PCP [primary care provider], even when a specialist oversees more advanced strategies for glycaemic management. At another level, the wide variety of chronic complications requires input from many different specialists, whereas hospitalizations for acute emergencies often fall to hospitalists and critical care specialists. Thus, diabetes care is fraught with the potential for sometimes conflicting, even contradictory management strategies, making care coordination mandatory for success.”
“Many of the problems surrounding the provision of adequate person-centred care for those with diabetes revolve around the pressures of clinical practice and a lack of time. Good diabetes management requires attention to a number of clinical parameters:
1. (Near) Normalization of blood glucose
2. Control of co-morbidities and risk factors
3. Attainment of normal growth and development
4. Prevention of Acute Complications
5. Screening for Chronic Complications
To fit all this and a holistic, patient-centred collaborative approach into a busy general practice, the servicing doctor and other team members must understand that diabetes cannot be “dealt with” coincidently during a patient consultation for an acute condition.”
“Implementation of the team model requires sharing of tasks and responsibilities that have traditionally been the purview of the physician. The term “team care” has traditionally been used to indicate a group of health-care professionals such as physicians, nurses, pharmacists, or social workers, who work together in caring for a group of patients. In a 2006 systematic review of 66 trials testing 11 strategies for improving glycaemic control for patients with diabetes, only team care and case management showed a significant impact on reducing HbA1c levels.”
Moving on, I found the chapter about Hong Kong interesting, for several reasons. The quality of Scandinavian health registries is probably widely known in the epidemiological community, but I was not aware that the quality of Hong Kong’s diabetes data and data-management strategies is similarly high. Nor was I aware of some of the things they’ve discovered while analyzing those data. A few quotes from that part of the coverage:
“Given the volume of patients in the clinics, the team’s earliest work from the HKDR [Hong Kong Diabetes Registry, US] prioritized the development of prediction models, to allow for more efficient, data-driven risk stratification of patients. After accruing data for a decade on over 7000 patients, the team established 5-year probabilities for major diabetes-related complications as defined by the International Classification of Diseases retrieved from the CMS [Clinical Management System, US]. These included end stage renal disease, stroke, coronary heart disease, heart failure, and mortality. These risk equations have a 70–90 % sensitivity and specificity of predicting outcomes based on the parameters collected in the registry.”
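As an aside (my own illustration, not from the book): the sensitivity and specificity figures quoted above summarize how often a binary risk prediction correctly flags the patients who actually develop an outcome and correctly clears those who don't. A minimal sketch with invented data:

```python
# Hypothetical example: sensitivity and specificity of a binary risk
# prediction (1 = predicted/observed complication, 0 = none).
# All data below are invented purely for illustration.
predicted = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
observed  = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

tp = sum(p == 1 and o == 1 for p, o in zip(predicted, observed))  # true positives
fn = sum(p == 0 and o == 1 for p, o in zip(predicted, observed))  # false negatives
tn = sum(p == 0 and o == 0 for p, o in zip(predicted, observed))  # true negatives
fp = sum(p == 1 and o == 0 for p, o in zip(predicted, observed))  # false positives

sensitivity = tp / (tp + fn)  # share of actual cases correctly flagged
specificity = tn / (tn + fp)  # share of non-cases correctly cleared
print(sensitivity, specificity)  # 0.8 0.8 here, i.e. within the 70-90 % range quoted
```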
“The lifelong commitments to medication adherence and lifestyle modification make diabetes self-management both physically and emotionally taxing. The psychological burdens result from insulin injection, self-monitoring of blood glucose, dietary restriction, as well as fear of complications, which may significantly increase negative emotions in patients with diabetes. Depression, anxiety, and distress are prevalent mental afflictions found in patients with diabetes […] the prevalence of depression was 18.3 % in Hong Kong Chinese patients with type 2 diabetes. Furthermore, depression was associated with poor glycaemic control and self-reported hypoglycaemia, in part due to poor adherence […] a prospective study involving 7835 patients with type 2 diabetes without cardiovascular disease (CVD) at baseline […] found that [a]fter adjusting for conventional risk factors, depression was independently associated with a two to threefold increase in the risk of incident CVD .”
“Diabetes has been associated with increased cancer risk, but the underlying mechanism is poorly understood. The linkage between the longitudinal clinical data within the HKDR and the cancer outcome data in the CMS has provided important observational findings to help elucidate these connections. Detailed pharmacoepidemiological analyses revealed attenuated cancer risk in patients treated with insulin and oral anti-diabetic drugs compared with non-users of these drugs”
“Among the many challenges of patient self-management, lack of education and empowerment are the two most cited barriers. Sufficient knowledge is unquestionably important in self-care, especially in people with low health literacy and limited access to diabetes education. Several systematic reviews showed that self-management education with comprehensive lifestyle interventions improved glycaemic and cardiovascular risk factor control [60–62].”
“Clinical trials are expensive because of the detail and depth of data required on each patient, which often require separate databases to be developed outside of the usual-care electronic medical records or paper-based chart systems. These databases must be built, managed, and maintained from scratch every time, often requiring double-entry of data by research staff. The JADE [Joint Asia Diabetes Evaluation] programme provides a more efficient means of collecting the key clinical variables in its comprehensive assessments, and allows researchers to add new fields as necessary for research purposes. This obviates the need for redundant entry into non-clinical systems, as the JADE programme is simultaneously a clinical care tool and prospective database. […] A large number of trials fail because of inadequate recruitment. The JADE programme has allowed for ready identification of eligible clinical trial participants because of its detailed clinical database. […] One of the greatest challenges in clinical trials is maintaining the contact between researchers and patients over many years. […] JADE facilitates long-term contact with the patient, as part of routine periodic follow-up. This also allows researchers to evaluate longer term outcomes than many previous trials, given the great expense in maintaining databases for the tracking of longitudinal outcomes.”
Lastly, some stuff on cost and related matters from the book:
“Diabetes imposes a massive economic burden on all healthcare systems, accounting for 11 % of total global healthcare expenditure on adults in 2013.”
“Often, designated service providers institute managed care programmes to standardize and control care rendered in a safe and cost-effective manner. However, many of these programmes concentrate on cost-savings rather than patient service utilization and improved clinical outcomes. [this part of the coverage is from South Africa, but these kinds of approaches are definitely not limited to SA – US] […] While these approaches may save some costs in the short term, Managed Care Programmes which do not address patient outcomes nor reduce long-term complications ignore the fact that the majority of the costs of treating diabetes, even in the medium term, are due to the treatment of acute and chronic complications and to inpatient hospital care. Additionally, it is well established that poor long-term clinical outcomes increase the cost burden of managing the patient with diabetes by up to 250 %. […] overall, the costs of medication, including insulin, account for just 7 % of all healthcare costs related to diabetes [this number varies across countries – I’ve seen estimates of 15 % in the past – as does the out-of-pocket share of that cost, but the costs of medications constitute a relatively small proportion of the total costs of diabetes everywhere you look, regardless of health care system and prevalence. If you include indirect costs as well, which you should, this becomes even more obvious – US]”
“[A] study of the Economic Costs of Diabetes in the U.S. in 2012 showed that for people with diabetes, hospital inpatient care accounted for 43 % of the total medical cost of diabetes.”
“There is some evidence of a positive impact of integrated care programmes on the quality of patient care [10, 34]. There is also a cautious appraisal that warns that “Even in well-performing care groups, it is likely to take years before cost savings become visible” […]. Based on a literature review from 1996 to 2004, Ouwens et al. found that integrated care programmes seemed to have positive effects on the quality of care. […] because of the variation in definitions of integrated care programmes and [because] the components used cover a broad spectrum, the results should be interpreted with caution. […] In their systematic review of the effectiveness of integrated care, Ouwens et al. could report on only seven (about 54 %) reviews which had included an economic analysis. Four of them showed financial advantages. In their study, Powell Davies et al. found that less than 20 % of studies that measured economic outcomes found a significant positive result. Similarly, de Bruin et al. evaluated the impact of disease management programmes on health-care expenditures for patients with diabetes, depression, heart failure or chronic obstructive pulmonary disease (COPD). Thirteen studies of 21 showed cost savings, but the results were not statistically significant, or not actually tested for significance. […] well-designed economic evaluation studies of integrated care approaches are needed, in particular in order to support decision-making on the long-term financing of these programmes [30, 39]. Savings from integrated care are only a “hope” as long as there is no carefully designed economic analysis with a kind of full-cost accounting.”
“The cost-effectiveness of integrated care for patients with diabetes depends on the model of integrated care used, the system in which it is used, and the time-horizon chosen. Models of cost benefit for using health coaching interventions for patients with poorly controlled diabetes have generally found a benefit in reducing HbA1c levels, but at the added cost of health coaching, which is not offset in the short term by savings from emergency department visits and hospitalizations […] An important question in assessing the cost of integrated care is whether it needs to be cost-saving or cost-neutral to be adopted, or whether it is enough to increase quality-adjusted life years (QALYs) at a “reasonable” cost (usually pegged at between $30,000 and $60,000 per QALY saved). Most integrated care programmes for patients with diabetes that have been evaluated for cost-effectiveness would meet this more liberal criterion […] In practice, integrated care programmes for patients with diabetes are often part of generalized programmes of care for patients with other chronic medical conditions, making the allocation of costs and savings with respect to integrated care for diabetes difficult to estimate. At this point, integrated care for patients with diabetes appears to be a widely accepted goal. The question becomes: which model of integrated care is most effective at reasonable cost? Answering this question depends both on what costs are included and what outcomes are measured; the answers may vary among different patient populations and different care systems.”
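To make the ‘reasonable cost per QALY’ criterion concrete, here is a minimal sketch of the standard incremental cost-effectiveness ratio (ICER) calculation that this kind of appraisal rests on; all the figures below are invented purely for illustration, not taken from the book:

```python
# Incremental cost-effectiveness ratio (ICER) of a hypothetical
# integrated-care programme vs. usual care. All figures invented.
cost_integrated, qaly_integrated = 19_000.0, 6.25  # per patient, lifetime
cost_usual,      qaly_usual      = 9_000.0,  6.00

# ICER = incremental cost / incremental health gain (dollars per QALY)
icer = (cost_integrated - cost_usual) / (qaly_integrated - qaly_usual)
print(f"ICER: ${icer:,.0f} per QALY")  # $40,000 per QALY

# Compare against the 'reasonable cost' band quoted in the text.
print("within $30,000-$60,000 band:", 30_000 <= icer <= 60_000)  # True
```

With these made-up numbers the programme is not cost-saving (it costs $10,000 more per patient), yet it still meets the “more liberal criterion” discussed above.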
I have had a look at two sources, the Office of Refugee Resettlement’s annual reports to Congress for the fiscal years 2013 and 2014. I have posted some data from the reports below. Where the page numbers are not included directly in the screen-caps, the page numbers given below refer to the pdf versions of the documents.
I had some trouble with how to deal with the images included in the post; I hope it looks okay now, at least it does on my laptop – but if it doesn’t, I’m not sure I care enough to try to figure out how to resolve the problem. Anyway, to the data!
The one above is the only figure/chart from the 2014 report, but I figured it was worth including here. It’s from page 98 of the report. It’s of some note that, despite the recent drop, 42.8% of the 2014 US arrivals worked/had worked during the year they arrived; in comparison, only 494 of Sweden’s roughly 163,000 asylum seekers who arrived during the year 2015 landed a job that year (link).
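A quick back-of-the-envelope comparison of those two figures (my own calculation, using the 42.8 % US figure from the report and the Swedish counts from the linked source):

```python
# Back-of-the-envelope comparison of first-year employment shares.
us_share = 0.428            # 2014 US arrivals who worked during their arrival year
swedish_employed = 494      # 2015 Swedish asylum seekers who found a job that year
swedish_arrivals = 163_000  # approximate number of 2015 asylum seekers in Sweden

swedish_share = swedish_employed / swedish_arrivals
print(f"Sweden: {swedish_share:.2%}")                   # roughly 0.30 %
print(f"US/Sweden ratio: {us_share / swedish_share:.0f}x")
```

The two populations and definitions are of course not strictly comparable, but the difference is two orders of magnitude, so the qualitative point survives a lot of measurement slack.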
All further images/charts below are from the 2013 report.
It’s noteworthy here how different the US employment gap is from e.g. the employment gap in Denmark. In Denmark the employment rate of refugees with refugee status who have stayed in the country for 5 years is 34%, and the employment rate of refugees with refugee status who have stayed in the country for 15 years is 37%, compared to a native employment rate of ~74% (link). But just like in Denmark, in the US it matters a great deal where the refugees are coming from:
“Since their arrival in the U.S., 59 percent of refugees in the five-year population worked at one point. This rate was highest for refugees from Latin America (85 percent) and lowest for refugees from the Middle East (48 percent), while refugees from South/Southeast Asia (61 percent) and Africa (59 percent) were positioned in between. […] The highest disparity between male and female labor force participation rates was found for respondents from the Middle East (64.1 percent for males vs. 34.5 percent for females, a gap of 30 points). A sizeable gender gap was also found among refugees from South/Southeast Asia (24 percentage points) and Africa (18 percentage points), but there was hardly any gap among Latin American refugees (3 percentage points). Among all refugee groups, 71 percent of males were working or looking for work at the time of the 2013 survey, compared with 49 percent of females.” (p.94)
Two tables (both are from page 103 of the 2013 report):
When judged by variables such as home ownership and the proportion of people who survive on public assistance, people who have stayed longer do better (Table II-16). But if you consider Table II-17, a much larger proportion of the refugees surveyed in 2013 than in 2008 were partially dependent on public assistance, and it seems that a substantially smaller proportion of the refugees living in the US in 2013 was totally self-reliant than was the case 5 years earlier. Fortunately the 2013 report has a bit more data on this stuff (p. 107):
The table has more information on page 108, with more details about specific public assistance programs. Table II-22 includes data on how public assistance utilization has developed over time (it’s clear that utilization rates increased substantially during the half-decade observed):
Some related comments from the report:
“Use of non-cash assistance was generally higher than cash assistance. This is probably because Medicaid, the Supplemental Nutrition Assistance Program (SNAP), and housing assistance programs, though available to cash assistance households, also are available more broadly to households without children. SNAP utilization was lowest among Latin Americans (37 percent) but much higher for the other groups, reaching 89 to 91 percent among the refugees from Africa and the Middle East. […] Housing assistance varied by refugee group — as low as 4 percent for Latin American refugees and as high as 32 percent for refugees from South/Southeast Asia in the 2013 survey. In the same period, other refugee groups averaged use of housing assistance between 19 and 31 percent.” (pp. 107-108)
The report includes some specific data on Iraqi refugees – here’s one table from that section:
The employment rate of the Iraqis increased from 29.8% in the 2009 survey to 41.3% in 2013. However, the female employment rate of Iraqi refugees in the US is still not much different from the female employment rates you observe when you look at European data on these topics – just 29%, up from 18.8% in 2009. As a comparison, in the year 2010 the employment rate of Iraqi females living in Denmark was 28% (n=10,163) (data from p. 55 of the Statistics Denmark publication Indvandrere i Danmark 2011), almost exactly the same as the employment rate of female Iraqis in the US.
Of note in the context of the US data is perhaps also the fact that despite the employment rate going up for females in the time period observed, the labour market participation rate of this group actually decreased between 2009 and 2013, as it went from 42.2% to 38.1%. So more than 3 out of 5 Iraqi female refugees living in the US are outside the labour market, and almost one in four of those who are in the labour market is unemployed. A few observations from the report:
“The survey found that the overall EPR [employment rate, US] for the 2007 to 2009 Iraqi refugee group in the 2013 survey was 41 percent (55 percent for males and 29 percent for females), a steady increase in the overall rate from 39 percent in the 2012 survey, 36 percent in the 2011 survey, 31 percent in the 2010 survey, and 30 percent in the 2009 survey. As a point of further reference, the EPR for the general U.S. population was 58.5 percent in 2013, about 17 percentage points higher than that of the 2007 to 2009 Iraqi refugee group (41.3 percent). The U.S. male population EPR was nine percentage points higher than the rate for Iraqi males who arrived in the U.S. in 2007 to 2009 (64 percent versus 55 percent), while the rate for all U.S. women was 24 points higher than [that of] the Iraqi females who arrived in the U.S. in 2007 to 2009 (53 percent versus 29 percent). The difference between the male and female EPRs among the same group of Iraqi refugees (26 percentage points) also was much larger than the gap between male and female EPRs in the general U.S. population (11 points) […] The overall unemployment rate for the 2007 to 2009 Iraqi refugee group was 22.9 percent in the 2013 survey, about four times higher than that of the general U.S. population (6.5 percent) in 2013” (pp. 114-115).
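The ‘three out of five’/‘one in four’ arithmetic above can be checked directly from the report’s figures. The derivation of the female unemployment rate from the participation and employment rates is my own check; it comes out slightly above the report’s quoted 22.9 % figure because that number covers both sexes:

```python
# Labour-market arithmetic for Iraqi female refugees (2013 survey figures:
# participation rate 38.1 %, employment rate 29 %).
participation = 0.381
employment = 0.29

outside = 1 - participation  # share of women outside the labour market
# Unemployment rate = (participants - employed) / participants
unemployment = (participation - employment) / participation

print(f"Outside the labour market: {outside:.1%}")           # 61.9%, i.e. >3 in 5
print(f"Unemployed among participants: {unemployment:.1%}")  # ~23.9%, i.e. ~1 in 4
```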
I was debating whether to blog this book at all, as it’s neither very long nor very good, but I decided it was worth adding a few observations from it here. You can read my goodreads review of the publication here. Whenever quotes look a bit funny in the coverage below (i.e. when you see things like words in brackets or strangely located ‘[…]’), assume that the reason is that I tried to improve upon the occasionally frankly horrible language of some of the contributors to the publication. If you want to know exactly what they wrote, rather than what they presumably meant to write, read the book. (Basic grammar errors due to the authors having trouble with the English language are everywhere in this publication, and although I did choose to quote it here, I do feel a bit uncomfortable quoting a publication like this one verbatim on my blog.)
I went off on a tangent towards the end of the post and ended up adding some general remarks about medical costs, insurance and various other topics. So the post may have something of interest even to people who are not particularly interested in the stuff covered in the book itself.
“Despite intensive recommendations, [the] influenza vaccination rate in medical staff in Poland ranges from about 20 % in physicians to 10 % in nurses. […] It has been demonstrated that vaccination of health care workers against influenza significantly decreases mortality of elderly people remaining under [long-term care]. […] Vaccinating health care workers also substantially reduces sickness absenteeism, especially in emergency units […] Concerning physicians, vaccination avoidance stemmed from the lack of knowledge of the protective value of the vaccine (33 %), lack of time to get vaccinated (29 %), and laziness (24 %). In nurses, these figures amounted to 55 %, 12 %, and 5 %, respectively (Zielonka et al. 2009).”
I just loved the fact that ‘laziness’ was included here as an explanatory variable, but on the other hand the fact that one-third of doctors cited lack of knowledge about the protective value of vaccination as a reason for not getting vaccinated is … well, let’s use the word ‘interesting’. But it gets even better:
“The questions asked and opinions expressed by physicians or nurses on vaccinations showed that their knowledge in this area was far from the current evidence-based medicine recommendations. Nurses, in particular, commonly presented opinions similar to those which can be found in anti-vaccination movements and forums […] The attitude of physicians toward influenza vaccination var[ies] greatly. In many a ward, a majority of physicians were vaccinated (70–80 %). However, in the neurology and intensive care units the proportion of vaccinated physicians amounted only to 20 %. The reason for such a small yield […] was a critical opinion about the effectiveness and safety of vaccination. Similar differences, depending on medical specialty, were observed in Germany (4–71 % [vaccinated]) (Roggendorf et al. 2011) […] It is difficult to explain the fear of influenza vaccination among the staff of intensive care units, since these are exactly the units where many patients with most severe cases of influenza are admitted and often die (Ayscue et al. 2014). In this group of health care workers, high efficiency of influenza vaccination has been clearly demonstrated […] In the present study a strong difference between the proportion of vaccinated physicians (55 %) and nurses (21 %) was demonstrated, which is in line with some data coming from other countries. In the US, 69 % of physicians and 46 % of nurses get a vaccine shot […] and in Germany the respective percentages are 39 % and 17 % […] In China, 21 % of nurses and only 13 % of physicians are vaccinated against influenza (Seale et al. 2010a), and in [South] Korea, 91 % and 68 % respectively (Lee et al. 2008).”
“[A] survey was conducted among Polish (243) and foreign (80) medical students at the Pomeranian Medical University in Szczecin, Poland. […] The survey results reveal that about 40 % of students were regular or occasional smoker[s]. […] 60 % of students declared themselves to be non-smokers, 20 % were occasional smokers, and 20 % were regular smokers”
40 % of medical students in a rather large sample turned out to be smokers. Wow. Yeah, I hadn’t seen that one coming. I’d probably expect a few alcoholics, and I would probably not have been surprised by a hypothetical higher-than-average alcohol consumption in a sample like that (they don’t talk about alcohol so I don’t have data on this, I’m just saying I wouldn’t be surprised – after all I do know that doctors are high-risk for suicide), but such a large proportion smoking? That’s unexpected. It probably shouldn’t have been, considering that it’s very much in line with the coverage included in Thirlaway & Upton’s book. I included some remarks about their coverage of smoking in my third post about that book here. The important observation of note from that part of the book’s coverage is probably that most smokers want to quit and yet very few manage to actually do it. “Although the majority of smokers want to stop smoking and predict that they will have stopped in twelve months, only 2–3 per cent actually stop permanently a year (Taylor et al. 2006).” If those future Polish doctors know that smoking is bad for them, but assume that they can just ‘stop in time’ when ‘the time’ comes – well, some of those people are probably in for a nasty surprise (and they should have studied some more, so that they’d have known this?).
“A prospective study of middle-aged British men […] revealed that the self-assessment of health status was strongly associated with mortality. Men who reported poor health had an eight-fold increase in total mortality compared with those reporting excellent health. Those who assessed their health as poor were manual workers, cigarette smokers, and often heavy drinkers. Half of those with poor health suffered from chest pain on exertion and other chronic diseases. Thus, self-assessment of health status appears to be a good measure of current physical health and risk of death”.
“It is estimated that globally 3.1 million people die each year due to chronic obstructive pulmonary disease (COPD). According to the World Health Organization (WHO 2014), the disease was the third leading cause of death worldwide in 2012. [In the next chapter of the book they state that: “COPD is currently the fourth leading cause of death among adult patients globally, and it is projected that it will be the third most common cause of death by 2020.” Whether it’s the third or fourth most common cause of death, it definitely kills a lot of people…] […] Approximately 40–50 % of lifelong smokers will go on to develop COPD […] the number of patients with a primary diagnosis of COPD […] constitutes […] 1.33 % of the total population of Poland. This result is consistent with that obtained during the Polish Spirometry Day in 2011 (Dabrowiecki et al. 2013) when 1.1 % of respondents declared having had a diagnosed COPD, while pulmonary function tests showed objectively the presence of obstruction in 12.3 % of patients.”
Based on numbers like these I feel tempted to conclude that the lungs may be yet another organ in which a substantial proportion of people of advanced age experience low-level organ dysfunction arguably not severe enough to lead to medical intervention. The kidneys are similar, as I also noted when I covered Longmore et al.‘s text.
“Generally, the costs of treatment of patients with COPD are highly variable […] estimates suggest […] that the costs of treatment of moderate stages of COPD may be 3–4-fold higher in comparison with the mild form of the disease, and in the severe form they reach up to 6–10 times the basic cost […] every second person with COPD is of working age […] Admission rates for COPD patients differ as much as 10-fold between European countries (European Lung White Book 2013).”
“In the EU, the costs of respiratory diseases are estimated at 6 % of the budget allocated to health care. Of this amount, 56 % is allocated for the treatment of COPD patients. […] Studies show that one per ten Poles over 30 year of age have COPD symptoms. Each year, around 4 % of all hospitalizations are due to COPD. […] One of the most important parameters regarding pharmacoeconomics is the hospitalization rate […] a high number of hospitalizations due to COPD exacerbations in Poland dramatically increase direct medical costs.”
I bolded the quote above because although I knew this, I had never seen it stated quite as clearly as it’s stated here, and I may be tempted to quote it later on. Hospitalizations are often really expensive compared to the drugs people take for their various health conditions when they are not hospitalized; for example, you can probably buy a year’s worth of anti-diabetic drugs, or more, for the cost of just one hospital admission due to drug mis-dosing. Before you get the idea that this might have ‘obvious implications’ for how ‘one’ should structure medical insurance arrangements in terms of copay structures etc., do however keep in mind that the picture here is really confusing:
Here’s the link, with more details – the key observation is that: “There is no consistency […] in the direction of change in costs resulting from changes in compliance”. That’s not diabetes, that’s ‘stuff in general’.
It would be neat if you could e.g. tell a story about how the high cost of a drug always leads to non-compliance, which leads to increased hospitalization rates, which in turn lead to higher total costs than if the drug had been subsidized. That would be a very strong case for subsidization. Or it would be neat if you could say that it doesn’t matter whether you subsidize a drug or not, because the cost of drugs is irrelevant to usage patterns – people are told to take one pill every day by their doctor, and by golly that’s what they’re doing, regardless of what those pills cost. I know someone personally who wrote a PhD thesis about a drug where that clearly wasn’t the case, even though the price elasticity was supposed to be ‘theoretically low’, so that one’s obviously out ‘in general’ – but the point is that people have looked at this stuff, a lot. I’m assuming you might be able to spot a dynamic like this in some situations, and different dynamics in the case of other drugs. It gets even better when you include complicating phenomena like cost-switching; perhaps the organization responsible for potentially subsidizing the drug is not the same as the one supposed to pay for the medical admissions (this depends on the insurance structure/setup). But that’s not always the case, and the decision as to who pays for what is not necessarily a given; it may depend e.g. on health care provider preferences, and those preferences may themselves depend upon a lot of things unrelated to patient preferences or incentives.
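To make concrete how much the subsidization question hinges on behavioural parameters, here is a minimal sketch of a single payer’s expected annual cost per patient under different copay levels, where copays lower compliance and non-compliance raises the hospitalization rate. Every number and parameter name here is hypothetical – none of it comes from the book or from any study:

```python
# Illustrative sketch (all numbers hypothetical): total payer cost of a drug
# under different copay regimes, when lower copays raise compliance and
# higher compliance lowers the hospitalization rate. Whether subsidization
# 'pays for itself' depends entirely on the assumed response parameters.

def total_cost(copay_share, drug_cost=500.0, hosp_cost=12000.0,
               base_compliance=0.9, compliance_drop=0.5,
               base_hosp_rate=0.05, hosp_rate_noncompliant=0.25):
    """Expected annual cost per patient, seen from a single payer covering
    both drugs and hospitalizations (the simple case, no cost-switching)."""
    compliance = base_compliance - compliance_drop * copay_share
    hosp_rate = (compliance * base_hosp_rate
                 + (1 - compliance) * hosp_rate_noncompliant)
    payer_drug_outlay = (1 - copay_share) * drug_cost * compliance
    return payer_drug_outlay + hosp_cost * hosp_rate

full_subsidy = total_cost(copay_share=0.0)
half_copay = total_cost(copay_share=0.5)
print(f"full subsidy: {full_subsidy:.0f}, 50% copay: {half_copay:.0f}")
```

With these particular made-up parameters full subsidization is the cheaper option (roughly 1,290 vs. 1,600 per patient per year); shrink the compliance response or the hospitalization-rate gap and the ranking flips. Which is exactly the point: only data can settle it, and the answer will differ across drugs.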
A big question, even in the relatively simple situation where the financial structure is – for these purposes at least – simple, is the extent to which relevant costs are even measured, and/or how they’re measured. If a guy dies due to a binding budget constraint resulting in no treatment for a health condition that would have been treatable with a drug, is that outcome supposed to be ‘very cheap’ (he didn’t pay anything for drugs, so there were no medical outlays) or very expensive (he could have worked for another two decades if he’d been treated, and those productivity losses need to be included in the calculation somehow; to focus solely on medical outlays is thus to miss the point)? An important analytical point here is that if you don’t explicitly make those deaths/productivity losses expensive, they are going to look very cheap, because the default option will always be to have them go unrecorded and untallied.
A problem not discussed in the coverage was incidentally the extent to which survey results pertaining to the cost of vaccination are worth much. You ask doctors why they didn’t get vaccinated, and they tell you it’s because it’s too expensive. Well, how many of them would you have expected to tell you that they did not get vaccinated because the vaccines were too cheap? This is more about providing people with a perceived socially acceptable out than it is about finding out their actual reasons for behaving the way they do. If the price of vaccination does not vary across communities it is admittedly difficult to estimate the price elasticity (if it does vary, you probably have an elasticity estimate right there), but using survey information to implicitly assess whether the price is too high? Allow the vaccination price to vary next year (or, even simpler/cheaper if the data exist, look at price variation which happened in the past and observe how demand varied), and see if/how the doctors and nurses respond. That’s how you do this; you don’t ask people. Asking people is actually also sort of risky: I’m pretty sure a smart doctor could make an argument that if you want doctors to get vaccinated you should pay them for getting the shot – after all, getting vaccinated is unpleasant, and as mentioned there are positive externalities here in terms of improved patient outcomes, which might translate into specific patients not dying, which is probably a big deal, for those patients at least. The smart doctor wouldn’t necessarily be wrong; if the price of vaccination were ‘sufficiently low’, i.e. a ‘large’ negative number (‘if you get vaccinated, we give you $10,000’), I’m pretty sure coverage rates would go up a lot. That doesn’t make it a good idea. (Or a bad idea per se, for that matter – it depends upon the shape of the implicit social welfare function we’re playing around with.
Though I must add – so that any smart doctors potentially reading along here don’t get any ideas – that a ‘large’ negative price of vaccination for health care workers is a bad idea if a cheaper option which achieves the same outcome is potentially available to the decision makers in question, which seems highly likely to me. For example, vaccination rates of medical staff would also go up a lot if regular vaccination were made an explicit condition of their employment, refusal of which would lead to termination… There would be implicit costs of such a scheme, in terms of staff selection effects, but if you’re comparing solely those two options and you’re the guy who makes the financial decisions…?)
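The revealed-preference approach suggested above – inferring price sensitivity from observed price variation rather than from survey answers – can be sketched in a couple of lines. Everything here is hypothetical: the numbers are made up, and a real estimate would of course have to worry about confounders, not just two data points:

```python
import math

# Hypothetical sketch: if vaccination uptake was observed at two different
# price levels (e.g. across years or regions), a crude point estimate of
# the price elasticity of demand falls out directly -- no survey needed.
# All numbers below are invented for illustration.

def log_elasticity(p0, q0, p1, q1):
    """Elasticity as the ratio of log changes in quantity and price."""
    return (math.log(q1) - math.log(q0)) / (math.log(p1) - math.log(p0))

# e.g. uptake falls from 40% to 30% when the price rises from 20 to 30:
e = log_elasticity(p0=20, q0=0.40, p1=30, q1=0.30)
print(f"estimated price elasticity: {e:.2f}")
```

With these invented figures the estimate comes out around −0.71, i.e. a 1 % price increase is associated with roughly a 0.7 % drop in uptake – the kind of number you can actually feed into a subsidy decision, unlike “it’s too expensive”.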
As I stated in my goodreads review, ‘If you’re a schizophrenic and/or you have a strong interest in e.g. the metabolic effects of various anti-psychotics, the book is a must-read’. If that description doesn’t apply to you, it’s a different matter. One reason why I didn’t give the book a higher rating is that many of the numbers in it are quite dated, which is a bit annoying because it means you might feel somewhat uncertain about how valid the included estimates still are at this point.
As pointed out in my coverage of the human drug metabolism text there are a lot of things that can influence the way that drugs are metabolized, and this text includes some details about a specific topic which may help to illustrate what I meant by stating in that post that people ‘self-experimenting’ may be taking on risks they may not be aware of. Now, diabetics who need insulin injections are taking a drug with a narrow therapeutic index, meaning that even small deviations from the optimal dose may have serious repercussions. A lot of things influence what is actually the optimal dose in a specific setting; food (“food is like a drug to a person with diabetes”, as pointed out in Matthew Neal’s endocrinology text, which is yet another text I, alas, have yet to cover here), sleep patterns, exercise (sometimes there may be an impact even days after you’ve exercised), stress, etc. all play a role, and even well-educated diabetics may not know all the details.
A lot of drugs also affect glucose metabolism and insulin sensitivity, one of the best-known drug types of this nature probably being the corticosteroids, because of their widespread use in a variety of disorders, including autoimmune disorders which tend to be more common in autoimmune forms of diabetes (mainly type 1). However, many other types of drugs can also influence blood glucose, and in the case of antidepressants and antipsychotics we actually know some stuff about how various medications influence glucose levels; it’s no coincidence that people have looked at this – they’ve done so because it has become clear that “[m]any medications, in particular psychotropics, including antidepressants, antipsychotics, and mood stabilizers, are associated with elevations in blood pressure, weight gain, dyslipidemias, and/or impaired glucose homeostasis.” (p. 49). This may translate into an increased risk of type 2 diabetes, and impaired glucose control in diabetics. Incidentally, the authors of this text observe that: “Our research group was among the first in the field to identify a possible link between the development of obesity, diabetes, and other metabolic derangements (e.g., lipid abnormalities) and the use of newer, second-generation antipsychotic medications.” Did the people who took these drugs before this research was done know that their medications might increase their risk of developing diabetes? No, because the people prescribing them didn’t know, nor did the people who developed the drugs. Some probably still don’t know, including some of the medical people prescribing these medications. But the knowledge is out there now, and in the case of some drugs the effect size is argued to be large enough to be clinically relevant.
In the context of a ‘self-experimentation’ angle the example is also interesting because the negative effect in question is significantly delayed; type 2 diabetes takes time to develop, and it is an undesirable outcome which you’re not going to spot the way you might link a headache the next day to a specific drug you just started taking (another example of a delayed adverse event is incidentally cancer). You’re not going to spot dyslipidemia unless you keep track of your lipid levels on your own, or e.g. develop xanthomas as a consequence of it, leading you to consult a physician. It helps a lot to have proper research protocols and large-n studies with sufficient power when you want to discover things like this, and when you want to determine whether an association like this is ‘just an association’ or whether the link is actually causal (and then to clarify what we actually mean by that, and whether the causal link is also clinically relevant and/or for whom it might be clinically relevant). Presumably many people taking all kinds of medical drugs these days are taking on risks which might in a similar manner be ‘hidden from view’, as the risk of diabetes was for people taking second-generation antipsychotics in the recent past; over time epidemiological studies may pick up on some of these risks, but many will probably remain hidden on account of the amount of complexity involved. Even if a drug ‘works’ as intended in the context of the target variable in question, you can get into a lot of trouble if you only focus on the target variable (“if a drug has no side effects, then it is unlikely to work”). People working in drug development know this.
The book has a lot of blog-worthy stuff, so I decided to include some quotes in the coverage below. The quotes are from the first half of the book, and this part of the coverage actually doesn’t talk much about the effects of drugs; it mainly deals with epidemiology and cost estimates. I thus decided to save the ‘drug coverage’ for a later post. It should perhaps be noted that some of the things I’d hoped to learn from Ru-Band Lu et al.’s book (blog coverage here) were actually included in this one, which was nice.
“Those with mental illness are at higher risk and are more likely to suffer the severe consequences of comorbid medical illness. Adherence to treatment is often more difficult, and other factors such as psychoneuroendocrine interactions may complicate already problematic treatments. Additionally, psychiatric medications themselves often have severe side effects and can interact with other medications, rendering treatment of the mental illness more complicated. Diabetes is one example of a comorbid medical illness that is seen at a higher rate in people with mental illness.”
“Depression rates have been studied and are increased in type 1 and type 2 diabetes. In a meta-analysis, Barnard et al. reviewed 14 trials in which patients with type 1 diabetes were surveyed for rates of depression.16 […] subjects with type 1 diabetes had a 12.0% rate of depression compared with a rate of 3.4% in those without diabetes. In noncontrolled trials, they found an even higher rate of depression in patients with type 1 diabetes (13.4%). However, despite these overall findings, in trials that were considered of an adequate design, and with a substantially rigorous depression screening method (i.e., use of structured clinical interview rather than patient reported surveys), the rates were not statistically significantly increased (odds ratio [OR] 2.36, 95% confidence interval [CI] 0.69–5.4) but had such substantial variation that it was not sufficient to draw a conclusion regarding type 1 diabetes. […] When it comes to rates of depression, type 2 diabetes has been studied more extensively than type 1 diabetes. Anderson et al. compiled a large meta-analysis, looking at 42 studies involving more than 21,000 subjects to assess rates of depression among patients with type 1 versus type 2 diabetes mellitus.18 Regardless of how depression was measured, type 1 diabetes was associated with lower rates of depression than type 2 diabetes. […] Depression was significantly increased in both type 1 and type 2 diabetes, with increased ORs for subjects with type 1 (OR = 2.9, 95% CI 1.6–5.5, […] p = 0.0003) and type 2 disease (OR = 2.9, 95% CI 2.3–3.7, […] p = 0.0001) compared with controls. Overall, with multiple factors controlled for, the risk of depression in people with diabetes was approximately twofold. In another large meta-analysis, Ali et al. looked at more than 51,000 subjects in ten different studies to assess rates of depression in type 2 diabetes mellitus.
[…] the OR for comorbid depression among the diabetic patients studied was higher for men than for women, indicating that although women with diabetes have an overall increased prevalence of depression (23.8 vs. 12.8%, p = 0.0001), men with diabetes have an increased risk of developing depression (men: OR = 1.9, 95% CI = 1.7–2.1 vs. women: OR = 1.3, 95% CI = 1.2–1.4). […] Research has shown that youths 12–17 years of age with type 1 diabetes had double the risk of depression compared with a teenage population without diabetes.21 This amounted to nearly 15% of children meeting the criteria for depression.”
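As an aside on where figures like “OR = 1.9, 95% CI = 1.7–2.1” come from: an odds ratio and its (Woolf, log-scale normal) confidence interval can be computed directly from a 2×2 table of counts. The counts below are invented for illustration – they merely reproduce prevalences of 23.8 % vs. 12.8 % in two groups of 1,000 each, and are not the study’s actual data:

```python
import math

# Sketch: odds ratio with a Woolf (log-scale) 95% confidence interval,
# computed from a 2x2 table of counts. Counts are invented for illustration.

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a/b: outcome present/absent in group 1; c/d: same for group 2."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# 238/1000 depressed in one group vs. 128/1000 in the other:
or_, lo, hi = odds_ratio_ci(238, 762, 128, 872)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Note that the OR computed this way (≈2.13 here) exceeds the corresponding prevalence ratio (23.8/12.8 ≈ 1.86); when the outcome is common, odds ratios are always further from 1 than the underlying risk ratios, which is worth keeping in mind when reading numbers like those quoted above.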
“As many as two-thirds of patients with diabetes and major depression have been ill with depression for more than 2 years.44 […] Depression has been linked to decreased adherence to self-care regimens (exercise, diet, and cessation of smoking) in patients with diabetes, as well as to the use of diabetes control medications […] Patients with diabetes and depression are twice as likely to have three or more cardiac risk factors such as smoking, obesity, sedentary lifestyle, or A1c > 8.0% compared with patients with diabetes alone.47 […] The costs for individuals with both major depression and diabetes are 4.5 times greater than for those with diabetes alone.53”
“A 2004 cross-sectional and longitudinal study of data from the Health and Retirement Study demonstrated that the cumulative risk of incident disability over an 8-year period was 21.3% for individuals with diabetes versus 9.3% for those without diabetes. This study examined a cohort of adults ranging in age from 51 to 61 years from 1992 through 2000.”
“Although people with diabetes comprise just slightly more than 4% of the U.S. population,3 19% of every dollar spent on health care (including hospitalizations, outpatient and physician visits, ambulance services, nursing home care, home health care, hospice, and medication/glucose control agents) is incurred by individuals with diabetes” (As I noted in the margin, these are old numbers, and prevalence in particular is definitely higher today than it was when that chapter was written, so diabetics’ proportion of the total cost is likely even higher now. As observed multiple times previously on this blog, most of these costs are unrelated to the costs of insulin treatment and oral anti-diabetics like metformin, and indirect costs make up quite a substantial proportion of the total costs).
“In 1997, only 8% of the population with a medical claim of diabetes was treated for diabetes alone. Other conditions influenced health care spending, with 13.8% of the population with one other condition, 11.2% with two comorbidities, and 67% with three or more related conditions.6 Patients with diabetes who suffer from comorbid conditions related to diabetes have a greater impact on health services compared with those patients who do not have comorbid conditions. […] Overall, comorbid conditions and complications are responsible for 75% of total medical expenditures for diabetes.” (Again, these are old numbers)
“Heart disease and stroke are the largest contributors to mortality for individuals with diabetes; these two conditions are responsible for 65% of deaths. Death rates from heart disease in adults with diabetes are two to four times higher than in adults without diabetes. […] Adults with diabetes are more than twice as likely to have multiple diagnoses related to macrovascular disease compared to patients without diabetes […] Although the prevalence of cardiovascular disease increases with age for both diabetics and nondiabetics, adults with diabetes have a significantly higher rate of disease. […] The management of macrovascular disease, such as heart attacks and strokes, represents the largest factor driving medical service use and related costs, accounting for 52% of costs to treat diabetes over a lifetime. The average costs of treating macrovascular disease are $24,330 of a total of $47,240 per person (in year 2000 dollars) over the course of a lifetime.17 Moreover, macrovascular disease is an important determinant of cost at an earlier time than other complications, accounting for 85% of the cumulative costs during the first 5 years following diagnosis and 77% over the initial decade. [Be careful here: This is completely driven by type 2 diabetics; a 10-year old newly diagnosed type 1 diabetic does not develop heart disease in the first decade of disease – type 1s are also at high risk of cardiovascular disease, but the time profile here is completely different] […] Cardiovascular disease in the presence of diabetes affects not only cost but also the allocation of health care resources. Average annual individual costs attributed to the treatment of diabetes with cardiovascular disease were $10,172. Almost 51% of costs were for inpatient hospitalizations, 28% were for outpatient care, and 21% were for pharmaceuticals and related supplies. 
In comparison, the average annual costs for adults with diabetes and without cardiovascular disease were $4,402 for management and treatment of diabetes. Only 31.2% of costs were for inpatient hospitalizations, 40.3% were for outpatient care, and 28.6% were for pharmaceuticals.16”
“Of individuals with diabetes, 2% to 3% develop a foot ulcer during any given year. The lifetime incidence rate of lower extremity ulcers is 15% in the diabetic population.20 […] The rate of amputation in individuals with diabetes is ten times higher than in those without diabetes.5 Diabetic lower-extremity ulcers are responsible for 92,000 amputations each year,21 accounting for more than 60% of all nontraumatic amputations.5 The 10-year cumulative incidence of lower-extremity amputation is 7% in adults older than 30 years of age who are diagnosed with diabetes.22 […] Following amputation, the 5-year survival rate is 27%.23 […] The majority of annual costs associated with treating diabetic peripheral neuropathy are associated with treatment of ulcers […] Overall, inpatient hospitalization is a major driver of cost, accounting for 77% of expenditures associated with individual episodes of lower-extremity ulcers.24”
“By 2003, diabetes accounted for 37% of individuals being treated for renal disease in the United States. […] Diabetes is the leading cause of kidney failure, accounting for 44% of all newly diagnosed cases. […] The amount of direct medical costs for ESRD attributed to diabetes is substantial. The total adjusted costs in a 24-month period were 76% higher among ESRD patients with diabetes compared with those without diabetes. […] Nearly one half of the costs of ESRD are due to diabetes.27” [How much did these numbers change since the book was written? I’m not sure, but these estimates do provide some sort of a starting point, which is why I decided to include the numbers even though I assume some of them may have changed since the publication of the book]
“Every percentage point decrease in A1c levels reduces the risk of microvascular complications such as retinopathy, neuropathy, and nephropathy by 40%.5 However, the trend is for A1c to drift upward at an average of 0.15% per year, increasing the risk of complications and costs.17 […] A1c levels also affect the cost of specific complications associated with diabetes. Increasing levels affect overall cost and escalate more dramatically when comorbidities are present. A1c along with cardiovascular disease, hypertension, and depression are significant independent predictors of health care costs in adults with diabetes.”
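The two figures in that quote can be combined into a back-of-the-envelope projection. A big caveat: treating the “40% per percentage point” as a multiplicative relative-risk factor per point, applied symmetrically to increases, is my assumption for the sake of illustration, not something the book states:

```python
# Sketch combining the two figures quoted above: A1c drifting upward at
# 0.15 percentage points/year, and each one-point A1c increase raising
# microvascular risk by a factor of 1/(1 - 0.40). Treating the 40% figure
# as a multiplicative per-point factor is my assumption, not the book's.

DRIFT_PER_YEAR = 0.15            # A1c percentage points per year
RISK_PER_POINT = 1 / (1 - 0.40)  # relative risk multiplier per +1.0 A1c

def relative_risk_after(years):
    """Relative microvascular risk after a given number of years of drift."""
    return RISK_PER_POINT ** (DRIFT_PER_YEAR * years)

for years in (5, 10, 20):
    print(f"after {years:2d} years of drift: "
          f"relative risk x{relative_risk_after(years):.2f}")
```

Under this (strong) assumption, a decade of unchecked drift (+1.5 A1c points) roughly doubles microvascular risk, which gives some intuition for why A1c shows up as an independent predictor of costs.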
This book is not exactly the first book I’ve read on these kinds of topics (see for example my previous coverage of related topics here, here, here, here, here, and here), but the book did have some new stuff and I decided in the end that it was worth blogging, despite the fact that I did not think the book was particularly great. The book is slightly different from previous books I’ve read on related topics because normative aspects are covered in much greater detail – as they put it in the preface:
“This volume addresses normative dimensions of methodological and theoretical approaches, international experiences concerning the normative framework and the process of priority setting as well as the legal basis behind priorities. It also examines specific criteria for prioritization and discusses economic evaluation. […] Prioritization is necessary and inevitable – not only for reasons of resource scarcity, which might become worse in the next few years. But especially in view of an optimization of the supply structures, prioritization is an essential issue that will contribute to the capability and stability of healthcare systems. Therefore, our volume may give useful impulses to face challenges of appropriate prioritization.”
I’m generally not particularly interested in normative questions, preferring instead to focus on the empirical side of things, but the book did have some data as well. In the post I’ll focus on topics I found interesting, and I have made no attempt here to make the coverage representative of the sort of topics actually covered in the book; this is (as usual) a somewhat biased account of the material covered.
The book observes early and often that there’s no way around prioritization in medicine; you can’t not prioritize, because “By giving priority to one group, you ration care to the second group.” Every time you spend a dollar on cancer treatment, that’s a dollar you can’t spend on heart disease. So the key question in this context is how best to prioritize, rather than whether to do it. It is noted in the text that there is a wide consensus that approaching and handling health care allocation rules explicitly is preferable to implicit rationing, a point I believe was also made in Glied and Smith. A strong argument can be made that clear and well-defined decision rules will lead to better outcomes than implicit allocation decisions made by doctors during their day-to-day workload. The risks of leaving allocation decisions to physicians include overtaxing medical practitioners (they are implicitly required to repeatedly make decisions which may be emotionally very taxing) and problematic and unfair distribution patterns of care, and there’s also a risk that such practices may erode trust between patients and physicians.
Any prioritization decision made within the medical sector, whether implicit or explicit, will necessarily affect all patient populations, because resources used for one purpose cannot be used for another. A related point is that the health care sector is not the only sector in the economy; when you spend money on medicine, that’s also money you can’t spend on housing or education: “The competition between health-related resources and other goods is generally left to a political process. The fact that a societal budget for meeting health needs is the result of such a political process means that in all societies, some method of resolving disagreements about priorities is needed.” Different countries have different approaches to resolving these disagreements (and in large countries in particular, lower-level regional differences may also be important in terms of realized care allocation decisions), and the book covers the systems applied in multiple countries, including England, Germany, Norway, Sweden, and the US state of Oregon.
Some observations and comments:
“A well-known unfairness objection against conventional cost-effectiveness analysis is the severity of diseases objection – the objection that the approach is blind as to whether the QALYs go to severely or to slightly ill patients. Another is the objection of disability discrimination – the objection that the approach is not blind between treating a life-threatening disease when it befalls a disabled patient and treating the same disease when it befalls a non-disabled patient. An ad hoc amendment for fairness problems like these is equity weighting. Equity weights are multiplication factors that are introduced in order to make some patient group’s QALYs count more than others.”
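A minimal sketch of how such equity weights enter the arithmetic; the groups, the QALY gains, and the weight of 1.5 are all invented for illustration:

```python
# Minimal sketch of equity weighting as described in the quote: QALY gains
# are multiplied by a weight reflecting the severity of the recipient
# group's condition before aggregation. Groups and weights are invented.

def weighted_qalys(programmes):
    """programmes: list of (qaly_gain, equity_weight) per patient group."""
    return sum(gain * weight for gain, weight in programmes)

# Two competing programmes with identical unweighted QALY totals (100):
mild_only = [(100, 1.0)]    # all gains go to slightly ill patients
severe_only = [(100, 1.5)]  # the same gains go to severely ill patients

print(weighted_qalys(mild_only), weighted_qalys(severe_only))
```

With unweighted QALY maximization the two programmes tie at 100; an equity weight of 1.5 on the severely ill breaks the tie in their favour. The entire normative debate is, of course, about where such weights should come from and how large they should be.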
“There were an estimated 3 million people with diabetes in England in 2009; estimates suggest that the number of people with diabetes could rise to 4.6 million by 2030. There has also been a rapid rise in gastrointestinal diseases, particularly chronic liver disease where the under-65 mortality rate has increased 5-fold since 1970. Liver disease is strongly linked to the harmful use of alcohol and rising levels of obesity. […] the poorest members of the community are at most risk of neglecting their health. This group is more likely to eat, drink and smoke to excess and fail to take sufficient exercise.22 Accordingly, life expectancy in this community is shorter and the years spent of suffering from disability are much longer. […] Generic policies are effective in the sense that aggregate levels of health status improve and overall levels of morbidity and mortality fall. However, they are ineffective in reducing health inequalities; indeed, they may make them worse. The reason is that better-off groups respond more readily to public health campaigns. […] If policy-makers [on the other hand] disinvest from the majority to narrow the inequality gap with a minority resistant to change, this could reduce aggregate levels of health status in the community as a whole. [Health behaviours also incidentally tend to be quite resistant to change in general, and we really don’t know all that much about which sort of interventions work and/or how well they work – see also Thirlaway & Upton’s coverage] […] two out of three adults [in the UK] are overweight or obese; and inequalities in health remain widespread, with people in the poorest areas living on average 7 years fewer than those in the richest areas, and spending up to 17 more years living with poor health. […] the proportion of the total health budget invested in preventive medicine and health promotion […] is small. 
The UK spends about 3.6 % of its entire healthcare budget on public health projects of this nature (which is more than many other EU member states).”
Let’s talk a little bit about rationing. Rationing by delay (waiting lists) is a well-known method of limiting care, but it’s far from the only way to implicitly ration care in a manner which may be hidden from view; another way to limit care provision is to ration by dilution. This may happen when patients are seen on time (recall that waiting lists are very common in the medical sector, for very natural reasons which I’ve discussed on this blog before), but the quality of the care provided to them goes down. Rationing by dilution may sometimes result from attempts to limit rationing by delay: if you measure hospitals on whether they treat people within a given amount of time, the time dimension becomes very important in the treatment context, and it may end up dominating other decision variables which should ideally take precedence in the specific clinical setting. The book mentions as an example the Bristol Eye Hospital, where it is thought that 25 patients may have lost their sight because, even though they were urgent cases which should have been given high priority, they were not treated in time; there was a great institutional focus on not allowing any patient’s waiting time to cross the allowed maximum, so much less urgent cases were treated instead of the urgent ones in order to make the numbers look good. A(n excessive?) focus on waiting lists may thus limit the focus on patient needs, and similar problems pop up when goals other than patient needs are emphasized in an institutional context; hospital reorganisations undertaken to improve financial efficiency may also result in lower standards of care, and the book discusses multiple examples of this having happened in a British context.
The chapter in question does not discuss this aspect, but it seems likely to me that rationing by dilution, or at least something quite similar to it, may also happen when capacity is rapidly increased in an attempt to address long waiting lists; if, for example, you decide to temporarily take on a lot of new and inexperienced nurses to shorten the waiting list, these new nurses may not provide the same level of care as the experienced nurses already present. A similar dynamic can probably be observed in a setting where the number of nurses does not change, but each patient is allocated less time with any given nurse than was previously the case.
“Public preferences have been shown not to align with QALY maximization (or health benefit maximization) across a variety of contexts […] and considerations affecting these preferences often extend well beyond strict utilitarian concerns […] age has been shown to be among the most frequently cited variables affecting the public’s prioritization decisions […] Most people are willing to use age as a criterion at least in some circumstances and at least in some ways. This is shown by empirical studies of public views on priority setting […] most studies suggest that a majority accepts that age can have some role in priority setting. […] Oliver [(2009)] found […] a wide range of context-dependent ‘decision rules’ emerged across the decision tasks that appeared to be dependent on the scenario presented. Respondents referenced reasons including maximizing QALYs, maximizing life-years or post-treatment quality of life, providing equal access to health care, maximizing health based on perceptions of adaptation, maximizing societal productivity (including familial roles, i.e. ‘productivity ageism’), minimizing suffering, minimizing costs, and distributing available resources equitably. As an illustration of its variability, he noted that 46 of the 50 respondents were inconsistent in their reasoning across the questions. Oliver commented that underlying values influence the respondents’ decisions, but if these values are context dependent, it becomes a challenge – if not impossible – to identify a preferred, overarching rule by which to distribute resources. […] Given the empirical observations that respondents do not seem to rely upon a consistent decision rule that is independent of the prioritization context, some have suggested that deliberative judgments be used to incorporate equity considerations […]. This means that decision makers may call upon a host of different ‘rules’ to set priorities depending on the context.
When the patients are of similar ages, prioritization by severity may offer a morally justifiable solution, for example. In contrast, as the age discrepancy becomes greater between the two patients, there may be a point at which ‘the priority view’ (i.e. those who in the most dire conditions take precedence) no longer holds […] There is some evidence that indicates that public preferences do not support giving priority in instances where the intervention has a poor prognosis […] If older patients have poorer health outcomes as a result of certain interventions, [this] finding might imply that in these instances, they should receive lower priority or not be eligible for certain care. […] A substantial body of evidence indicates that the utilitarian approach of QALY maximization fails to adequately capture public preferences for a greater degree of equity into health-care distribution; however, how to go about incorporating these concerns remains unresolved.”
“roughly 35 % of the […] [UK] health expenditures were spent on the 13 % of our population over the age of 65. A similar statistic holds true for the European Union as well […] the elderly, on average, have many more health needs than the non-elderly. In the United States, 23 % of the elderly have five or more chronic health problems, some life-threatening, some quality-of-life diminishing (Thorpe et al. 2010). Despite this statistic, the majority of the elderly in any given year is quite healthy and makes minimal use of the health care system. Health needs tend to be concentrated. The sickest 5 % of the Medicare population consume 39 % of total Medicare expenditures, and the sickest 10 % consume 58 % of Medicare expenditures (Schoenman 2012). […] we are […] faced with the problem of where to draw the line with regard to a very large range of health deficiencies associated with advanced age. It used to be the case in the 1970s that neither dialysis nor kidney transplantation were offered as an option to patients in end-stage kidney failure who were beyond age 65 because it was believed they were not medically suitable. That is, both procedures were judged to be too burdensome for individuals who already had diminished health status. But some centers started dialyzing older patients with good results, and consequently, the fastest growing segment of the dialysis population today (2015) is over age 75. This phenomenon has now been generalized across many areas of surgery and medicine. […] What [many new] procedures have in common is that they are very expensive: $70,000 for coronary bypass surgery (though usually much more costly due to complication rates among the hyper-elderly); $200,000 for the LVAD [Left Ventricular Assist Device]; $100,000+ per month for prolonged mechanical ventilation. 
[…] The average older recipient of an LVAD will gain one to two extra years of life […] there are now (2015) about 5.5 million Americans in various stages of heart failure and 550,000 new cases annually. Versions of the LVAD are still being improved, but the potential is that 200,000 of these devices could be implanted annually in the United States. That would add at least $40 billion per year to the cost of the Medicare program.”
“In the USA, around 40 % of premature mortality is attributed to behavioral patterns, and it is estimate[d] that around $1.3 trillion annually — around a third of the total health budget — is spent on preventable diseases. […] among the ten leading risk factors contributing to the burden of disease in high-income countries, seven can be directly attributed to unhealthy lifestyles. […] Private health insurance takes such factors into account when calculating premiums for health insurances (Olsen 2009). In contrast, publicly funded health-care systems are mainly based on the so-called solidarity principle, which generally excludes risk-based premiums. However, in some countries, several incentive schemes such as “fat taxes” […], bonuses, or reductions of premiums […] have recently been implemented in order to incorporate aspects of personal responsibility in public health-care systems. […] [An important point in this context is that] there are fundamental questions about whether […] better health leads to lower cost. Among other things, cost reductions are highly dependent on the period of time that one considers. What services are covered by a health system, and how its financing is managed, also matters. Regarding the relative lifetime cost of smokers, obese, and healthy people (never smokers, normal body mass index [BMI]) in the Netherlands, it has been suggested that the latter, and not the former two groups, are most costly — chiefly due to longer life and higher cost of care at the end of life. Other research suggests that incentivizing disease management programs rather than broader prevention programs is far more effective. Cost savings can therefore not be taken for granted but require consideration of the condition being incentivized, the organizational specifics of the health system, and, in particular, the time horizon over which possible savings are assessed.
[…] Policies seeking to promote personal responsibility for health can be structured in a very wide variety of ways, with a range of different consequences. In the best case, the stars are aligned and programs empower people’s health literacy and agency, reduce overall healthcare spending, alleviate resource allocation dilemmas, and lead to healthier and more productive workforces. But the devil is often in the detail: A focus on controlling or reducing cost can also lead to an inequitable distribution of benefits from incentive programs and penalize people for health risk factors that are beyond their control.”
Below are three new lectures from the Institute for Advanced Study. As far as I’ve gathered they’re all from an IAS symposium called ‘Lens of Computation on the Sciences’ – all three lecturers are computer scientists, but you don’t have to be a computer scientist to watch these lectures.
Should computer scientists and economists band together more and try to use the insights from one field to help solve problems in the other field? Roughgarden thinks so, and provides examples of how this might be done/has been done. Applications discussed in the lecture include traffic management and auction design. I’m not sure how much of this lecture is easy to follow for people who don’t know anything about either topic (i.e., computer science and economics), but I found it not too difficult to follow – it probably helped that I’ve actually done work on a few of the things he touches upon in the lecture, such as basic auction theory, the fixed point theorems and related proofs, basic queueing theory and basic discrete maths/graph theory. Either way there are certainly much more technical lectures than this one available at the IAS channel.
I don’t have Facebook and I’m not planning on ever getting a FB account, so I’m not really sure I care about the things this guy is trying to do, but the lecturer does touch upon some interesting topics in network theory. Not a great lecture in my opinion and occasionally I think the lecturer ‘drifts’ a bit, talking without saying very much, but it’s also not a terrible lecture. A few times I was really annoyed that you can’t see where he’s pointing that damn laser pointer, but this issue should not stop you from watching the video, especially not if you have an interest in analytical aspects of how to approach and make sense of ‘Big Data’.
I’ve noticed that Scott Alexander has said some nice things about Scott Aaronson a few times, but until now I’ve never actually read any of the latter guy’s stuff or watched any lectures by him. I agree with Scott (Alexander) that Scott (Aaronson) is definitely a smart guy. This is an interesting lecture; I won’t pretend I understood all of it, but it has some thought-provoking ideas and important points in the context of quantum computing and it’s actually a quite entertaining lecture; I was close to laughing a couple of times.
“A commonplace argument in contemporary writing on trust is that we would all be better off if we were all more trusting, and therefore we should all trust more […] Current writings commonly focus on trust as somehow the relevant variable in explaining differences across cases of successful cooperation. Typically, however, the crucial variable is the trustworthiness of those who are to be trusted or relied upon. […] It is not trust per se, but trusting the right people that makes for successful relationships and happiness.”
“If we wish to understand the role of trust in society […], we must get beyond the flaccid – and often wrong – assumption that trust is simply good. This supposition must be heavily qualified, because trusting the malevolent or the radically incompetent can be foolish and often even grossly harmful. […] trust only make[s] sense in dealings with those who are or who could be induced to be trustworthy. To trust the untrustworthy can be disastrous.”
That it’s stupid to trust people who cannot be trusted should in my opinion be blatantly obvious, yet somehow to a lot of people it doesn’t seem to be obvious at all; in light of this problem (…I maintain that this is indeed a problem) the above observations are probably among the most important ones included in Hardin’s book. The book includes some strong criticism of much of the existing literature on trust. The two most common approaches within this area of research are game-theoretic ‘trust games’ – which according to the author are ill-named, as they don’t really seem to deal much, if at all, with the topic of trust – and (poor) survey research which asks people questions that are hard to answer and tend to yield answers that are even harder to interpret. I have included below a few concluding remarks from the chapter on these topics:
“Both of the current empirical research programs on trust are largely misguided. The T-games [‘trust-games’], as played […] do not elicit or measure anything resembling ordinary trust relations; and their findings are basically irrelevant to the modeling and assessment of trust and trustworthiness. The only thing that relates the so-called trust game […] to trust is its name, which is wrong and misleading. Survey questions currently in wide use are radically unconstrained. They therefore force subjects to assume the relevant degrees of constraint, such as how costly the risk of failed cooperation would be. […] In sum, therefore, there is relatively little to learn about trust from these two massive research programs. Without returning their protocols to address standard conceptions of trust, they cannot contribute much to understanding trust as we generally know it, and they cannot play a very constructive role in explaining social behavior, institutions, or social and political change. These are distressing conclusions because both these enterprises have been enormous, and in many ways they have been carried out with admirable care.”
There is ‘relatively little to learn about trust from these two massive research programs’, but one observation which seems to me potentially important, hidden away in the notes at the end of the book, is perhaps worth mentioning here: “There is a commonplace claim that trust will beget trustworthiness […] Schotter [as an aside, this guy was incidentally the author of the micro textbook we used in introductory microeconomics] and Sopher (2006) do not find this to be true in game experiments that they run, while they do find that trustworthiness (cooperativeness in the play of games) does beget trust (or cooperation).”
There were a few parts of the coverage which confused me somewhat until it occurred to me that the author might not have read Boyd and Richerson, or other people who might have familiarized him with their line of thinking and research (once again, you should read Boyd and Richerson).
Moving on, a few remarks on social capital:
“Like other forms of capital and human capital, social capital is not completely fungible but may be specific to certain activities. A given form of social capital that is valuable in facilitating certain actions may be useless or even harmful for others. […] [A] mistake is the tendency to speak of social capital as though it were a particular kind of thing that has generalized value, as money very nearly does […] [its value] must vary in the sense that what is functional in one context may not be in another.”
It is important to keep in mind that trust which leads to increased cooperation can end up leading to both good outcomes and bad:
“Widespread customs and even very local practices of personal networks can impose destructive norms on people, norms that have all of the structural qualities of interpersonal capital. […] in general, social capital has no normative valence […] It is generally about means for doing things, and the things can be hideously bad as well as good, although the literature on social capital focuses almost exclusively on the good things it can enable and it often lauds social capital as itself a wonderful thing to develop […] Community and social capital are not per se good. It is a grand normative fiction of our time to suppose that they are.”
The book has a chapter specifically about trust on the internet which relates to the coverage included in Barak et al.‘s book, a publication which I have unfortunately neglected to blog (a book which of course goes into a lot more detail on this topic). A key point in that chapter is that the internet is not really all that special in terms of these things, in the sense that to the extent that it facilitates coordination etc., it can be used to accomplish beneficial things as well as harmful things – i.e. it’s also neutrally valenced. Barak et al.‘s book has a lot more stuff about how this medium impacts communication and optimal communication strategies etc., which links in quite a bit with trust aspects, but I won’t go into this stuff here and I’m pretty sure I’ve covered related topics before here on the blog, e.g. back when I covered Hargie.
The chapter about terrorism and distrust had some interesting observations. A few quotes:
“We know from varied contexts that people can have a more positive view of individuals from a group than they have of the group.”
“Mere statistical doubt in the likely trustworthiness of the members of some identifiable group can be sufficient to induce distrust of all members of the group with whom one has no personal relationship on which to have established trust. […] This statistical doubt can trump relational considerations and can block the initial risk-taking that might allow for a test of another individual’s trustworthiness by stereotyping that individual as primarily a member of some group. If there are many people with whom one can have a particular beneficial interaction, narrowing the set by excluding certain stereotypes is efficient […] Unfortunately, however, excluding very systematically on the basis of ethnicity or race becomes pervasively destructive of community relations.”
One thing to keep in mind here is that people’s stereotypes are often quite accurate. When groups don’t trust each other it’s always a lot of fun to argue about who’s to blame for that state of affairs, but it’s important here to keep in mind that both groups will always have mental models of both the in-group and the out-group (see also the coverage below). Also it should be kept in mind that to the extent that people’s stereotypes are accurate, blaming stereotyping behaviours for the problems of the people who get stereotyped is conceptually equivalent to blaming people for discriminating against untrustworthy people by not trusting people who are not trustworthy. You always come back to the problem that what’s at the heart of the matter is never just trust, but rather trustworthiness. To the extent that the two are related, trust follows trustworthiness, not the other way around.
“There’s a fairly extensive literature on so-called generalized trust, which is trust in the anonymous or general other person, including strangers, whom we might encounter, perhaps with some restrictions on what issues would come under that trust. […] [Generalized trust] is an implausible notion. In any real-world context, I trust some more than others and I trust any given person more about some things than about others and more in some contexts than in others. […] Whereas generalized trust or group-generalized trust makes little or no sense (other than as a claim of optimism), group-generalized distrust in many contexts makes very good sense. If you were Jewish, Gypsy, or gay, you had good reason to distrust all officers of the Nazi state and probably most citizens in Nazi Germany as well. American Indians of the western plains had very good reason to distrust whites. During Milosevic’s wars and pogroms, Serbs, Croatians, and Muslims in then Yugoslavia had increasingly good reasons to distrust most members of the other groups, especially while the latter were acting as groups. […] In all of these cases, distrust is defined by the belief that members of the other groups and their representatives are hostile to one’s interests. Trust relationships between members of these various groups are the unusual cases that require explanation; the relatively group-generalized distrust is easy to understand and justify.”
“In the current circumstances of mostly Arab and Islamic terrorism against Israel and the West and much of the rest of the world, it is surely a very tiny fraction of all Arabs and Islamists who are genuinely a threat, but the scale of their threat may make many Israelis and westerners wary of virtually all Arabs and Islamists […] many who are not prospects for taking terrorist action evidently sympathize with and even support these actions”
“When cooperation is organized by communal norms, it can become highly exclusionary, so that only members of the community can have cooperative relations with those in the community. In such a case, the norms of cooperativeness are norms of exclusion […] For many fundamentalist groups, continued loyalty to the group and its beliefs is secured by isolating the group and its members from many other influences so that relations within the community are governed by extensive norms of exclusion. When this happens, it is not only trust relations but also basic beliefs that are constrained. If we encounter no one with contrary beliefs our own beliefs will tend to prevail by inertia and lack of questioning and they will be reinforced by our secluded, exclusionary community. There are many strong, extreme beliefs about religious issues as well as about many other things. […] The two matters for which such staunch loyalty to unquestioned beliefs are politically most important are probably religious and nationalist commitments […] Such beliefs are often maintained by blocking out alternative views and by sanctioning those within the group who stray. […] Narrowing one’s associations to others in an isolated extremist group cripples one’s epistemology by blocking out general questioning of the group’s beliefs […] To an outsider those beliefs might be utterly crazy. Indeed, virtually all strong religious beliefs sound crazy or silly to those who do not share them. […] In some ways, the internet allows individuals and small groups to be quite isolated while nevertheless maintaining substantial contact with others of like mind. Islamic terrorists in the West can be almost completely isolated individually while maintaining nearly instant, frequent contact with others and with groups in the Middle East, Pakistan, or Afghanistan, as well as with groups of other potential terrorists in target nations.”
David Friedman recently asked a related question on SSC (he asked about why there are waiting lists for surgical procedures), and I decided that as I’d read some stuff about these topics in the past I might as well answer his question. The answer turned out to be somewhat long/detailed, and I decided I might as well post some of this stuff here as well. In a way my answer to David’s question provides belated coverage of a book I read last year, Appointment Planning in Outpatient Clinics and Diagnostic Facilities, which I have covered only in very limited detail here on the blog before (the third paragraph of this post is the only coverage of the book I’ve provided here).
Below I’ve tried to cover these topics in a manner which would make it unnecessary to also read David’s question and related comments.
The brief Springer publication Appointment Planning in Outpatient Clinics and Diagnostic Facilities has some basic stuff about operations research and queueing theory which is useful for making sense of resource allocation decisions made in the medical sector. I think this is the kind of stuff you’ll want to have a look at if you want to understand these things better.
There are many variables which are important here and which may help explain why waiting lists are common in the health care sector (it’s not just surgery). The quotes below are from the book:
“In a walk-in system, patients are seen without an appointment. […] The main advantage of walk-in systems is that access time is reduced to zero. […] A huge disadvantage of patients walking in, however, is that the usually strong fluctuating arrival stream can result in an overcrowded clinic, leading to long waiting times, high peaks in care provider’s working pressure, and patients leaving without treatment (blocking). On other moments of time the waiting room will be practically empty […] In regular appointment systems workload can be dispersed, although appointment planning is usually time consuming. A walk-in system is most suitable for clinics with short service times and multiple care providers, such as blood withdrawal facilities and pre-anesthesia check-ups for non-complex patients. If the service times are longer or the number of care providers is limited, the probability that patients experience a long waiting time becomes too high, and a regular appointment system would be justified”
“Sometimes it is impossible to provide walk-in service for all patients, for example when specific patients need to be prepared for their consultation, or if specific care providers are required, such as anesthesiologists [I noted in my reply to David that these remarks seem highly relevant for the surgery context]. Also, walk-in patients who experience a full waiting room upon arrival may choose to come back at a later point in time. To make sure that they do have access at that point, clinics usually give these patients an appointment. This combination of walk-in and appointment patients requires a specific appointment system that satisfies the following requirements:
1. The access time for appointment patients is below a certain threshold
2. The waiting time for walk-in patients is below a certain threshold
3. The number of walk-in patients who are sent away due to crowding is minimized
To satisfy these requirements, an appointment system should be developed to determine the optimal scheduling of appointments, not only on a day level but also on a week level. Developing such an appointment system is challenging from a mathematical perspective. […] Due to the high variability that is usually observed in healthcare settings, introducing stochasticity in the modeling process is very important to obtain valuable and reasonable results.”
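To get a feeling for why such mixed systems are mathematically challenging, here’s a toy discrete-event simulation of a single provider seeing both booked and walk-in patients. All parameters (booking interval, walk-in rate, consultation times) are my own illustrative assumptions, not numbers from the book:

```python
import random

random.seed(2)

# Toy single-provider clinic day mixing booked appointments and walk-ins.
DAY_MINUTES = 8 * 60
APPT_TIMES = list(range(0, DAY_MINUTES, 20))   # 24 booked patients, every 20 min
WALK_IN_RATE = 1 / 30                          # on average one walk-in per 30 min
MEAN_SERVICE = 12                              # average consultation length (min)

def simulate_day():
    # Generate walk-in arrival times from exponential inter-arrival gaps
    walk_ins, t = [], 0.0
    while True:
        t += random.expovariate(WALK_IN_RATE)
        if t >= DAY_MINUTES:
            break
        walk_ins.append(t)
    # Merge both patient streams and serve first-come, first-served
    arrivals = sorted([(a, "appt") for a in APPT_TIMES] +
                      [(w, "walkin") for w in walk_ins])
    free_at, waits = 0.0, {"appt": [], "walkin": []}
    for arrival, kind in arrivals:
        start = max(arrival, free_at)
        waits[kind].append(start - arrival)
        free_at = start + random.expovariate(1 / MEAN_SERVICE)
    return waits

waits = simulate_day()
for kind, ws in waits.items():
    if ws:
        print(f"{kind}: n={len(ws)}, mean wait {sum(ws)/len(ws):.1f} min, "
              f"max {max(ws):.1f} min")
```

With these numbers the provider is loaded at roughly ρ ≈ 1 (24 × 12 + ~16 × 12 ≈ 480 minutes of expected work in a 480-minute day), so waits build up over the day; the appointment-design problem is precisely about trading off that load against the three requirements above.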
“Most elective patients will ultimately evolve into semi-urgent or even urgent patients if treatment is extensively prolonged.” That’s ‘on the one hand’ – but of course there’s also the related ‘on the other hand’-observation that: “Quite often a long waiting list results in a decrease in demand”. Patients might get better on their own and/or decide it’s not worth the trouble to see a service provider – or they might deteriorate.
“Some planners tend to maintain separate waiting lists for each patient group. However, if capacity is shared among these groups, the waiting list should be considered as a whole as well. Allocating capacity per patient group usually results in inflexibility and poor performance”.
“mean waiting time increases with the load. When the load is low, a small increase therein has a minimal effect on the mean waiting time. However, when the load is high, a small increase has a tremendous effect on the mean waiting time. For instance, […] increasing the load from 50 to 55 % increases the waiting time by 10 %, but increasing the load from 90 to 95 % increases the waiting time by 100 % […] This explains why a minor change (for example, a small increase in the number of patients, a patient arriving in a bed or a wheelchair) can result in a major increase in waiting times as sometimes seen in outpatient clinics.”
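The book doesn’t spell out the formula behind these numbers, but they are consistent with the classic single-server queueing (M/M/1) result that the mean time spent in the system is proportional to 1/(1 − ρ), where ρ is the load. A quick sketch of my own to check:

```python
def waiting_factor(rho: float) -> float:
    """Mean time in system for a simple single-server queue,
    up to a constant factor: 1 / (1 - rho)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("load must be in [0, 1) for a stable queue")
    return 1.0 / (1.0 - rho)

for low, high in [(0.50, 0.55), (0.90, 0.95)]:
    increase = waiting_factor(high) / waiting_factor(low) - 1.0
    print(f"load {low:.0%} -> {high:.0%}: mean wait up {increase:.0%}")
```

This gives an 11 % increase for the 50 → 55 % step and a 100 % increase for the 90 → 95 % step, closely matching the figures quoted above: the same five-percentage-point increase in load is barely noticeable at low loads and doubles waiting times at high loads.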
“One of the most important goals of this chapter is to show that it is impossible to use all capacity and at the same time maintain a short, manageable waiting list. A common mistake is to reason as follows:
Suppose total capacity is 100 appointments. Unused capacity is commonly used for urgent and inpatients, that can be called in last minute. 83 % of capacity is used, so there is on average 17 % of capacity available for urgent and inpatients. The urgent/inpatient demand is on average 20 appointments per day. Since 17 appointments are on average not used for elective patients, a surplus capacity of only three appointments is required to satisfy all patient demand.
Even though this is true on average, more urgent and inpatient capacity is required. This is due to the variation in the process; on certain days 100 % of capacity is required to satisfy elective patient demand, thus leaving no room for any other patients. Furthermore, since 17 slots are dedicated to urgent and inpatients, only 83 slots are available for elective patients, which means that ρ is again equal to 1, resulting in an uncontrollable waiting list.” [ρ represents the average proportion of time which the server/service provider is occupied – a key stability requirement is that ρ is smaller than one; if it is not, the length of the queue becomes unstable/explodes. See also this related link].
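To make the role of variability concrete, here is a small simulation of my own. The book only supplies the daily averages (83 elective appointments, 20 urgent/inpatient appointments, 100 slots of capacity plus the naive surplus of three), so the assumption that both demand streams are Poisson-distributed is mine:

```python
import math
import random

random.seed(0)

def poisson(lam: float) -> int:
    """Draw a Poisson sample via Knuth's algorithm (fine for moderate lambda)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

CAPACITY = 103      # 100 planned slots plus the 'average surplus' of three
DAYS = 10_000

# Count days on which total demand exceeds even the padded capacity
shortfall_days = sum(
    1 for _ in range(DAYS) if poisson(83) + poisson(20) > CAPACITY
)
print(f"fraction of days with unmet demand: {shortfall_days / DAYS:.1%}")
```

Even with the surplus in place, total demand exceeds capacity on close to half of all days in this setup, so either substantially more surplus capacity or a (growing) waiting list is needed to absorb the variation – which is exactly the book’s point about averages being misleading here.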
“The challenge is to make a trade-off between maintaining a waiting list which is of acceptable size and the amount of unused capacity. Since the focus in many healthcare facilities is on avoiding unused capacity, waiting lists tend to grow until “something has to be done.” Then, temporarily surplus capacity is deployed, which is usually more expensive than regular capacity […]. Even though waiting lists have a buffer function (i.e., by creating a reservoir of patients that can be planned when demand is low) it is unavoidable that, even in well-organized facilities, over a longer period of time not all capacity is used.”
I think one way to think about whether it makes sense to have a waiting list, or whether you can ‘just use the price variable’, is this: if it is possible for you as a provider to optimize over both the waiting time variable and the price variable (i.e., people demanding the service find some positive waiting time acceptable when it is combined with a non-zero price reduction), the result is always going to be at least as good as what you could achieve by optimizing over price alone. Not including waiting time in the implicit pricing mechanism can thus be thought of as, in a sense, a weakly dominated strategy.
A lot of the planning stuff relates to how to handle variable demand, and input heterogeneities can be thought of as one of many parameters which may be important to take into account in the context of how best to deal with variable demand; surgeons aren’t perfect substitutes. Perhaps neither are nurses, or different hospitals (relevant if you’re higher up in the decision making hierarchy). An important aspect is the question of whether a surgeon (or a doctor, or a nurse…) might be doing other stuff instead of surgery during down-periods, and what might be the value of that other stuff s/he might be doing instead. In the surgical context, not only is demand variable over time, there are also issues such as that many different inputs need to be coordinated; you need a surgeon and a scrub nurse and an anesthesiologist. The sequential and interdependent nature of many medical procedures and inputs is likely also a factor in terms of adding complexity; whether a condition requires treatment or not, and/or which treatment may be required, may depend upon the results of a test which has to be analyzed before the treatment is started, and so you for example can’t switch the order of test and treatment, or for that matter treat patient X based on patient Y’s test results; there’s some built-in inflexibility here at the outset. This type of thing also means there are more nodes in the network, and more places where things can go wrong, resulting in longer waiting times than planned.
I think the potential gains in terms of capacity utilization, risk reduction and increased flexibility to be derived from implementing waiting schemes of some kind in the surgery context would militate strongly against a model without waiting lists, and I think that the surgical field is far from unique in that respect in the context of medical care provision.
This will be my last post about the book. Yesterday I finished reading Darwin’s Origin of Species, which was my 100th book this year (here’s the list), but I can’t face blogging that book at the moment so coverage of that one will have to wait a bit.
In my second post about this book I had originally planned to cover chapter 7 – ‘Analysing costs’ – but as I didn’t want to spend too much time on the post I ended up cutting it short. Because of that omission, some themes discussed below are closely related to stuff covered in the second post, whereas most of the remaining material, more specifically the material from chapters 8, 9 and 10, deals with decision analytic modelling, a quite different topic. In other words the coverage will be slightly more fragmented and less structured than I’d have liked it to be, but there’s not really much to do about that (it doesn’t help in this respect that I decided not to cover chapter 8, but covering that as well was out of the question).
I’ll start with coverage of some of the things they talk about in chapter 7, which as mentioned deals with how to analyze costs in a cost-effectiveness analysis context. They observe in the chapter that health cost data are often skewed to the right, for several reasons (costs incurred by an individual cannot be negative; for many patients the costs may be zero; some study participants may require much more care than the rest, creating a long tail). One way to address skewness is to use the median instead of the mean as the variable of interest, but a problem with this approach is that the median is less useful to policy-makers than the mean: the mean multiplied by the population of interest gives a good estimate of the total costs of an intervention, whereas the median is not a very useful variable in the context of arriving at such an estimate. Transforming the data and analyzing the transformed data is another way to deal with skewness, but the use of transformations in cost-effectiveness analysis has been questioned for a variety of reasons discussed in the chapter (to give a couple of examples, data transformation methods perform badly if inappropriate transformations are used, and many transformations cannot be used if there are data points with zero costs in the data, which is very common). Of the non-parametric methods aimed at dealing with skewness they discuss a variety of tests which are rarely used, as well as the bootstrap, the latter being one approach which has gained widespread use.
They observe in the context of the bootstrap that “it has increasingly been recognized that the conditions the bootstrap requires to produce reliable parameter estimates are not fundamentally different from the conditions required by parametric methods” and note in a later chapter (chapter 11) that: “it is not clear that bootstrap results in the presence of severe skewness are likely to be any more or less valid than parametric results […] bootstrap and parametric methods both rely on sufficient sample sizes and are likely to be valid or invalid in similar circumstances. Instead, interest in the bootstrap has increasingly focused on its usefulness in dealing simultaneously with issues such as censoring, missing data, multiple statistics of interest such as costs and effects, and non-normality.” Going back to the coverage in chapter 7, in the context of skewness they also briefly touch upon the potential use of a GLM framework to address this problem.
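To make the bootstrap idea concrete, here’s a minimal Python sketch applied to simulated right-skewed cost data (zeros plus a heavy log-normal tail, mimicking the skewness described above). The percentile-interval approach shown is just one of several bootstrap variants; the data and parameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical right-skewed cost data: many zero-cost patients plus a
# heavy log-normal tail, as described in the text.
costs = np.concatenate([np.zeros(40), rng.lognormal(mean=7, sigma=1.2, size=160)])

def bootstrap_mean_ci(data, n_boot=5000, alpha=0.05, rng=None):
    """Percentile-bootstrap confidence interval for the mean cost."""
    rng = rng or np.random.default_rng()
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    boot_means = np.array([rng.choice(data, size=n, replace=True).mean()
                           for _ in range(n_boot)])
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return data.mean(), (lo, hi)

mean_cost, (ci_lo, ci_hi) = bootstrap_mean_ci(costs, rng=rng)
print(f"mean cost: {mean_cost:.0f}, 95% bootstrap CI: ({ci_lo:.0f}, {ci_hi:.0f})")
```

Note that the interval produced this way will typically be asymmetric around the mean, reflecting the skewness of the underlying data.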
Data are often missing in cost datasets. Some parts of their coverage of these topics were to me just a review of stuff already covered in Bartholomew. Data can be missing for different reasons and through different mechanisms; one distinction is among data missing completely at random (MCAR), missing at random (MAR) (“missing data are correlated in an observable way with the mechanism that generates the cost, i.e. after adjusting the data for observable differences between complete and missing cases, the cost for those with missing data is the same, except for random variation, as for those with complete data”), and not missing at random (NMAR). The last type is also called non-ignorably missing data, and if you have that sort of data the implication is that the costs of those in the observed and unobserved groups differ in unpredictable ways; if you ignore the process that drives these differences you’ll probably end up with a biased estimator. Another way to distinguish between different types of missing data is to look at patterns within the dataset, where you have:
“*univariate missingness – a single variable in a dataset is causing a problem through missing values, while the remaining variables contain complete information
*unit non-response – no data are recorded for any of the variables for some patients
*monotone missing – caused, for example, by drop-out in panel or longitudinal studies, resulting in variables observed up to a certain time point or wave but not beyond that
*multivariate missing – also called item non-response or general missingness, where some but not all of the variables are missing for some of the subjects.”
The authors note that the most common types of missingness in cost information analyses are the latter two. They discuss some techniques for dealing with missing data, such as complete-case analysis, available-case analysis, and imputation, but I won’t go into the details here. In the last parts of the chapter they talk a little bit about censoring, which can be viewed as a specific type of missing data, and ways to deal with it. Censoring happens when follow-up information on some subjects is not available for the full duration of interest, which may be caused e.g. by attrition (people dropping out of the trial) or insufficient follow-up (the final date of follow-up might be set before all patients reach the endpoint of interest, e.g. death). The two most common methods for dealing with censored cost data are the Kaplan-Meier sample average (KMSA) estimator and the inverse probability weighting (IPW) estimator, both of which are non-parametric interval methods. “Comparisons of the IPW and KMSA estimators have shown that they both perform well over different levels of censoring […], and both are considered reasonable approaches for dealing with censoring.” One difference between the two is that the KMSA, unlike the IPW, is not appropriate for dealing with censoring due to attrition unless the attrition is MCAR (and it almost never is), because the KM estimator, and by extension the KMSA estimator, assumes that censoring is independent of the event of interest.
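To give a rough sense of the IPW idea, here’s a bare-bones Python sketch on invented data: the probability of remaining uncensored is estimated with a Kaplan-Meier-type estimator applied to the censoring events, and each complete case’s cost is then weighted by the inverse of that probability. This is a toy illustration of the weighting logic, not the full estimator as implemented in practice:

```python
import numpy as np

# Hypothetical data: total cost, follow-up time, and event indicator
# (1 = complete observation, 0 = censored before the endpoint).
times = np.array([2., 3., 3., 5., 6., 7., 8., 8., 9., 10.])
costs = np.array([100., 250., 180., 400., 320., 500., 610., 480., 700., 900.])
event = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])

def km_censoring_survival(times, event):
    """Kaplan-Meier estimate of the probability of remaining *uncensored*,
    evaluated just before each subject's own follow-up time."""
    order = np.argsort(times)
    t, e = times[order], event[order]
    n = len(t)
    surv = 1.0
    g_at = {}
    for i in range(n):
        at_risk = n - i
        g_at[t[i]] = surv            # G(t-) for this time point
        if e[i] == 0:                # a censoring event occurred here
            surv *= (at_risk - 1) / at_risk
    return np.array([g_at[ti] for ti in times])

G = km_censoring_survival(times, event)

# IPW estimator: complete cases only, each weighted by 1/G(t_i),
# divided by the *full* sample size.
complete = event == 1
ipw_mean = np.sum(costs[complete] / G[complete]) / len(times)
naive_mean = costs[complete].mean()
print(f"IPW mean cost: {ipw_mean:.1f} (naive complete-case mean: {naive_mean:.1f})")
```

The naive complete-case mean understates costs here because subjects who survive longer (and accumulate more cost) are more likely to be censored; the weighting corrects for that.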
The focus in chapter 8 is on decision tree models, and I decided to skip that chapter as most of it is known stuff which I felt no need to review here (do remember that I to a large extent use this blog as an extended memory, so I’m not only(/mainly?) writing this stuff for other people..). Chapter 9 deals with Markov models, and I’ll talk a little bit about those in the following.
“Markov models analyse uncertain processes over time. They are suited to decisions where the timing of events is important and when events may happen more than once, and therefore they are appropriate where the strategies being evaluated are of a sequential or repetitive nature. Whereas decision trees model uncertain events at chance nodes, Markov models differ in modelling uncertain events as transitions between health states. In particular, Markov models are suited to modelling long-term outcomes, where costs and effects are spread over a long period of time. Therefore Markov models are particularly suited to chronic diseases or situations where events are likely to recur over time […] Over the last decade there has been an increase in the use of Markov models for conducting economic evaluations in a health-care setting […]
A Markov model comprises a finite set of health states in which an individual can be found. The states are such that in any given time interval, the individual will be in only one health state. All individuals in a particular health state have identical characteristics. The number and nature of the states are governed by the decision problem. […] Markov models are concerned with transitions during a series of cycles consisting of short time intervals. The model is run for several cycles, and patients move between states or remain in the same state between cycles […] Movements between states are defined by transition probabilities which can be time dependent or constant over time. All individuals within a given health state are assumed to be identical, and this leads to a limitation of Markov models in that the transition probabilities only depend on the current health state and not on past health states […the process is memoryless…] – this is known as the Markovian assumption”.
They note that in order to build and analyze a Markov model, you need to do the following: *define states and allowable transitions [for example going from ‘non-dead’ to ‘dead’ is okay, but going the other way is, well… For a Markov process to end, you need at least one state that cannot be left after it has been reached, and such states are termed ‘absorbing states’], *specify initial conditions in terms of starting probabilities/initial distribution of patients, *specify transition probabilities, *specify a cycle length, *set a stopping rule, *determine rewards, *implement discounting if required, *analyze and evaluate the model, and *explore uncertainties. They talk about each step in more detail in the book, but I won’t go too much into this.
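The steps above can be sketched in code. Below is a minimal three-state cohort model in Python; the states, transition probabilities, rewards, cycle length and discount rate are all invented for illustration:

```python
import numpy as np

# Hypothetical three-state model: Well -> Sick -> Dead.
states = ["Well", "Sick", "Dead"]
P = np.array([           # constant transition probabilities (a Markov chain)
    [0.85, 0.10, 0.05],  # from Well
    [0.00, 0.70, 0.30],  # from Sick
    [0.00, 0.00, 1.00],  # Dead is absorbing: it cannot be left
])
assert np.allclose(P.sum(axis=1), 1.0)  # each row must sum to one

utility = np.array([0.9, 0.5, 0.0])    # reward (QALY weight) per cycle in each state
cost    = np.array([100., 2000., 0.])  # cost per cycle in each state

dist = np.array([1.0, 0.0, 0.0])  # initial conditions: whole cohort starts Well
discount = 0.035                  # discount rate; cycle length assumed to be 1 year
n_cycles = 40                     # stopping rule: fixed time horizon

total_qalys = total_costs = 0.0
for t in range(1, n_cycles + 1):
    dist = dist @ P                    # one cycle of transitions
    df = 1 / (1 + discount) ** t       # discount factor for this cycle
    total_qalys += df * dist @ utility
    total_costs += df * dist @ cost

print(f"discounted QALYs: {total_qalys:.2f}, discounted costs: {total_costs:.0f}")
```

After 40 cycles nearly the whole cohort has reached the absorbing state, which is why a fixed horizon works as a stopping rule here.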
Markov models may be governed by transitions that are either constant over time or time-dependent. In a Markov chain transition probabilities are constant over time, whereas in a Markov process transition probabilities vary over time (/from cycle to cycle). In a simple Markov model the baseline assumption is that transitions only occur once in each cycle and usually the transition is modelled as taking place either at the beginning or the end of cycles, but in reality transitions can take place at any point in time during the cycle. One way to deal with the problem of misidentification (people assumed to be in one health state throughout the cycle even though they’ve transferred to another health state during the cycle) is to use half-cycle corrections, in which an assumption is made that on average state transitions occur halfway through the cycle, instead of at the beginning or the end of a cycle. They note that: “the important principle with the half-cycle correction is not when the transitions occur, but when state membership (i.e. the proportion of the cohort in that state) is counted. The longer the cycle length, the more important it may be to use half-cycle corrections.” When state transitions are assumed to take place may influence factors such as cost discounting (if the cycle is long, it can be important to get the state transition timing reasonably right).
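A tiny sketch of the half-cycle correction, using an invented cohort trace: counting membership at cycle starts overestimates life-years, counting at cycle ends underestimates them, and the half-cycle correction averages the two adjacent counts (equivalently, the trapezoidal rule):

```python
import numpy as np

# Hypothetical two-state trace (Alive, Dead) over 5 one-year cycles:
# proportion alive at the start of each cycle, plus the final end-point.
alive = np.array([1.00, 0.80, 0.64, 0.51, 0.41, 0.33])  # cycle boundaries 0..5

life_years_start = alive[:-1].sum()  # counts everyone as alive all cycle
life_years_end   = alive[1:].sum()   # counts transitions as if at cycle start

# Half-cycle correction: assume transitions occur, on average, halfway
# through each cycle, i.e. average adjacent boundary counts.
life_years_half = ((alive[:-1] + alive[1:]) / 2).sum()

print(life_years_start, life_years_half, life_years_end)
```

The corrected estimate always lies between the two uncorrected ones, and the gap between them grows with the cycle length, which is the book’s point about longer cycles making the correction more important.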
When time dependency is introduced into the model, there are in general two types of time dependencies that impact on transition probabilities in the models. One is time dependency depending on the number of cycles since the start of the model (this is e.g. dealing with how transition probabilities depend on factors like age), whereas the other, which is more difficult to implement, deals with state dependence (curiously they don’t use these two words, but I’ve worked with state dependence models before in labour economics and this is what we’re dealing with here); i.e. here the transition probability will depend upon how long you’ve been in a given state.
Below I mostly discuss stuff covered in chapter 10, however I also include a few observations from the final chapter, chapter 11 (on ‘Presenting cost-effectiveness results’). Chapter 10 deals with how to represent uncertainty in decision analytic models. This is an important topic because as noted later in the book, “The primary objective of economic evaluation should not be hypothesis testing, but rather the estimation of the central parameter of interest—the incremental cost-effectiveness ratio—along with appropriate representation of the uncertainty surrounding that estimate.” In chapter 10 a distinction is made between variability, heterogeneity, and uncertainty. Variability has also been termed first-order uncertainty or stochastic uncertainty, and pertains to variation observed when recording information on resource use or outcomes within a homogenous sample of individuals. Heterogeneity relates to differences between patients which can be explained, at least in part. They distinguish between two types of uncertainty, structural uncertainty – dealing with decisions and assumptions made about the structure of the model – and parameter uncertainty, which of course relates to the precision of the parameters estimated. After briefly talking about ways to deal with these, they talk about sensitivity analysis.
“Sensitivity analysis involves varying parameter estimates across a range and seeing how this impacts on the model’s results. […] The simplest form is a one-way analysis where each parameter estimate is varied independently and singly to observe the impact on the model results. […] One-way sensitivity analysis can give some insight into the factors influencing the results, and may provide a validity check to assess what happens when particular variables take extreme values. However, it is likely to grossly underestimate overall uncertainty, and ignores correlation between parameters.”
Multi-way sensitivity analysis is a more refined approach, in which more than one parameter estimate is varied – this is sometimes termed scenario analysis. A different approach is threshold analysis, where one attempts to identify the critical value of one or more variables at which the conclusion/decision changes. All of these approaches are deterministic, and they are not without problems. “They fail to take account of the joint parameter uncertainty and correlation between parameters, and rather than providing the decision-maker with a useful indication of the likelihood of a result, they simply provide a range of results associated with varying one or more input estimates.” So of course an alternative has been developed, namely probabilistic sensitivity analysis (PSA), which already in the mid-1980s started to be used in health economic decision analyses.
“PSA permits the joint uncertainty across all the parameters in the model to be addressed at the same time. It involves sampling model parameter values from distributions imposed on variables in the model. […] The types of distribution imposed are dependent on the nature of the input parameters [but] decision analytic models for the purpose of economic evaluation tend to use homogenous types of input parameters, namely costs, life-years, QALYs, probabilities, and relative treatment effects, and consequently the number of distributions that are frequently used, such as the beta, gamma, and log-normal distributions, is relatively small. […] Uncertainty is then propagated through the model by randomly selecting values from these distributions for each model parameter using Monte Carlo simulation“.
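A stripped-down PSA might look like the following Python sketch. The model and all distribution parameters are invented, but the pattern – sample each input from a distribution matched to its type, push the samples through the model, and summarize the resulting distribution – follows the description above:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical PSA inputs, with distribution families matched to parameter
# type: probabilities ~ beta, costs ~ gamma, relative effects ~ log-normal.
p_event_old = rng.beta(30, 70, n)                 # event probability, comparator
rel_risk    = rng.lognormal(np.log(0.8), 0.1, n)  # relative treatment effect
p_event_new = np.clip(p_event_old * rel_risk, 0.0, 1.0)
extra_cost  = rng.gamma(shape=25, scale=8, size=n)  # incremental cost, new option
qaly_loss_per_event = 0.2                           # held fixed for simplicity

# Propagate the joint parameter uncertainty through the (toy) model
# via Monte Carlo simulation.
delta_e = (p_event_old - p_event_new) * qaly_loss_per_event  # incremental QALYs
delta_c = extra_cost                                         # incremental costs

threshold = 20_000  # willingness to pay per QALY
prob_cost_effective = np.mean(threshold * delta_e - delta_c > 0)
print(f"P(cost-effective at {threshold}/QALY): {prob_cost_effective:.2f}")
```

The final line is the quantity that, plotted against a range of threshold values, gives the familiar cost-effectiveness acceptability curve.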
Like in the first post I cannot promise I have not already covered the topics I’m about to cover in this post before on the blog. In this post I’ll include and discuss material from two chapters of the book: the chapters on how to measure, value, and analyze health outcomes, and the chapter on how to define, measure, and value costs. In the last part of the post I’ll also talk a little bit about some research related to the coverage which I’ve recently looked at in a different context.
In terms of how to measure health outcomes the first thing to note is that there are lots and lots of different measures (‘thousands’) that are used to measure aspects of health. The symptoms causing problems for an elderly man with an enlarged prostate are not the same symptoms as the ones which are bothering a young child with asthma, and so it can be very difficult to ‘standardize’ across measures (more on this below).
A general distinction in this area is that between non-preference-based measures and preference-based measures. Many researchers working with health data are mostly interested in measuring symptoms, and metrics which do (‘only’) this would be examples of non-preference-based measures. Non-preference based measures can again be subdivided into disease- and symptom-specific measures, and non-disease-specific/generic measures; an example of the latter would be the SF-36, ‘the most widely used and best-known example of a generic or non-disease-specific measure of general health’.
Economists will often want to put a value on symptoms or quality-of-life states, and in order to do this you need to work with preference-based measures – there are a lot of limitations one confronts when dealing with non-preference-based measures. Non-preference-based measures tend for example to be very different in design and purpose (because asthma is not the same thing as, say, bulimia), which means that there is often a lack of comparability across measures. It is also difficult to know how to properly trade off the various dimensions included when using such metrics (for example pain relief can be the result of a drug which also increases nausea, and it’s not perfectly clear when you use such measures whether such a change is to be considered desirable or not); similar problems occur when taking the time dimension into account, where problems with aggregation over time pop up. Various problems related to weighting are recurring: for example, when using such measures, are some of the included symptoms/dimensions more important than others, or are they all equally important? This goes both for the weighting of the various domains included in the metric and for how to weigh individual questions within a given domain. Many non-preference-based measures contain an implicit equal-interval assumption, so that a move from (e.g.) level one to level two on the metric (e.g. from ‘no pain at all’ to ‘a little’) is considered the same as a move from (e.g.) level three to level four (e.g. ‘quite a bit’ to ‘very much’), and it’s not actually clear that the people who supply the information that goes into these metrics would consider such an approach a correct reflection of how they perceive these things.
Conceptually related to the aggregation problem mentioned above is the problem that people may have different attitudes toward short-term and long-term health effects/outcomes, but non-preference-based measures usually give equal weight to a health state regardless of the timing of the health state. The issue of some patients dying is not addressed at all when using these measures, as they do not contain information about mortality; which may be an important variable. For all these reasons the authors argue in the text that:
“In summary, non-preference-based health status measures, whether disease specific or generic, are not suitable as outcome measures in economic evaluation. Instead, economists require a measure that combines quality and quantity of life, and that also incorporates the valuations that individuals place on particular states of health.
The outcome metric that is currently favoured as meeting these requirements and facilitating the widest possible comparison between alternative uses of health resources is the quality-adjusted life year“.
Non-preference-based tools may be useful, but you will usually need to go ‘further’ than those to be able to handle the problems economists will tend to care the most about. Some more observations from the chapter below:
“the most important challenge [when valuing health states] is to find a reliable way of quantifying the quality of life associated with any particular health state. There are two elements to this: describing the health state, which […] could be either a disease-specific description or a generic description intended to cover many different diseases, and placing a valuation on the health state. […] these weights or valuations are related to utility theory and are frequently referred to as utilities or utility values.
Obtaining utility values almost invariably involves some process by which individuals are given descriptions of a number of health states and then directly or indirectly express their preferences for these states. It is relatively simple to measure ordinal preferences by asking respondents to rank-order different health states. However, these give no information on strength of preference and a simple ranking suffers from the equal interval assumption […]; as a result they are not suitable for economic evaluation. Instead, analysts make use of cardinal preference measurement. Three main methods have been used to obtain cardinal measures of health state preferences: the rating scale, the time trade-off, and the standard gamble. […] The large differences typically observed between RS [rating scale] and TTO [time trade-off] or SG [standard gamble] valuations, and the fact that the TTO and SG methods are choice based and therefore have stronger foundations in decision theory, have led most standard texts and guidelines for technology appraisal to recommend choice-based valuation methods [The methods are briefly described here, where the ‘VAS’ corresponds to the rating scale method mentioned – the book covers the methods in much more detail, but I won’t go into those details here].”
“Controversies over health state valuation are not confined to the valuation method; there are also several strands of opinion concerning who should provide valuations. In principle, valuations could be provided by patients who have had first-hand experience of the health state in question, or by experts such as clinicians with relevant scientific or clinical expertise, or by members of the public. […] there is good evidence that the valuations made by population samples and patients frequently vary quite substantially [and] the direction of difference is not always consistent. […] current practice has moved towards the use of valuations obtained from the general public […], an approach endorsed by recent guidelines in the UK and USA [which] explicitly recommend that population valuations are used”.
Given the very large number of studies which have been based on non-preference based instruments, it would be desirable for economists working in this field to somehow ‘translate’ the information contained in those studies so that this information can also be used for cost-effectiveness evaluations. As a result of this an increasing number of so-called ‘mapping studies’ have been conducted over the years, the desired goal of which is to translate the non-preference based measures into health state utilities, allowing outcomes and effects derived from the studies to be expressed in terms of QALYs. There’s more than one way to try to get from a non-preference based metric to a preference-based metric and the authors describe three approaches in some detail, though I’ll not discuss those approaches or details here. They make this concluding assessment of mapping studies in the text:
“Mapping studies are continuing to proliferate, and the literature on new mapping algorithms and methods, and comparisons between approaches, is expanding rapidly. In general, mapping methods seem to have reasonable ability to predict group mean utility scores and to differentiate between groups with or without known existing illness. However, they all seem to predict increasingly poorly as health states become more serious. […] all forms of mapping are ‘second best’, and the existence of a range of techniques should not be taken as an argument for relying on mapping instead of obtaining direct preference-based measurements in prospectively designed studies.”
I won’t talk too much about the chapter on how to define, measure and value costs, but I felt that a few observations from the chapter should be included in the coverage:
“When asking patients to complete resource/time questionnaires (or answer interview questions), a particularly important issue is deciding on the optimum recall period. Two types of recall error can be distinguished: simply forgetting an entire episode, or incorrectly recalling when it occurred. […] there is a trade-off between recall bias and complete sampling information. […] the longer the period of recall the greater is the likelihood of recall error, but the shorter the recall period the greater is the problem of missing information.”
“The range of patient-related costs included in economic valuations can vary considerably. Some studies include only the costs incurred by patients in travelling to a hospital or clinic for treatment; others may include a wider range of costs including over-the-counter purchases of medications or equipment. However, in some studies a much broader approach is taken, in which attempts are made to capture both the costs associated with treatments and the consequences of illness in terms of absence from or cessation of work.”
An important note here which I thought I should add is that whereas many people unfamiliar with this field may equate ‘medical costs of illness’ with ‘the money that is paid to the doctor(s)’, direct medical costs will in many cases drastically underestimate the ‘true costs’ of disease. To give an example, Ferber et al. (2006), when looking at the costs of diabetes, included two indirect cost components in their analysis – inability to work, and early retirement – and concluded that these two cost components made up approximately half of the total costs of diabetes. I think there are reasons to be skeptical of the specific estimate on account of the way it is made (for example, if diabetics are less productive/earn less than the population in general, which seems likely if the disease is severe enough to cause many people to withdraw prematurely from the labour market, the cost estimate may be argued to be an overestimate), but on the other hand there are multiple other potentially important indirect cost components they do not include in the calculation, such as disease-related lower productivity while at work (for details on this, see e.g. this paper – that cost component may also be substantial in some contexts) and things like spousal employment spill-over effects (it is known from related research – for an example, see this PhD dissertation – that disease may impact on the retirement decisions of the spouse of the sick individual, not just those of the sick individual, but effects here are likely to be highly context-dependent and to vary across countries). Another potentially important variable in an indirect cost context is informal care provision. Here’s what the authors say about that one:
“Informal care is often provided by family members, friends, and volunteers. Devoting time and resources to collecting this information may not be worthwhile for interventions where informal care costs are likely to form a very small part of the total costs. However, in other studies non-health-service costs could represent a substantial part of the total costs. For instance, dementia is a disease where the burden of care is likely to fall upon other care agencies and family members rather than entirely on the health and social care services, in which case considering such costs would be important.
To date [however], most economic evaluations have not considered informal care costs.”
Yesterday’s SMBC was awesome, and I couldn’t help myself from including it here (click to view full size):
In a way the three words I chose to omit from the post title are rather important in order to know which kind of book this is – the full title of Gray et al.’s work is: Applied Methods of … – but as I won’t be talking much about the ‘applied’ part in my coverage here, focusing instead on broader principles etc. which will be easier for people without a background in economics to follow, I figured I might as well omit those words from the post titles. I should also admit that I personally did not spend much time on the exercises, as this did not seem necessary in view of what I was using the book for. Despite not having spent much time on the exercises myself, I incidentally did reward the authors for including occasionally quite detailed coverage of technical aspects in my rating of the book on goodreads; I feel confident from the coverage that if I need to apply some of the methods they talk about in the book later on, the book will do a good job of helping me get things right. All in all, the book’s coverage made it hard for me not to give it 5 stars – so that was what I did.
I own an actual physical copy of the book, which makes blogging it more difficult than usual; I prefer blogging e-books. The greater amount of work involved in covering physical books is also one reason why I have yet to talk about Eysenck & Keane’s Cognitive Psychology text here on the blog, despite having read more than 500 pages of that book (it’s not that the book is boring). My coverage of the contents of both this book and the Eysenck & Keane book will (assuming I ever get around to blogging the latter, that is) be less detailed than it could have been, but on the other hand it’ll likely be very focused on key points and observations from the coverage.
I have talked about cost-effectiveness before here on the blog, e.g. here, but in my coverage of the book below I have not tried to avoid making points or including observations which I’ve already made elsewhere on the blog; it’s too much work to keep track of such things. With those introductory remarks out of the way, let’s move on to some observations made in the book:
“In cost-effectiveness analysis we first calculate the costs and effects of an intervention and one or more alternatives, then calculate the differences in cost and differences in effect, and finally present these differences in the form of a ratio, i.e. the cost per unit of health outcome effect […]. Because the focus is on differences between two (or more) options or treatments, analysts typically refer to incremental costs, incremental effects, and the incremental cost-effectiveness ratio (ICER). Thus, if we have two options a and b, we calculate their respective costs and effects, then calculate the difference in costs and difference in effects, and then calculate the ICER as the difference in costs divided by the difference in effects […] cost-effectiveness analyses which measure outcomes in terms of QALYs are sometimes referred to as cost-utility studies […] but are sometimes simply considered as a subset of cost-effectiveness analysis.”
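The ICER calculation itself is trivially simple; a quick sketch with made-up numbers for two options a and b:

```python
# Minimal ICER calculation for two options, a (current care) and b (new
# intervention); the costs and QALYs are invented for illustration.
cost_a, qalys_a = 10_000.0, 5.0
cost_b, qalys_b = 14_000.0, 5.5

delta_c = cost_b - cost_a    # incremental cost
delta_e = qalys_b - qalys_a  # incremental effect

icer = delta_c / delta_e     # cost per QALY gained
print(f"ICER: {icer:.0f} per QALY gained")  # 4000 / 0.5 = 8000
```

A decision-maker would then compare this ratio against whatever threshold value of a QALY they are willing to pay.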
“Cost-effectiveness analysis places no monetary value on the health outcomes it is comparing. It does not measure or attempt to measure the underlying worth or value to society of gaining additional QALYs, for example, but simply indicates which options will permit more QALYs to be gained than others with the same resources, assuming that gaining QALYs is agreed to be a reasonable objective for the health care system. Therefore the cost-effectiveness approach will never provide a way of determining how much in total it is worth spending on health care and the pursuit of QALYs rather than on other social objectives such as education, defence, or private consumption. It does not permit us to say whether health care spending is too high or too low, but rather confines itself to the question of how any given level of spending can be arranged to maximize the health outcomes yielded.
In contrast, cost-benefit analysis (CBA) does attempt to place some monetary valuation on health outcomes as well as on health care resources. […] The reasons for the more widespread use of cost-effectiveness analysis compared with cost-benefit analysis in health care are discussed extensively elsewhere, […] but two main issues can be identified. Firstly, significant conceptual or practical problems have been encountered with the two principal methods of obtaining monetary valuations of life or quality of life: the human capital approach […] and the willingness to pay approach […] Second, within the health care sector there remains a widespread and intrinsic aversion to the concept of placing explicit monetary values on health or life. […] The cost-benefit approach should […], in principle, permit broad questions of allocative efficiency to be addressed. […] In contrast, cost-effectiveness analysis can address questions of productive or production efficiency, where a specified good or service is being produced at the lowest possible cost – in this context, health gain using the health care budget.”
“when working in the two-dimensional world of cost-effectiveness analysis, there are two uncertainties that will be encountered. Firstly, there will be uncertainty concerning the location of the intervention on the cost-effectiveness plane: how much more or less effective and how much more or less costly it is than current treatment. Second, there is uncertainty concerning how much the decision-maker is willing to pay for health gain […] these two uncertainties can be presented together in the form of the question ‘What is the probability that this intervention is cost-effective?’, a question which effectively divides our cost-effectiveness plane into just two policy spaces – below the maximum acceptable line, and above it”.
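The question ‘What is the probability that this intervention is cost-effective?’ can be answered in a simulation setting by checking, for each draw of incremental costs and effects, whether the intervention falls below the maximum acceptable line – equivalently, whether the incremental net monetary benefit is positive. A minimal sketch with made-up numbers (the willingness-to-pay threshold and the distributions below are purely illustrative, not from the book):

```python
import random

random.seed(1)
wtp = 20_000.0  # assumed willingness to pay per QALY (hypothetical)

# Hypothetical uncertainty about where the intervention sits on the
# cost-effectiveness plane: paired draws of incremental cost (dc) and
# incremental effect in QALYs (de) versus current treatment.
draws = [(random.gauss(5_000, 2_000), random.gauss(0.4, 0.2))
         for _ in range(10_000)]

# The intervention is cost-effective on a given draw if its incremental
# net monetary benefit wtp*de - dc is positive, i.e. the draw falls
# below the maximum acceptable line on the plane.
p_ce = sum(wtp * de - dc > 0 for dc, de in draws) / len(draws)
print(f"P(cost-effective at {wtp:,.0f} per QALY) = {p_ce:.2f}")
```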
“Conventionally, cost-effectiveness ratios that have been calculated against a baseline or do-nothing option without reference to any alternatives are referred to as average cost-effectiveness ratios, while comparisons with the next best alternative are described as incremental cost-effectiveness ratios […] it is quite misleading to calculate average cost-effectiveness ratios, as they ignore the alternatives available.”
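To illustrate why average cost-effectiveness ratios can mislead, here is a small sketch with entirely hypothetical costs and QALY gains; note how the average ratio for the second drug hides a much less favourable incremental ratio against the next best alternative:

```python
# Hypothetical costs and outcomes (QALYs) for a do-nothing baseline and
# two mutually exclusive treatment options, ordered by effectiveness.
options = [
    ("do nothing", 0.0, 0.0),
    ("drug A", 10_000.0, 1.0),
    ("drug B", 30_000.0, 1.5),
]

# Average CER: each option compared only against the do-nothing baseline.
_, base_cost, base_qaly = options[0]
acer = {name: (cost - base_cost) / (qaly - base_qaly)
        for name, cost, qaly in options[1:]}

# Incremental CER: each option compared against the next best alternative.
icer = {name: (c1 - c0) / (q1 - q0)
        for (_, c0, q0), (name, c1, q1) in zip(options, options[1:])}

# Drug B's average ratio (20,000 per QALY) looks reasonable, but its
# extra QALYs over drug A actually cost 40,000 each.
print(acer, icer)
```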
“A life table provides a method of summarizing the mortality experience of a group of individuals. […] There are two main types of life table. First, there is a cohort life table, which is constructed based on the mortality experience of a group of individuals […]. While this approach can be used to characterize life expectancies of insects and some animals, human longevity makes this approach difficult to apply as the observation period would have to be sufficiently long to be able to observe the death of all members of the cohort. Instead, current life tables are normally constructed using cross-sectional data of observed mortality rates at different ages at a given point in time […] Life tables can also be classified according to the intervals over which changes in mortality occur. A complete life table displays the various rates for each year of life; while an abridged life table deals with greater periods of time, for example 5 year age intervals […] A life table can be used to generate a survival curve S(x) for the population at any point in time. This represents the probability of surviving beyond a certain age x (i.e. S(x)=Pr[X>x]). […] The chance of a male living to the age of 60 years is high (around 0.9) [in the UK, presumably – US] and so the survival curve is comparatively flat up until this age. The proportion dying each year from the age of 60 years rapidly increases, so the curve has a much steeper downward slope. In the last part of the survival curve there is an inflection, indicating a slowing rate of increase in the proportion dying each year among the very old (over 90 years). […] The hazard rate is the slope of the survival curve at any point, giving the instantaneous chance of an individual dying.”
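The construction of a survival curve S(x) from a current life table amounts to cumulatively multiplying one-period survival probabilities. A minimal sketch using made-up one-year death probabilities q(x) – low and flat before age 60, rising afterwards, roughly mimicking the shape described above (not real UK data):

```python
# Made-up one-year death probabilities q(x) (illustrative only).
qx = {age: 0.001 if age < 60 else 0.02 + 0.002 * (age - 60)
      for age in range(100)}

# S(x) = Pr[X > x]: cumulative product of one-year survival probabilities.
survival = {0: 1.0}
for age in range(99):
    survival[age + 1] = survival[age] * (1.0 - qx[age])

# Life expectancy at birth is approximately the sum of S(x) over all ages.
life_expectancy = sum(survival.values())
print(f"S(60) = {survival[60]:.3f}, e0 = {life_expectancy:.1f} years")
```

With these numbers the chance of reaching age 60 comes out at roughly 0.94, consistent with the ‘comparatively flat’ early part of the curve described in the quote.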
“Life tables are a useful tool for estimating changes in life expectancies from interventions that reduce mortality. […] Multiple-cause life tables are a way of quantifying outcomes when there is more than one mutually exclusive cause of death. These life tables can estimate the potential gains from the elimination of a cause of death and are also useful in calculating the benefits of interventions that reduce the risk of a particular cause of death. […] One issue that arises when death is divided into multiple causes in this type of life table is competing risk. […] competing risk can arise ‘when an individual can experience more than one type of event and the occurrence of one type of event hinders the occurrence of other types of events’. Competing risks affect life tables, as those who die from a specific cause have no chance of dying from other causes during the remainder of the interval […]. In practice this will mean that as soon as one cause is eliminated the probabilities of dying of other causes increase […]. Several methods have been proposed to correct for competing risks when calculating life tables.”
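The kind of calculation these life tables support – estimating the potential gain from eliminating a cause of death – can be sketched as follows, with made-up annual death probabilities for two causes and, for simplicity, no competing-risk correction:

```python
# Hypothetical annual probabilities of dying from two mutually exclusive
# causes, assumed constant with age over the horizon considered.
qa, qb = 0.010, 0.005
horizon = 40  # years

def expected_years(q_total, years):
    """Expected years lived over the horizon given an annual death probability."""
    s, total = 1.0, 0.0
    for _ in range(years):
        total += s
        s *= 1.0 - q_total
    return total

# Potential gain from eliminating cause A: survivors face only cause B.
# (This naive sketch ignores the competing-risk correction noted in the
# quote, which would slightly raise the probability of dying from B.)
gain = expected_years(qb, horizon) - expected_years(qa + qb, horizon)
print(f"life-years gained over {horizon} years: {gain:.2f}")
```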
“the use of published life-table methods may have limitations, especially when considering particular populations which may have very different risks from the general population. In these cases, there are a host of techniques referred to as survival analysis which enable risks to be estimated from patient-level data. […] Survival analysis typically involves observing one or more outcomes in a population of interest over a period of time. The outcome, which is often referred to as an event or endpoint, could be death, a non-fatal outcome such as a major clinical event (e.g. myocardial infarction), the occurrence of an adverse event, or even the date of first non-compliance with a therapy.”
“A key feature of survival data is censoring, which occurs whenever the event of interest is not observed within the follow-up period. This does not mean that the event will not occur some time in the future, just that it has not occurred while the individual was observed. […] The most common case of censoring is referred to as right censoring. This occurs whenever the observation of interest occurs after the observation period. […] An alternative form of censoring is left censoring, which occurs when there is a period of time when the individuals are at risk prior to the observation period.
A key feature of most survival analysis methods is that they assume that the censoring process is non-informative, meaning that there is no dependence between the time to the event of interest and the process that is causing the censoring. However, if the duration of observation is related to the severity of a patient’s disease, for example if patients with more advanced illness are withdrawn early from the study, the censoring is likely to be informative and other techniques are required”.
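The standard non-parametric tool here is the Kaplan–Meier estimator, which handles right censoring (under the non-informative censoring assumption just described) by keeping censored individuals in the risk set up to their censoring time without ever counting them as events. A sketch on made-up data:

```python
# Made-up right-censored survival data: (time, event); event=False means
# the individual was censored (still under observation at that time, but
# the event of interest was never observed).
data = [(2, True), (3, True), (3, False), (5, True),
        (8, False), (9, True), (10, False)]

# Kaplan-Meier: at each event time, multiply by the conditional
# probability of surviving that time among those still at risk.
event_times = sorted({t for t, e in data if e})
s, curve = 1.0, {}
for t in event_times:
    at_risk = sum(1 for ti, _ in data if ti >= t)
    deaths = sum(1 for ti, e in data if ti == t and e)
    s *= 1.0 - deaths / at_risk
    curve[t] = s
print(curve)
```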
“Differences in the composition of the intervention and control groups at the end of follow-up may have important implications for estimating outcomes, especially when we are interested in extrapolation. If we know that the intervention group is older and has a lower proportion of females, we would expect these characteristics to increase the mortality hazard in this group over their remaining lifetimes. However, if the intervention group has experienced a lower number of events, this may significantly reduce the hazard for some individuals. They may also benefit from a past treatment which continues to reduce the hazard of a primary outcome such as death. This effect […] is known as the legacy effect”.
“Changes in life expectancy are a commonly used outcome measure in economic evaluation. […] Table 4.6 shows selected examples of estimates of the gain in life expectancy for various interventions reported by Wright and Weinstein (1998) […] Gains in life expectancy from preventative interventions in populations of average risk generally ranged from a few days to slightly more than a year. […] The gains in life expectancy from preventing or treating disease in persons at elevated risk [this type of prevention is known as ‘secondary-‘ and/or ‘tertiary prevention’ (depending on the circumstances), as opposed to ‘primary prevention’ – the distinction between primary prevention and more targeted approaches is often important in public health contexts, because the level of targeting will often interact with the cost-effectiveness dimension – US] are generally greater […one reason why this does not necessarily mean that targeted approaches are always better is that search costs will often be an increasing function of the level of targeting – US]. Interventions that treat established disease vary, with gains in life-expectancy ranging from a few months […] to as long as nine years […] the point that Wright and Weinstein (1998) were making was not that absolute gains vary, but that a gain in life expectancy of a month from a preventive intervention targeted at population at average risk and a gain of a year from a preventive intervention targeted at populations at elevated risk could both be considered large. It should also be noted that interventions that produce a comparatively small gain in life expectancy when averaged across the population […] may still be very cost-effective.”
I haven’t really blogged this book in anywhere near the amount of detail it deserves, even though my first post about it did include a few quotes illustrating how much different material it covers.
This book is technical, and although I have tried to make this post less technical by omitting the math, it may be a good idea to reread my first post about the book before reading this one, to refresh your knowledge of these topics.
Quotes and comments below – most of the coverage here focuses on stuff covered in chapters 3 and 4 in the book.
“Tests of null hypotheses and information-theoretic approaches should not be used together; they are very different analysis paradigms. A very common mistake seen in the applied literature is to use AIC to rank the candidate models and then “test” to see whether the best model (the alternative hypothesis) is “significantly better” than the second-best model (the null hypothesis). This procedure is flawed, and we strongly recommend against it […] the primary emphasis should be on the size of the treatment effects and their precision; too often we find a statement regarding “significance,” while the treatment and control means are not even presented. Nearly all statisticians are calling for estimates of effect size and associated precision, rather than test statistics, P-values, and “significance.” [Borenstein & Hedges certainly did as well in their book (written much later), and this was not an issue I omitted to talk about in my coverage of their book…] […] Information-theoretic criteria such as AIC, AICc, and QAICc are not a “test” in any sense, and there are no associated concepts such as test power or P-values or α-levels. Statistical hypothesis testing represents a very different, and generally inferior, paradigm for the analysis of data in complex settings. It seems best to avoid use of the word “significant” in reporting research results under an information-theoretic paradigm. […] AIC allows a ranking of models and the identification of models that are nearly equally useful versus those that are clearly poor explanations for the data at hand […]. Hypothesis testing provides no general way to rank models, even for models that are nested. […] In general, we recommend strongly against the use of null hypothesis testing in model selection.”
“The bootstrap is a type of Monte Carlo method used frequently in applied statistics. This computer-intensive approach is based on resampling of the observed data […] The fundamental idea of the model-based sampling theory approach to statistical inference is that the data arise as a sample from some conceptual probability distribution f. Uncertainties of our inferences can be measured if we can estimate f. The bootstrap method allows the computation of measures of our inference uncertainty by having a simple empirical estimate of f and sampling from this estimated distribution. In practical application, the empirical bootstrap means using some form of resampling with replacement from the actual data x to generate B (e.g., B = 1,000 or 10,000) bootstrap samples […] The set of B bootstrap samples is a proxy for a set of B independent real samples from f (in reality we have only one actual sample of data). Properties expected from replicate real samples are inferred from the bootstrap samples by analyzing each bootstrap sample exactly as we first analyzed the real data sample. From the set of results of sample size B we measure our inference uncertainties from sample to (conceptual) population […] For many applications it has been theoretically shown […] that the bootstrap can work well for large sample sizes (n), but it is not generally reliable for small n […], regardless of how many bootstrap samples B are used. […] Just as the analysis of a single data set can have many objectives, the bootstrap can be used to provide insight into a host of questions. For example, for each bootstrap sample one could compute and store the conditional variance–covariance matrix, goodness-of-fit values, the estimated variance inflation factor, the model selected, confidence interval width, and other quantities. Inference can be made concerning these quantities, based on summaries over the B bootstrap samples.”
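A minimal sketch of the empirical bootstrap as described above – resampling with replacement from the one actual sample and re-running the same analysis (here simply the sample mean) on each of the B bootstrap samples:

```python
import random
import statistics

random.seed(0)
x = [random.gauss(10, 3) for _ in range(200)]  # the one actual sample

# Empirical bootstrap: B resamples with replacement, each analysed
# exactly as the original sample was.
B = 2_000
boot_means = [statistics.fmean(random.choices(x, k=len(x))) for _ in range(B)]

se = statistics.stdev(boot_means)  # bootstrap standard error of the mean
srt = sorted(boot_means)
lo, hi = srt[int(0.025 * B)], srt[int(0.975 * B)]  # percentile 95% interval
print(f"mean = {statistics.fmean(x):.2f}, SE = {se:.2f}, "
      f"CI = ({lo:.2f}, {hi:.2f})")
```

The spread of the B bootstrap means stands in for the sampling variability we would have seen across replicate real samples from f.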
“Information criteria attempt only to select the best model from the candidate models available; if a better model exists, but is not offered as a candidate, then the information-theoretic approach cannot be expected to identify this new model. Adjusted R2 […] are useful as a measure of the proportion of the variation “explained,” [but] are not useful in model selection […] adjusted R2 is poor in model selection; its usefulness should be restricted to description.”
“As we have struggled to understand the larger issues, it has become clear to us that inference based on only a single best model is often relatively poor for a wide variety of substantive reasons. Instead, we increasingly favor multimodel inference: procedures to allow formal statistical inference from all the models in the set. […] Such multimodel inference includes model averaging, incorporating model selection uncertainty into estimates of precision, confidence sets on models, and simple ways to assess the relative importance of variables.”
“If sample size is small, one must realize that relatively little information is probably contained in the data (unless the effect size is very substantial), and the data may provide few insights of much interest or use. Researchers routinely err by building models that are far too complex for the (often meager) data at hand. They do not realize how little structure can be reliably supported by small amounts of data that are typically “noisy.””
“Sometimes, the selected model [when applying an information criterion] contains a parameter that is constant over time, or areas, or age classes […]. This result should not imply that there is no variation in this parameter, rather that parsimony and its bias/variance tradeoff finds the actual variation in the parameter to be relatively small in relation to the information contained in the sample data. It “costs” too much in lost precision to add estimates of all of the individual θi. As the sample size increases, then at some point a model with estimates of the individual parameters would likely be favored. Just because a parsimonious model contains a parameter that is constant across strata does not mean that there is no variation in that process across the strata.”
“[In a significance testing context,] a significant test result does not relate directly to the issue of what approximating model is best to use for inference. One model selection strategy that has often been used in the past is to do likelihood ratio tests of each structural factor […] and then use a model with all the factors that were “significant” at, say, α = 0.05. However, there is no theory that would suggest that this strategy would lead to a model with good inferential properties (i.e., small bias, good precision, and achieved confidence interval coverage at the nominal level). […] The purpose of the analysis of empirical data is not to find the “true model”— not at all. Instead, we wish to find a best approximating model, based on the data, and then develop statistical inferences from this model. […] We search […] not for a “true model,” but rather for a parsimonious model giving an accurate approximation to the interpretable information in the data at hand. Data analysis involves the question, “What level of model complexity will the data support?” and both under- and overfitting are to be avoided. Larger data sets tend to support more complex models, and the selection of the size of the model represents a tradeoff between bias and variance.”
“The easy part of the information-theoretic approaches includes both the computational aspects and the clear understanding of these results […]. The hard part, and the one where training has been so poor, is the a priori thinking about the science of the matter before data analysis — even before data collection. It has been too easy to collect data on a large number of variables in the hope that a fast computer and sophisticated software will sort out the important things — the “significant” ones […]. Instead, a major effort should be mounted to understand the nature of the problem by critical examination of the literature, talking with others working on the general problem, and thinking deeply about alternative hypotheses. Rather than “test” dozens of trivial matters (is the correlation zero? is the effect of the lead treatment zero? are ravens pink?, Anderson et al. 2000), there must be a more concerted effort to provide evidence on meaningful questions that are important to a discipline. This is the critical point: the common failure to address important science questions in a fully competent fashion. […] “Let the computer find out” is a poor strategy for researchers who do not bother to think clearly about the problem of interest and its scientific setting. The sterile analysis of “just the numbers” will continue to be a poor strategy for progress in the sciences.
Researchers often resort to using a computer program that will examine all possible models and variables automatically. Here, the hope is that the computer will discover the important variables and relationships […] The primary mistake here is a common one: the failure to posit a small set of a priori models, each representing a plausible research hypothesis.”
“Model selection is most often thought of as a way to select just the best model, then inference is conditional on that model. However, information-theoretic approaches are more general than this simplistic concept of model selection. Given a set of models, specified independently of the sample data, we can make formal inferences based on the entire set of models. […] Part of multimodel inference includes ranking the fitted models from best to worst […] and then scaling to obtain the relative plausibility of each fitted model (gi) by a weight of evidence (wi) relative to the selected best model. Using the conditional sampling variance […] from each model and the Akaike weights […], unconditional inferences about precision can be made over the entire set of models. Model-averaged parameter estimates and estimates of unconditional sampling variances can be easily computed. Model selection uncertainty is a substantial subject in its own right, well beyond just the issue of determining the best model.”
“There are three general approaches to assessing model selection uncertainty: (1) theoretical studies, mostly using Monte Carlo simulation methods; (2) the bootstrap applied to a given set of data; and (3) utilizing the set of AIC differences (i.e., ∆i) and model weights wi from the set of models fit to data.”
“Statistical science should emphasize estimation of parameters and associated measures of estimator uncertainty. Given a correct model […], an MLE is reliable, and we can compute a reliable estimate of its sampling variance and a reliable confidence interval […]. If the model is selected entirely independently of the data at hand, and is a good approximating model, and if n is large, then the estimated sampling variance is essentially unbiased, and any appropriate confidence interval will essentially achieve its nominal coverage. This would be the case if we used only one model, decided on a priori, and it was a good model, g, of the data generated under truth, f. However, even when we do objective, data-based model selection (which we are advocating here), the [model] selection process is expected to introduce an added component of sampling uncertainty into any estimated parameter; hence classical theoretical sampling variances are too small: They are conditional on the model and do not reflect model selection uncertainty. One result is that conditional confidence intervals can be expected to have less than nominal coverage.”
“Data analysis is sometimes focused on the variables to include versus exclude in the selected model (e.g., important vs. unimportant). Variable selection is often the focus of model selection for linear or logistic regression models. Often, an investigator uses stepwise analysis to arrive at a final model, and from this a conclusion is drawn that the variables in this model are important, whereas the other variables are not important. While common, this is poor practice and, among other issues, fails to fully consider model selection uncertainty. […] Estimates of the relative importance of predictor variables xj can best be made by summing the Akaike weights across all the models in the set where variable j occurs. Thus, the relative importance of variable j is reflected in the sum w+ (j). The larger the w+ (j) the more important variable j is, relative to the other variables. Using the w+ (j), all the variables can be ranked in their importance. […] This idea extends to subsets of variables. For example, we can judge the importance of a pair of variables, as a pair, by the sum of the Akaike weights of all models that include the pair of variables. […] To summarize, in many contexts the AIC selected best model will include some variables and exclude others. Yet this inclusion or exclusion by itself does not distinguish differential evidence for the importance of a variable in the model. The model weights […] summed over all models that include a given variable provide a better weight of evidence for the importance of that variable in the context of the set of models considered.” [The reason why I’m not telling you how to calculate Akaike weights is that I don’t want to bother with math formulas in wordpress – but I guess all you need to know is that these are not hard to calculate. It should perhaps be added that one can also use bootstrapping methods to obtain relevant model weights to apply in a multimodel inference context.]
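For readers who want to see the mechanics anyway: Akaike weights really are simple to compute from AIC differences, and variable importance is then just a sum of weights. A sketch with made-up AIC values and variable sets:

```python
import math

# Made-up AIC values for a small candidate set, together with the
# variables each model contains (purely illustrative).
models = {
    "m1": (100.0, {"x1"}),
    "m2": (98.5, {"x1", "x2"}),
    "m3": (104.0, {"x2"}),
    "m4": (99.0, {"x1", "x2", "x3"}),
}

# Akaike weights: w_i proportional to exp(-delta_i / 2),
# where delta_i = AIC_i - AIC_min.
best = min(aic for aic, _ in models.values())
raw = {m: math.exp(-0.5 * (aic - best)) for m, (aic, _) in models.items()}
total = sum(raw.values())
weights = {m: r / total for m, r in raw.items()}

# Relative importance of a variable: sum of the weights of all models
# in which it appears.
importance = {v: sum(w for m, w in weights.items() if v in models[m][1])
              for v in ("x1", "x2", "x3")}
print(weights, importance)
```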
“If data analysis relies on model selection, then inferences should acknowledge model selection uncertainty. If the goal is to get the best estimates of a set of parameters in common to all models (this includes prediction), model averaging is recommended. If the models have definite, and differing, interpretations as regards understanding relationships among variables, and it is such understanding that is sought, then one wants to identify the best model and make inferences based on that model. […] The bootstrap provides direct, robust estimates of model selection probabilities πi , but we have no reason now to think that use of bootstrap estimates of model selection probabilities rather than use of the Akaike weights will lead to superior unconditional sampling variances or model-averaged parameter estimators. […] Be mindful of possible model redundancy. A carefully thought-out set of a priori models should eliminate model redundancy problems and is a central part of a sound strategy for obtaining reliable inferences. […] Results are sensitive to having demonstrably poor models in the set of models considered; thus it is very important to exclude models that are a priori poor. […] The importance of a small number (R) of candidate models, defined prior to detailed analysis of the data, cannot be overstated. […] One should have R much smaller than n. MMI [Multi-Model Inference] approaches become increasingly important in cases where there are many models to consider.”
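Model averaging itself is equally mechanical. Below is a sketch with made-up numbers, using one common estimator of the unconditional variance from the multimodel-inference literature, which adds a model-selection component (the squared deviation of each model's estimate from the averaged estimate) to each conditional variance:

```python
import math

# Each candidate model yields an estimate of the same parameter theta
# and a conditional variance for it (all numbers made up).
models = [  # (AIC, theta_hat, var(theta_hat | model))
    (210.0, 1.20, 0.04),
    (211.3, 1.45, 0.05),
    (214.0, 0.90, 0.03),
]

best = min(aic for aic, _, _ in models)
raw = [math.exp(-0.5 * (aic - best)) for aic, _, _ in models]
w = [r / sum(raw) for r in raw]  # Akaike weights

# Model-averaged estimate, and an unconditional variance that adds the
# model-selection component (theta_i - theta_bar)^2 to each model's
# conditional variance before weighting.
theta_bar = sum(wi * th for wi, (_, th, _) in zip(w, models))
var_u = sum(wi * (v + (th - theta_bar) ** 2)
            for wi, (_, th, v) in zip(w, models))
print(f"theta = {theta_bar:.3f}, unconditional SE = {math.sqrt(var_u):.3f}")
```

The unconditional variance is necessarily at least as large as the weighted average of the conditional variances, which is exactly the point made in the quote above about conditional confidence intervals being too narrow.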
“In general there is a substantial amount of model selection uncertainty in many practical problems […]. Such uncertainty about what model structure (and associated parameter values) is the K-L [Kullback–Leibler] best approximating model applies whether one uses hypothesis testing, information-theoretic criteria, dimension-consistent criteria, cross-validation, or various Bayesian methods. Often, there is a nonnegligible variance component for estimated parameters (this includes prediction) due to uncertainty about what model to use, and this component should be included in estimates of precision. […] we recommend assessing model selection uncertainty rather than ignoring the matter. […] It is […] not a sound idea to pick a single model and unquestioningly base extrapolated predictions on it when there is model uncertainty.”
“In this book we present several novel concepts in cooperative game theory, but from a computer scientist’s point of view. Especially, we will look at a type of games called non-transferable utility games. […] In this book, we extend the classic stability concept of the non-transferable utility core by proposing new belief-based stability criteria under uncertainty, and illustrate how the new concept can be used to analyse the stability of a new type of belief-based coalition formation game. Mechanisms for reaching solutions of the new stable criteria are proposed and some real life application examples are studied. […] In Chapter 1, we first provide an introduction of topics in game theory that are relevant to the concepts discussed in this book. In Chapter 2, we review some relevant works from the literature, especially in cooperative game theory and multi-agent coalition formation problems. In Chapter 3, we discuss the effect of uncertainty in the agent’s beliefs on the stability of the games. A rule-based approach is adopted and the concepts of strong core and weak core are introduced. We also discuss the effect of precision of the beliefs on the stability of the coalitions. In Chapter 4, we introduce private beliefs in non-transferable utility (NTU) games, so that the preferences of the agents are no longer common knowledge. The impact of belief accuracy on stability is also examined. In Chapter 5, we study an application of the proposed belief-based stability concept, namely the buyer coalition problem, and we see how the proposed concept can be used in the evaluation of this multi-agent coalition formation problem. In Chapter 6, we combine the works of earlier chapters and produce a complete picture of the introduced concepts: non-transferable utility games with private beliefs and uncertainty. We conclude this book in Chapter 7.”
The above quote is from the preface of the book, which I finished yesterday. It deals with some issues I was slightly annoyed about not being covered in a previous micro course; my main problem back then was that the question of belief accuracy, and the role of this variable, did not seem properly addressed in the models we looked at (‘people can have mistaken beliefs, and it seems obvious that the ways in which they’re wrong can affect which solutions are eventually reached’). The book makes the point that if you look at coalition formation in a context where it is not reasonable to assume that information is shared among coalition partners (because it is in the participants’ interest to keep their information/preferences/willingness to pay private), then the beliefs of the potential coalition partners may play a major role in determining which coalitions are feasible and which are ruled out. A key point is that in the model context explored by the authors, inaccurate beliefs will expand the number of potential coalitions available, although coalition options that accurate beliefs would have ruled out are less stable than those that are not ruled out. The authors do not discuss the fact that this feature is unquestionably a result of implicit assumptions made along the way which may not hold, and that inaccurate beliefs may in some contexts conceivably also lead to lower solution support in general. This could happen, for example, through disagreement, or, to put it in terms of concepts specifically included in their model framework, through higher general instability of the solutions that can feasibly be reached, which would make agents less likely to explore the option of participating in coalitions in the first place because of the lower payoffs associated with the coalitions likely to be reached. Dynamics such as these are not included in the coverage.
I decided early on not to blog the contents of this book in major detail, because it’s not the kind of book where that makes sense (in my opinion). But if you’re curious about how the authors proceed: they talk quite a bit about the (classical) core and discuss why it is not an appropriate solution concept in the contexts they explore. They then develop new and better solution criteria, introducing some new variables and definitions along the way, in order to end up with better solution concepts – their so-called ‘belief-based cores’, which are perhaps best thought of as extensions of the classical core concept. I should perhaps point out, as this may not be completely clear, that the beliefs they talk about concern both the ‘state of nature’ (which in part of the coverage is assumed to be essentially unobservable) and the preferences of the agents involved.
If you want a bigger-picture idea of what this book is about, I should point out that game theory in general has two major sub-fields, dealing with cooperative and non-cooperative games respectively. Within the sub-field of cooperative games, a distinction is made between games and settings where utilities are transferable and games/settings where they are not. This book belongs in the latter category; it deals with cooperative games in which utilities are non-transferable. Early on, the authors make a big deal out of this distinction and claim that the assumption of non-transferability is the more plausible one. While they do have a point, I actually think the non-transferability assumption is borderline questionable in some of the specific examples included in the book. To give an example, the non-transferability assumption seems in one context to imply that all potential coalition partners have the same amount of bargaining power. This assumption is plausible in some contexts, but wildly implausible in others (and I’m not sure the authors would agree with me about which contexts belong to which category).
The professor teaching the most recent course in micro I took had a background in computer science rather than economics – he was also Asian, but this perhaps goes without saying. This book is supposedly a computer science book, and the authors argue in the introduction that: “instead of looking at human beings, we study the problem from an intelligent software agent’s perspective.” However, I don’t think a single one of the examples included in the book is one you could not also have found in a classic micro text, and in many parts of the coverage it’s really hard to tell that the authors aren’t economists with a background in micro – there seems to be quite a bit of field overlap here. (My impression is that this overlap extends to areas of economics besides micro; one econometrics TA I had, teaching the programming part of the course, was also a CS major.) In the book they talk a bit about coalition formation mechanisms and approaches, such as propose-and-evaluate mechanisms and auction approaches, and they also touch briefly upon topics like mechanism design. They state in the description that: “The book is intended for graduate students, engineers, and researchers in the field of artificial intelligence and computer science.” I think it’s really weird that they don’t include (micro-)economists as well, because this material is obviously quite close, and potentially relevant, to the kind of work some of these people are doing.
There are a lot of definitions, theorems, and proofs in this book, and as usual when doing work on game theory you need to think very carefully about the material to be able to follow it, but I actually found it reasonably accessible – the book is not terribly difficult to read. I would, however, probably advise you against reading it if you have not at least read an intro text on game theory. Although the book, as already mentioned, deals with an analytical context in which utilities are non-transferable, it should be pointed out that this assumption is sort of implicit in the coverage, in the sense that the authors don’t really deal with utility functions at all; the book only deals with preference relations, so it probably helps to be familiar with this type of analysis (e.g. by having studied, and solved some problems involving, the kind of material covered in chapter 1 of Mas-Colell).
Part of the reason why I gave the book only two stars is that the authors are not native English speakers, and their English is often quite poor. Another reason is that, as is usually the case in game theory, the authors spend a lot of time and effort carefully defining their terms and making correct inferences from their assumptions – but they don’t really end up saying very much.
i. Invasion of Poland. I recently realized I had no idea how long it took the Germans and Soviets to defeat Poland during WW2 (the answer: one month and five days). The Germans attacked more than two weeks before the Soviets did. The article has lots of links, like most wikipedia articles about such topics. Incidentally, the question of why France and Britain applied a double standard and declared war only on Germany, and not the Soviet Union, is discussed in much detail in the links provided by u/OldWorldGlory here.
ii. Huaynaputina. From the article:
“A few days before the eruption, someone reported booming noise from the volcano and fog-like gas being emitted from its crater. The locals scrambled to appease the volcano, preparing girls, pets, and flowers for sacrifice.”
This makes sense – what else would one do in a situation like that? Finding a few virgins, dogs and flowers seems like the sensible approach – yes, you have to love humans and how they always react in sensible ways to such crises.
I’m not sure the rest of the article is all that interesting, but I found the quoted passage both amusing and depressing enough to link to it here.
iii. Albert Pierrepoint. This guy killed hundreds of people.
On the other hand people were fine with it – it was his job. Well, sort of, this is actually slightly complicated. (“Pierrepoint was often dubbed the Official Executioner, despite there being no such job or title”).
Anyway this article is clearly the story of a guy who achieved his childhood dream – though unlike other children, he did not dream of becoming a fireman or a pilot, but rather of becoming the Official Executioner of the country. I’m currently thinking of using Pierrepoint as the main character in the motivational story I plan to tell my nephew when he’s a bit older.
iv. Second Crusade (featured). Considering how many different ‘states’ and ‘kingdoms’ were involved, a surprisingly small number of people were actually fighting; the article notes that “[t]here were perhaps 50,000 troops in total” on the Christian side when the attack on Damascus was initiated. It wasn’t enough, as the outcome of the crusade was a decisive Muslim victory in the ‘Holy Land’ (Middle East).
v. 0.999… (featured). This thing is equal to one, but it can sometimes be really hard to get even very smart people to accept this fact. Lots of details and some proofs are presented in the article.
vi. Shapley–Folkman lemma (‘good article’ – but also a somewhat technical article).
vii. Multituberculata. This article is not that special, but I add it here also because I think it ought to be and I’m actually sort of angry that it’s not; sometimes the coverage provided on wikipedia simply strikes me as grossly unfair, even if this is perhaps a slightly odd way to think about stuff. As pointed out in the article (Agustí points this out in his book as well), “The multituberculates existed for about 120 million years, and are often considered the most successful, diversified, and long-lasting mammals in natural history.” Yet notice how much (/little) coverage the article provides. Now compare the article with this article, or this.
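Returning to item v for a moment: the geometric-series argument behind 0.999… = 1 is easy to check mechanically. A minimal sketch of my own (not from the article), using exact rational arithmetic so no floating-point rounding is involved:

```python
from fractions import Fraction

def partial_sum(n):
    """Exact value of 0.99...9 with n nines: sum of 9/10^k for k = 1..n."""
    return sum(Fraction(9, 10 ** k) for k in range(1, n + 1))

# Each partial sum equals 1 - 10^(-n), so the gap to 1 shrinks by a factor
# of ten per extra digit; the limit of the sequence -- which is what the
# notation 0.999... denotes -- is therefore exactly 1.
for n in (1, 5, 20):
    assert 1 - partial_sum(n) == Fraction(1, 10 ** n)
```

The point the article makes at length is precisely that the infinite decimal names the limit of this sequence, not any of its terms.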
Here’s my first post about the book. I was disappointed by some of the chapters in the second half of the book, and I think a few of them were quite poor. I have been wondering what to cover from the second half, in part because some of the authors seem to proceed as if e.g. the work of these authors does not exist (key quote: “Our findings do not support continued widespread efforts to boost self-esteem in the hope that it will by itself foster improved outcomes”) – I was thinking this about the authors of the last chapter, on ‘Changing self-esteem through competence and worthiness training’, in particular; their basic argument seems to be that since CWT (Competence and Worthiness Training) has been shown to improve self-esteem, ‘good things will follow’ for people who make use of such programs. Never mind that the causal pathways between self-esteem and life outcomes are incredibly unclear, never mind that self-esteem is not the relevant outcome measure (and studies with good outcome measures do not exist), and never mind that effect persistence over time is unknown – to take but three of the many problems with the research. They argue/conclude in the chapter that CWT is ’empirically validated’, an observation which almost made me laugh. I’m slightly puzzled that whereas doctors contributing to Springer publications and the like are always supposed to disclose conflicts of interest, no similar demands are made in the context of the psychological literature; these people obviously make money off of these things, and yet they’re the ones evaluating the few poor studies that have been done – often studies they conducted themselves – while presenting themselves as unbiased observers with no financial interest in whether the methods are ‘validated’ or not. Oh well.
Some chapters are poor – ‘data-poor and theory-rich’ might not be a bad way to describe them, where the ‘data-poor’ part relates both to low amounts of data and to the use of data of questionable quality. I’m thinking specifically of the use of measures of ‘implicit self-esteem’ in chapter 6: the authors seem confused about the pattern of results and seem to have a hard time making sense of them (they keep having to make up new ad-hoc explanations for why ‘this makes sense in context’), but I don’t think the results are necessarily that confusing; the variables probably aren’t measuring what the authors think they’re measuring, not even close, and the two different types of measures probably aren’t remotely measuring anything similar (I have a really hard time figuring out why anyone would ever think that they do), so it makes good sense that the findings are all over the place. Chapter 8, on ‘Self-esteem as an interpersonal signal’, was however really great, and I thought I should share some observations from that chapter here – I have done this below. Interestingly, in light of the material in that chapter, people who read the first post about the book would do well to forget my personal comments in that post about having low self-esteem; interpersonal outcomes seem likely to be better if you think the people with whom you interact have high self-esteem (there are exceptions, but none of them seem relevant in this context), whether or not that’s actually true. Of course the level of ‘interaction’ going on here on the blog is very low, but even so… (I may be making a mistake similar to the one the authors of the last chapter make, by making unwarranted assumptions, but anyway…).
Before moving on, I should perhaps point out that I just finished the short Springer publication Appointment Planning in Outpatient Clinics and Diagnostic Facilities. I’m not going to blog that book separately, as there frankly isn’t enough material in it to justify an entire blog post, but I thought I might as well add a few remarks here. The book contains a good introduction to some basic queueing theory, and it covers quite a few important concepts which people working with these kinds of problems ought to know about (also, if you’ve ever had discussions about waiting lists and how ‘it’s terrible that people have to wait so long’ and ‘something has to be done’, the discussion would have been of higher quality if you’d read this book first). Some chapters of the book are quite technical – here are a few illustrative/relevant links dealing with material covered in the book: Pollaczek–Khinchine formula, Little’s Law, the Erlang C formula, the Erlang B formula, Laplace–Stieltjes transform. The main thing I took away from this book is that this stuff is a lot more complicated than I’d thought. I’m not sure how much the average nurse would get out of this book, but I’m also not sure how much influence the average nurse has on planning decisions such as those described in it – little, I hope. Sometimes a book contains a few really important observations and you want to recommend it based simply on those observations, because a lot of people would benefit from knowing exactly those things; this book is like that, as planners on many different decision-making levels would benefit from knowing the ‘golden rules’ included in section 7.1. When things go wrong due to mismanagement and very long waiting lists develop, it is usually because people paid too little attention to precisely those aspects.
An observation which is critical to include in the coverage of a book like this is that it may be quite difficult for an outside observer (e.g. a person visiting a health clinic) to evaluate the optimality of scheduling procedures, except in very obvious cases of inefficiently long queues. Especially in the case of excess capacity, most outsiders do not know enough to evaluate these systems fairly; what looks like excess capacity to the outsider may well be a necessary buffer built into the planning schedule to keep waiting times from exploding at other points in time, and it’s really hard to tell the two apart if you don’t have access to the relevant data. Even if you do, things can be complicated (see the links above).
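The Erlang formulas linked above are short enough to sketch in a few lines of code. The clinic numbers below are invented for illustration; the recursion used for Erlang B is the standard numerically stable one, and Erlang C and Little’s law follow from it:

```python
def erlang_b(c, a):
    """Blocking probability in an M/M/c/c loss system with offered load
    a = lam/mu, via the stable recursion B(k) = a*B(k-1) / (k + a*B(k-1))."""
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b

def erlang_c(c, a):
    """Erlang C: probability an arrival must wait in an M/M/c queue (a < c)."""
    b = erlang_b(c, a)
    return c * b / (c - a * (1 - b))

# Hypothetical clinic: 10 arrivals/hour, consultations average 15 minutes
# (service rate mu = 4/hour), and c = 3 consultation rooms.
lam, mu, c = 10.0, 4.0, 3
a = lam / mu                    # offered load, in "servers' worth" of work
p_wait = erlang_c(c, a)         # fraction of patients who must wait at all
wq = p_wait / (c * mu - lam)    # mean wait in queue (hours)
lq = lam * wq                   # mean queue length, via Little's law L = lam*W
```

Small changes in c move p_wait very non-linearly, which is one reason intuition about ‘visible idle capacity’ is so unreliable.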
Okay, back to the self-esteem text – some observations from the second half of the book below…
“low self-esteem is listed as either a diagnostic criterion or associated feature of at least 24 mental disorders in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR). Low self-esteem and an insufficient ability to experience self-relevant positive emotions such as pride is particularly strongly linked to depression, to such a degree that some even suggest conceptualizing self-esteem and depression as opposing end points of a bipolar continuum […] The phenomenology of low self-esteem – feeling incompetent and unworthy, unfit for life – inevitably translates into experiencing existence as frightening and futile. This turns life for the person lacking in self-esteem into a chronic emergency: that person is psychologically in a constant state of danger, surrounded by a feeling of impending disaster and a sense of helplessness. Suffering from low self-esteem thus involves having one’s consciousness ruled by fear, which sabotages clarity and efficiency (Branden, 1985). The main goal for such a person is to keep the anxieties, insecurities, and self-doubts at bay, at whatever cost that may come. On the other hand, a person with a satisfying degree of self-respect, whose central motivation is not fear, can afford to rejoice in being alive, and view existence as a more exciting than threatening affair.” [from chapter 7, on ‘Existential perspective on self-esteem’ – I didn’t particularly like that chapter and I’m not sure to what extent I agree with the observations included, but I thought I should add the above to illustrate what kind of stuff is also included in the book.]
“Although past research has emphasized how social environments are internalized to shape self-views, researchers are increasingly interested in how self-views are externalized to shape one’s social environment. From the externalized perspective, people will use information about another’s self-esteem as a gauge of that person’s worth […] self-esteem serves a “status-signaling” function that complements the status-tracking function […] From this perspective, self-esteem influences one’s self-presentational behavior, which in turn influences how others view the self. This status-signaling system in humans should work much like the status-signaling models developed in non-human animals [Aureli et al. and Kappeler et al. are examples of places to go if you’re interested in knowing more about this stuff] […] Ultimately, these status signals have important evolutionary outcomes, such as access to mates and consequent reproductive success. In essence, self-esteem signals important status-related information to others in one’s social world. […] the basic notion here is that conveying high (or low) self-esteem provides social information to others.”
“In an effort to understand their social world, people form lay theories about the world around them. These lay theories consist of information about how characteristics covary within individuals […] Research on the status-signaling function of self-esteem […] and on self-esteem stereotypes […] report a consistent positive bias in the impressions formed about high self-esteem individuals and a consistent negative bias about those with low self-esteem. In several studies conducted by Cameron and her colleagues […], when Canadian and American participants were asked to rate how the average person would describe a high self-esteem individual, they universally reported that higher self-esteem people were attractive, intelligent, warm, competent, emotionally stable, extraverted, open to experience, conscientious, and agreeable. Basically, on all characteristics in the rating list, high self-esteem people were described as superior. […] Whereas people sing the praises of high self-esteem, low self-esteem is viewed as a “fatal flaw.” In the same set of studies, Cameron and her colleagues […] found that participants attributed negative characteristics to low self-esteem individuals. Across all of the characteristics assessed, low self-esteem people were seen as inferior. They were described as less attractive, less intelligent, less warm, less competent, less sociable, and so forth. The only time that the stereotypes of low self-esteem individuals were rated as “more” than the group of high self-esteem individuals was on negative characteristics, such as experiencing more negative moods and possessing more interpersonally disadvantageous characteristics (e.g., jealousy). […] low self-esteem individuals were seen just as negatively as welfare recipients and mentally ill people on most characteristics […] All cultures do not view self-esteem in the same way. […] There is some evidence to suggest that East Asian cultures link high self-esteem with more negative qualities”
“Zeigler-Hill and his colleagues […] presented participants with a single target, identified as low self-esteem or high self-esteem, and asked for their evaluations of the target. Whether the target was identified as low self-esteem by an explicit label (Study 3), a self-deprecating slogan on a T-shirt (Study 4), or their email address (Study 5, e.g., sadeyes@), participants rated an opposite-sex low self-esteem target as less romantically desirable than a high self-esteem target […]. However, ascribing negative characteristics to low self-esteem individuals is not just limited to decisions about an opposite-sex target. Zeigler-Hill and colleagues demonstrated that, regardless of match or mismatch of perceiver-target gender, when people thought a target had lower self-esteem they were more likely to ascribe negative traits to him or her, such as being lower in conscientiousness […] Overall, people are apt to assume that people with low self-esteem possess negative characteristics, whereas those with high self-esteem possess positive characteristics. Such assumptions are made at the group level […] and at the individual level […] According to Cameron and colleagues […], fewer than 1% of the sample ascribed any positive characteristics to people with low self-esteem when asked to give open-ended descriptions. Furthermore, on the overwhelming majority of characteristics assessed, low self-esteem individuals were rated more negatively than high self-esteem individuals”
“Although for the most part it is low self-esteem that people associate with negative qualities, there is a dark side to being labeled as having high self-esteem. People who are believed to have high self-esteem are seen as more narcissistic […], self-absorbed, and egotistical […] than those believed to possess low self-esteem. Moreover, the benefits of being seen as high self-esteem may be moderated by gender. When rating an opposite-sex target, men were often more positive toward female targets with moderate self-esteem than those with high self-esteem”
“Not only might perceptions of others’ self-esteem influence interactions among relative strangers, but they may also be particularly important in close relationships. Ample evidence demonstrates that a friend or partner’s self-esteem can have actual relational consequences […]. Relationships involving low self-esteem people tend to be less satisfying and less committed […], due at least in part to low self-esteem people’s tendency to engage in defensive, self-protective behavior and their enhanced expectations of rejection […]. Mounting evidence suggests that people can intuit these disadvantages, and thus use self-esteem as an interpersonal signal. […] Research by MacGregor and Holmes (2007) suggests that people expect to be less satisfied in a romantic relationship with a low self-esteem partner than a high self-esteem partner, directly blaming low self-esteem individuals for relationship mishaps […] it appears that people use self-esteem as a signal to indicate desirability as a mate: People report themselves as less likely to date or have sex with those explicitly labeled as having “low self-esteem” compared to those labeled as having “high self-esteem” […] Even when considering friendships, low self-esteem individuals are rated less socially appealing […] In general, it appears that low self-esteem individuals are viewed as less-than-ideal relationship partners.”
“Despite people’s explicit aversion to forming social bonds with low self-esteem individuals, those with low self-esteem do form close relationships. Nevertheless, even these established relationships may suffer when one person detects another’s low self-esteem. For example, people believe that interactions with low self-esteem friends or family members are more exhausting and require more work than interactions with high self-esteem friends and family […]. In the context of romantic relationships, Lemay and Dudley’s (2011) findings confirm the notion that relationships with low self-esteem individuals require extra relationship maintenance (or “work”) as people attempt to “regulate” their romantic partner’s insecurities. Specifically, participants who detected their partner’s low self-esteem tended to exaggerate affection for their partner and conceal negative sentiments, likely in an effort to maintain harmony in their relationship. Unfortunately, this inauthenticity was actually associated with decreased relationship satisfaction for the regulating partner over time. […] MacGregor and colleagues […] have explored a different type of communication in close relationships. Their focus was on capitalization, which is the disclosure of positive personal experiences to others […]. In two experiments […], participants who were led to believe that their close other had low self-esteem capitalized less positively (i.e., enthusiastically) compared to control participants. […] Moreover, in a study involving friend dyads, participants reported capitalizing less frequently with their friend to the extent they perceived him or her as having low self-esteem […] low self-esteem individuals are actually no less responsive to others’ capitalization attempts than are high self-esteem partners. 
Despite this fact, MacGregor and Holmes (2011) found that people are reluctant to capitalize with low self-esteem individuals precisely because they expect them to be less responsive than high self-esteem partners. Thus people appear to be holding back from low self-esteem individuals unnecessarily. Nevertheless, the consequences may be very real given that capitalization is a process associated with personal and interpersonal benefits”
“Cameron (2010) asked participants to indicate how much they tried to conceal or reveal their self-feelings and insecurities with significant others (best friends, romantic partners, and parents). Those with lower self-esteem reported attempting to conceal their insecurities and self-doubts to a greater degree than those with higher self-esteem. Thus, even in close relationships, low self-esteem individuals appear to see the benefit of hiding their self-esteem. Cameron, Hole, and Cornelius (2012) further investigated whether concealing self-esteem was linked with relational benefits for those with low self-esteem. In several studies, participants were asked to report their own self-esteem and then to provide their “self-esteem image”, or what level of self-esteem they thought they had conveyed to their significant others. Participants then indicated their relationship quality (e.g., satisfaction, commitment, trust). Across all studies and across all relationship types studied (friends, romantic partners, and parents), people reporting a higher self-esteem image, regardless of their own self-esteem level, reported greater relationship quality. […] both low and high self-esteem individuals benefit from believing that a high self-esteem image has been conveyed, though this experience may feel “inauthentic” for low self-esteem people. […] both low and high self-esteem individuals may hope to be seen as they truly are by their close others. […] In a recent meta-analysis, Kwang and Swann (2010) proposed that individuals desire verification unless there is a high risk for rejection. Thus, those with negative self-views may desire to be viewed positively, but only if being seen negatively jeopardizes their relationship. From this perspective, romantic partners should signal high self-esteem during courtship, job applicants should signal high self-esteem to potential bosses, and politicians should signal high self-esteem to their voters.
Once the relationship has been cemented (and the potential for rejection has been reduced), however, people should desire to be seen as they are. Importantly, the results of the meta-analysis supported this proposal. While this boundary condition has shed some light on this debate, more research is needed to understand fully under what contexts people are motivated to communicate either positive or negative self-views.”
“it appears that people’s judgments of others’ self-esteem are partly well informed, yet also based on inaccurate stereotypes about characteristics not actually linked to self-esteem. […] Traits that do not readily manifest in behavior, or are low in observability, should be more difficult to detect accurately (see Funder & Dobroth, 1987). Self-esteem is one of these “low-observability” traits […] Although the operationalization of accuracy is tricky […], it does appear that people are somewhat accurate in their impressions of self-esteem […] research from various laboratories indicates that both friends […] and romantic partners […] are fairly accurate in judging each other’s self-esteem. […] However, people may also use information that has nothing to do with the appearances or behaviors of the target. Instead, people may make judgments about another’s personality traits based on how they perceive their own traits […] people tend to project their own characteristics onto others […] People’s ratings of others’ self-esteem tend to be correlated with their own, be it for friends or romantic partners”
This is a neat little book in the Springer Briefs in Statistics series. The author is David J Bartholomew, a former statistics professor at the LSE. I wrote a brief goodreads review, but I thought that I might as well also add a post about the book here. The book covers topics such as the EM algorithm, Gibbs sampling, the Metropolis–Hastings algorithm and the Rasch model, and it assumes you’re familiar with stuff like how to do ML estimation, among many other things. I had some passing familiarity with many of the topics he talks about in the book, but I’m sure I’d have benefited from knowing more about some of the specific topics covered. Because large parts of the book are basically unreadable by people without a stats background, I wasn’t sure how much of it it made sense to cover here, but I decided to talk a bit about a few of the things which I believe don’t require you to know a whole lot about this area.
“Modern statistics is built on the idea of models—probability models in particular. [While I was rereading this part, I was reminded of this quote which I came across while finishing my most recent quotes post: “No scientist is as model minded as is the statistician; in no other branch of science is the word model as often and consciously used as in statistics.” Hans Freudenthal.] The standard approach to any new problem is to identify the sources of variation, to describe those sources by probability distributions and then to use the model thus created to estimate, predict or test hypotheses about the undetermined parts of that model. […] A statistical model involves the identification of those elements of our problem which are subject to uncontrolled variation and a specification of that variation in terms of probability distributions. Therein lies the strength of the statistical approach and the source of many misunderstandings. Paradoxically, misunderstandings arise both from the lack of an adequate model and from over reliance on a model. […] At one level is the failure to recognise that there are many aspects of a model which cannot be tested empirically. At a higher level is the failure to recognise that any model is, necessarily, an assumption in itself. The model is not the real world itself but a representation of that world as perceived by ourselves. This point is emphasised when, as may easily happen, two or more models make exactly the same predictions about the data. Even worse, two models may make predictions which are so close that no data we are ever likely to have can ever distinguish between them. […] All model-dependent inference is necessarily conditional on the model. This stricture needs, especially, to be borne in mind when using Bayesian methods. Such methods are totally model-dependent and thus all are vulnerable to this criticism.
The problem can apparently be circumvented, of course, by embedding the model in a larger model in which any uncertainties are, themselves, expressed in probability distributions. However, in doing this we are embarking on a potentially infinite regress which quickly gets lost in a fog of uncertainty.”
“Mixtures of distributions play a fundamental role in the study of unobserved variables […] The two important questions which arise in the analysis of mixtures concern how to identify whether or not a given distribution could be a mixture and, if so, to estimate the components. […] Mixtures arise in practice because of failure to recognise that samples are drawn from several populations. If, for example, we measure the heights of men and women without distinction the overall distribution will be a mixture. It is relevant to know this because women tend to be shorter than men. […] It is often not at all obvious whether a given distribution could be a mixture […] even a two-component mixture of normals, has 5 unknown parameters. As further components are added the estimation problems become formidable. If there are many components, separation may be difficult or impossible […] [To add to the problem,] the form of the distribution is unaffected by the mixing [in the case of the mixing of normals]. Thus there is no way that we can recognise that mixing has taken place by inspecting the form of the resulting distribution alone. Any given normal distribution could have arisen naturally or be the result of normal mixing […] if f(x) is normal, there is no way of knowing whether it is the result of mixing and hence, if it is, what the mixing distribution might be.”
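The remark that normal mixing leaves the form of the distribution unchanged is easy to illustrate in simulation. A small sketch of my own (not the book’s, with invented numbers): if X | M ~ N(M, σ²) and the means themselves vary as M ~ N(μ, τ²), the marginal distribution of X is again exactly normal, N(μ, σ² + τ²):

```python
import random
import statistics

random.seed(1)

# Compound sampling: draw an individual mean M ~ N(mu, tau), then a
# measurement X | M ~ N(M, sigma).
mu, tau, sigma = 170.0, 5.0, 6.0
xs = [random.gauss(random.gauss(mu, tau), sigma) for _ in range(200_000)]

# Theory: X is itself exactly normal, N(mu, sigma^2 + tau^2), so no amount
# of inspection of the sample's shape can reveal that mixing took place.
sample_mean = statistics.fmean(xs)
sample_sd = statistics.stdev(xs)
assert abs(sample_mean - mu) < 0.15
assert abs(sample_sd - (sigma ** 2 + tau ** 2) ** 0.5) < 0.15
```

(The two-component men/women case is different – a discrete mixture of two normals is generally not normal – but with means close together it can still be unimodal and very hard to distinguish from a single normal in practice.)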
“Even if there is close agreement between a model and the data it does not follow that the model provides a true account of how the data arose. It may be that several models explain the data equally well. When this happens there is said to be a lack of identifiability. Failure to take full account of this fact, especially in the social sciences, has led to many over-confident claims about the nature of social reality. Lack of identifiability within a class of models may arise because different values of their parameters provide equally good fits. Or, more seriously, models with quite different characteristics may make identical predictions. […] If we start with a model we can predict, albeit uncertainly, what data it should generate. But if we are given a set of data we cannot necessarily infer that it was generated by a particular model. In some cases it may, of course, be possible to achieve identifiability by increasing the sample size but there are cases in which, no matter how large the sample size, no separation is possible. […] Identifiability matters can be considered under three headings. First there is lack of parameter identifiability which is the most common use of the term. This refers to the situation where there is more than one value of a parameter in a given model each of which gives an equally good account of the data. […] Secondly there is what we shall call lack of model identifiability which occurs when two or more models make exactly the same data predictions. […] The third type of identifiability is actually the combination of the foregoing types.
Mathematical statistics is not well-equipped to cope with situations where models are practically, but not precisely, indistinguishable because it typically deals with things which can only be expressed in unambiguously stated theorems. Of necessity, these make clear-cut distinctions which do not always correspond with practical realities. For example, there are theorems concerning such things as sufficiency and admissibility. According to such theorems, for example, a proposed statistic is either sufficient or not sufficient for some parameter. If it is sufficient it contains all the information, in a precisely defined sense, about that parameter. But in practice we may be much more interested in what we might call ‘near sufficiency’ in some more vaguely defined sense. Because we cannot give a precise mathematical definition to what we mean by this, the practical importance of the notion is easily overlooked. The same kind of fuzziness arises with what are called structural equation models (or structural relations models) which have played a very important role in the social sciences. […] we shall argue that structural equation models are almost always unidentifiable in the broader sense of which we are speaking here. […] [our results] constitute a formidable argument against the careless use of structural relations models. […] In brief, the valid use of a structural equations model requires us to lean very heavily upon assumptions about which we may not be very sure. It is undoubtedly true that if such a model provides a good fit to the data, then it provides a possible account of how the data might have arisen. It says nothing about what other models might provide an equally good, or even better fit. As a tool of inductive inference designed to tell us something about the social world, linear structural relations modelling has very little to offer.”
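A toy example (mine, not the book’s) makes the first kind of non-identifiability concrete: if the model’s predictions depend on two parameters only through their product, distinct parameter pairs fit any data set identically, and no sample size can separate them:

```python
import random

random.seed(2)

# Model: y = a * b * x + noise. Only the product a*b enters the predictions,
# so the parameter pairs (2, 3) and (6, 1) are observationally equivalent.
xs = [random.uniform(0.0, 10.0) for _ in range(1000)]
ys = [6.0 * x + random.gauss(0.0, 1.0) for x in xs]

def sse(a, b):
    """Sum of squared errors of the fit y_hat = a * b * x."""
    return sum((y - a * b * x) ** 2 for x, y in zip(xs, ys))

assert abs(sse(2.0, 3.0) - sse(6.0, 1.0)) < 1e-9  # identical fits
assert sse(1.0, 1.0) > sse(2.0, 3.0)              # wrong product fits worse
```

The data pin down the product a·b very precisely while saying nothing at all about a and b separately, which is exactly the situation the quote describes for parameters of structural models.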
“It is very common for data to be missing and this introduces a risk of bias if inferences are drawn from incomplete samples. However, we are not usually interested in the missing data themselves but in the population characteristics to whose estimation those values were intended to contribute. […] A very longstanding way of dealing with missing data is to fill in the gaps by some means or other and then carry out the standard analysis on the completed data set. This procedure is known as imputation. […] In its simplest form, each missing data point is replaced by a single value. Because there is, inevitably, uncertainty about what the imputed values should be, one can do better by substituting a range of plausible values and comparing the results in each case. This is known as multiple imputation. […] missing values may occur anywhere and in any number. They may occur haphazardly or in some pattern. In the latter case, the pattern may provide a clue to the mechanism underlying the loss of data and so suggest a method for dealing with it. The conditional distribution which we have supposed might be the basis of imputation depends, of course, on the mechanism behind the loss of data. From a practical point of view the detailed information necessary to determine this may not be readily obtainable or, even, necessary. Nevertheless, it is useful to clarify some of the issues by introducing the idea of a probability mechanism governing the loss of data. This will enable us to classify the problems which would have to be faced in a more comprehensive treatment. The simplest, if least realistic approach, is to assume that the chance of being missing is the same for all elements of the data matrix. In that case, we can, in effect, ignore the missing values […] Such situations are designated as MCAR which is an acronym for Missing Completely at Random. […] In the smoking example we have supposed that men are more likely to refuse [to answer] than women. 
If we go further and assume that there are no other biasing factors we are, in effect, assuming that ‘missingness’ is completely at random for men and women, separately. This would be an example of what is known as Missing at Random (MAR) […] which means that the missing mechanism depends on the observed variables but not on those that are missing. The final category is Missing Not at Random (MNAR) which is a residual category covering all other possibilities. This is difficult to deal with in practice unless one has an unusually complete knowledge of the missing mechanism.
Another term used in the theory of missing data is that of ignorability. The conditional distribution of y given x will, in general, depend on any parameters of the distribution of M [the variable we use to describe the mechanism governing the loss of observations] yet these are unlikely to be of any practical interest. It would be convenient if this distribution could be ignored for the purposes of inference about the parameters of the distribution of x. If this is the case the mechanism of loss is said to be ignorable. In practice it is acceptable to assume that the concept of ignorability is equivalent to that of MAR.”
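To make the MCAR/MAR distinction a bit more concrete, here’s a small simulation I sketched along the lines of the smoking example (all the numbers – smoking rates, refusal rates – are made up by me):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical numbers for the smoking example: smoking status is missing
# more often for men than for women, but within each sex the refusals are
# completely at random. That is MAR: missingness depends only on the
# observed variable 'sex', not on the (possibly missing) smoking status.
n = 100_000
male = rng.random(n) < 0.5
smokes = rng.random(n) < np.where(male, 0.30, 0.15)   # men smoke more here
refused = rng.random(n) < np.where(male, 0.40, 0.05)  # men refuse more often

observed = ~refused
true_rate = smokes.mean()

# Naive complete-case estimate: biased downwards, because the higher-rate
# group (men) is under-represented among responders.
naive = smokes[observed].mean()

# MAR-respecting estimate: estimate within each sex, where missingness is
# completely at random, and recombine using the known sex proportions.
adjusted = sum(smokes[observed & (male == g)].mean() * (male == g).mean()
               for g in (True, False))

print(f"true {true_rate:.3f}  naive {naive:.3f}  adjusted {adjusted:.3f}")
```

Under MNAR – say, if smokers themselves were more likely to refuse regardless of sex – the within-group adjustment above would no longer rescue the estimate, which is exactly why that category is the hard one.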
I was very conflicted about blogging this book at all, but given that I have blogged all the other non-fiction books I’ve read this year so far, I figured I probably ought to at least talk a little bit about this one as well. I wrote this on goodreads:
“The book contained a brief review of some mathematics used in a couple of previous courses I’ve taken, with some new details added to the mix. Having worked with this stuff before is probably a requirement to get anything much out of it, as it is highly technical.”
Here are some observations/comments from the conclusion, providing a brief outline:
“In this book we have studied discrete-time stochastic optimal control problems (OCPs) and dynamic games by means of the Euler equation (EE) approach. […] In Chap. 2 we studied the EE approach to nonstationary OCPs in discrete-time. OCPs are usually solved by dynamic programming and the Lagrange method. The latter techniques for solving OCPs are based on iteration methods or rely on guessing the form of the value or the policy functions […] In contrast, the EE approach does not require an iteration method nor knowledge about the form of the value function; on the contrary, the value function can be computed after the OCP is solved. Following the EE approach, we have to solve a second-order difference equation (possibly nonlinear and/or nonhomogeneous); there are, however, many standard methods to do this. Both the EE […] and the transversality condition (TC) […] are known in the literature. The EE […] is typically deduced from the Bellman equation whereas the necessity of the TC […] is obtained by using approximation or perturbation results. Our main results in Chap. 2 require milder assumptions […] In Theorem 2.1 we obtain the EE (2.14) and the TC (2.15), as necessary conditions for optimality, using Gâteaux differentials. […] Chapter 3 was devoted to an inverse optimal problem in stochastic control. […] Finally, in Chap. 4, some results from Chaps. 2 and 3 were applied to dynamic games. Sufficient conditions to identify MNE [Markov–Nash equilibria] and OLNE [Open-loop Nash equilibria], by following the EE approach, were given […] one of our main objectives was to identify DPGs [Dynamic potential games] by generalizing the procedure of Dechert and O’Donnell for the SLG”
“Some advantages and shortcomings of the EE approach. A first advantage of using the EE to solve discrete-time OCPs is that it is very natural and straightforward, because it is an obvious extension of results on the properties of maxima (or minima) of differentiable functions. Indeed, as shown in Sect. 2.2, using Gâteaux differentials, the EE and some transversality condition are straightforward consequences of the elementary calculus approach. From our present point of view, the main advantage of the EE approach is that it allows us to analyze certain inverse OCPs required to characterize the dynamic potential games we are interested in. It is not clear to us that these inverse OCPs can be analyzed by other methods (e.g., dynamic programming or the maximum principle). On the other hand, a possible disadvantage is that the Euler equation might require some “guessing” to obtain a sequence that solves it. This feature, however, is common to other solution techniques such as dynamic programming.”
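To give an idea of what the EE approach looks like in the simplest possible setting, here’s a standard textbook example – the deterministic Brock–Mirman growth model, not an example from this book. The problem is to maximize the discounted sum of log utilities of consumption subject to the capital accumulation constraint; the EE approach reduces it to a difference equation, and for this model the optimal policy is known in closed form, so we can verify directly that it satisfies the Euler equation:

```python
# Standard textbook illustration (not from the book): maximize
#   sum_t beta**t * ln(c_t)   subject to   k_{t+1} = k_t**alpha - c_t.
# The Euler equation for this problem is
#   1/c_t = beta * alpha * k_{t+1}**(alpha - 1) * (1/c_{t+1}),
# and the known closed-form optimal policy is c_t = (1 - alpha*beta) * k_t**alpha.
alpha, beta = 0.3, 0.95

def policy(k):
    return (1.0 - alpha * beta) * k**alpha

def euler_residual(k):
    c = policy(k)
    k_next = k**alpha - c          # equals alpha*beta*k**alpha
    c_next = policy(k_next)
    return 1.0 / c - beta * alpha * k_next**(alpha - 1.0) / c_next

for k in (0.1, 0.5, 1.0, 2.0):
    print(k, euler_residual(k))    # residuals are ~0 up to rounding
```

This also illustrates the ‘guessing’ caveat in the quote: here we verify a known candidate policy rather than derive it, and in less convenient models one has to conjecture a functional form before the EE can confirm it.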
If none of the above makes much sense to you, I wouldn’t worry too much about it. Stuff like this, this, and this was covered in previous coursework of mine so I was familiar with some of the stuff covered in this book; stuff like this is part of what many economists learn during their education. I figured it’d be interesting to see a more ‘pure-math’ coverage of these things. It turned out, however, that many of the applications in the book are economics-related, so in a way the coverage was ‘less pure’ than I’d thought before I started out.
A couple of links I looked up along the way are these: Gâteaux derivative, Riccati equation, Borel set. I haven’t read this, but a brief google for some of the relevant terms above made that one pop up; it looks as if it may be a good resource if you’re curious to learn more about what this kind of stuff is about.
This will be my last post about the book. Go here for a background post and my overall impression of the book – I’ll limit this post to coverage of the ‘Simple Models of Complex Phenomena’ chapter which I mentioned in that post, as well as a few observations from the introduction to part 5 of the book, which talks a little bit about what the chapter is about in general terms. The chapter is in a way an overview of – and a defence of – the kind of approach to things which you may well end up adopting unconsciously if you’re working in a field like economics or ecology; as I mentioned in the previous post about the book, I’ve talked about these sorts of things before, but there’s some new stuff in here as well. The chapter is written in the context of Boyd and Richerson’s coverage of their ‘Darwinian approach to evolution’, but many of the observations are of a much more general nature and relate to the application of statistical and mathematical modelling in a much broader context; and even those observations that do not directly relate to broader contexts have, as far as I can see, what might be termed ‘generalized analogues’. The chapter coverage was actually interesting enough for me to seriously consider reading a book or two on these topics (books such as this one), despite the amount of work I know may well be required to deal with a book like this.
I exclude a lot of stuff from the chapter in this post, and there are a lot of other good chapters in the book. Again, you should read this book.
Here’s the stuff from the introduction:
“Chapter 19 is directed at those in the social sciences unfamiliar with a style of deploying mathematical models that is second nature to economists, evolutionary biologists, engineers, and others. Much science in many disciplines consists of a toolkit of very simple mathematical models. To many not familiar with the subtle art of the simple model, such formal exercises have two seemingly deadly ﬂaws. First, they are not easy to follow. […] Second, motivation to follow the math is often wanting because the model is so cartoonishly simple relative to the real world being analyzed. Critics often level the charge ‘‘reductionism’’ with what they take to be devastating effect. The modeler’s reply is that these two criticisms actually point in opposite directions and sum to nothing. True, the model is quite simple relative to reality, but even so, the analysis is difﬁcult. The real lesson is that complex phenomena like culture require a humble approach. We have to bite off tiny bits of reality to analyze and build up a more global knowledge step by patient step. […] Simple models, simple experiments, and simple observational programs are the best the human mind can do in the face of the awesome complexity of nature. The alternatives to simple models are either complex models or verbal descriptions and analysis. Complex models are sometimes useful for their predictive power, but they have the vice of being difﬁcult or impossible to understand. The heuristic value of simple models in schooling our intuition about natural processes is exceedingly important, even when their predictive power is limited. […] Unaided verbal reasoning can be unreliable […] The lesson, we think, is that all serious students of human behavior need to know enough math to at least appreciate the contributions simple mathematical models make to the understanding of complex phenomena. The idea that social scientists need less math than biologists or other natural scientists is completely mistaken.”
And below I’ve posted the chapter coverage:
“A great deal of the progress in evolutionary biology has resulted from the deployment of relatively simple theoretical models. Staddon’s, Smith’s, and Maynard Smith’s contributions illustrate this point. Despite their success, simple models have been subjected to a steady stream of criticism. The complexity of real social and biological phenomena is compared to the toylike quality of the simple models used to analyze them and their users charged with unwarranted reductionism or plain simplemindedness.
This critique is intuitively appealing—complex phenomena would seem to require complex theories to understand them—but misleading. In this chapter we argue that the study of complex, diverse phenomena like organic evolution requires complex, multilevel theories but that such theories are best built from toolkits made up of a diverse collection of simple models. Because individual models in the toolkit are designed to provide insight into only selected aspects of the more complex whole, they are necessarily incomplete. Nevertheless, students of complex phenomena aim for a reasonably complete theory by studying many related simple models. The neo-Darwinian theory of evolution provides a good example: ﬁtness-optimizing models, one and multiple locus genetic models, and quantitative genetic models all emphasize certain details of the evolutionary process at the expense of others. While any given model is simple, the theory as a whole is much more comprehensive than any one of them.”
“In the last few years, a number of scholars have attempted to understand the processes of cultural evolution in Darwinian terms […] The idea that uniﬁes all this work is that social learning or cultural transmission can be modeled as a system of inheritance; to understand the macroscopic patterns of cultural change we must understand the microscopic processes that increase the frequency of some culturally transmitted variants and reduce the frequency of others. Put another way, to understand cultural evolution we must account for all of the processes by which cultural variation is transmitted and modiﬁed. This is the essence of the Darwinian approach to evolution.”
“In the face of the complexity of evolutionary processes, the appropriate strategy may seem obvious: to be useful, models must be realistic; they should incorporate all factors that scientists studying the phenomena know to be important. This reasoning is certainly plausible, and many scientists, particularly in economics […] and ecology […], have constructed such models, despite their complexity. On this view, simple models are primitive, things to be replaced as our sophistication about evolution grows. Nevertheless, theorists in such disciplines as evolutionary biology and economics stubbornly continue to use simple models even though improvements in empirical knowledge, analytical mathematics, and computing now enable them to create extremely elaborate models if they care to do so. Theorists of this persuasion eschew more detailed models because (1) they are hard to understand, (2) they are difﬁcult to analyze, and (3) they are often no more useful for prediction than simple models. […] Detailed models usually require very large amounts of data to determine the various parameter values in the model. Such data are rarely available. Moreover, small inaccuracies or errors in the formulation of the model can produce quite erroneous predictions. The temptation is to ‘‘tune’’ the model, making small changes, perhaps well within the error of available data, so that the model produces reasonable answers. When this is done, any predictive power that the model might have is due more to statistical ﬁtting than to the fact that it accurately represents actual causal processes. It is easy to make large sacriﬁces of understanding for small gains in predictive power.”
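The point about ‘tuning’ and statistical fitting is easy to illustrate with a toy example (mine, not the authors’): a heavily parameterized model fits a given sample at least as well as a simple one, but out of sample it often does worse. Here the true process is assumed to be linear with noise, and the degrees and sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy illustration (not from the text): the true process is assumed linear
# with noise; a degree-12 polynomial 'tunes' itself to each training sample
# and typically predicts worse out of sample than the simple linear model.
def avg_test_mse(degree, reps=50, n=30, noise=0.3):
    errs = []
    for _ in range(reps):
        x_tr, x_te = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
        y_tr = 1 + 2 * x_tr + rng.normal(0, noise, n)
        y_te = 1 + 2 * x_te + rng.normal(0, noise, n)
        coeffs = np.polyfit(x_tr, y_tr, degree)
        errs.append(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))
    return float(np.mean(errs))

simple_mse, flexible_mse = avg_test_mse(1), avg_test_mse(12)
print(f"degree 1: {simple_mse:.3f}   degree 12: {flexible_mse:.3f}")
```

Any predictive success of the degree-12 fit within the sample is statistical fitting rather than an accurate representation of the causal process, which is the authors’ point in miniature.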
“In the face of these difﬁculties, the most useful strategy will usually be to build a variety of simple models that can be completely understood but that still capture the important properties of the processes of interest. Leibenstein (1976: ch. 2) calls such simple models ‘‘sample theories.’’ Students of complex and diverse subject matters develop a large body of models from which ‘‘samples’’ can be drawn for the purpose at hand. Useful sample theories result from attempts to satisfy two competing desiderata: they should be simple enough to be clearly and completely grasped, and at the same time they should reﬂect how real processes actually do work, at least to some approximation. A systematically constructed population of sample theories and combinations of them constitutes the theory of how the whole complex process works. […] If they are well designed, they are like good caricatures, capturing a few essential features of the problem in a recognizable but stylized manner and with no attempt to represent features not of immediate interest. […] The user attempts to discover ‘‘robust’’ results, conclusions that are at least qualitatively correct, at least for some range of situations, despite the complexity and diversity of the phenomena they attempt to describe. […] Note that simple models can often be tested for their scientiﬁc content via their predictions even when the situation is too complicated to make practical predictions. Experimental or statistical controls often make it possible to expose the variation due to the processes modeled, against the background of ‘‘noise’’ due to other ones, thus allowing a ceteris paribus prediction for purposes of empirical testing.”
“Generalized sample theories are an important subset of the simple sample theories used to understand complex, diverse problems. They are designed to capture the qualitative properties of the whole class of processes that they are used to represent, while more specialized ones are used for closer approximations to narrower classes of cases. […] One might agree with the case for a diverse toolkit of simple models but still doubt the utility of generalized sample theories. Fitness-maximizing calculations are often used as a simple caricature of how selection ought to work most of the time in most organisms to produce adaptations. Does such a generalized sample theory have any serious scientiﬁc purpose? Some might argue that their qualitative kind of understanding is, at best, useful for giving nonspecialists a simpliﬁed overview of complicated topics and that real scientiﬁc progress still occurs entirely in the construction of specialized sample theories that actually predict. A sterner critic might characterize the attempt to construct generalized models as loose speculation that actually inhibits the real work of discovering predictable relationships in particular systems. These kinds of objections implicitly assume that it is possible to do science without any kind of general model. All scientists have mental models of the world. The part of the model that deals with their disciplinary specialty is more detailed than the parts that represent related areas of science. Many aspects of a scientist’s mental model are likely to be vague and never expressed. The real choice is between an intuitive, perhaps covert, general theory and an explicit, often mathematical, one. […] To insist upon empirical science in the style of physics is to insist upon the impossible. However, to give up on empirical tests and prediction would be to abandon science and retreat to speculative philosophy. Generalized sample theories normally make only limited qualitative predictions. 
The logistic model of population growth is a good elementary example. At best, it is an accurate model only of microbial growth in the laboratory. However, it captures something of the biology of population growth in more complex cases. Moreover, its simplicity makes it a handy general model to incorporate into models that must also represent other processes such as selection, and intra- and interspeciﬁc competition. If some sample theory is consistently at variance with the data, then it must be modiﬁed. The accumulation of these kinds of modiﬁcations can eventually alter general theory […] A generalized model is useful so long as its predictions are qualitatively correct, roughly conforming to the majority of cases. It is helpful if the inevitable limits of the model are understood. It is not necessarily an embarrassment if more than one alternative formulation of a general theory, built from different sample models, is more or less equally correct. In this case, the comparison of theories that are empirically equivalent makes clearer what is at stake in scientiﬁc controversies and may suggest empirical and theoretical steps toward a resolution.”
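For reference, the logistic model they mention is simple enough to write down in a couple of lines; the parameter values below are purely illustrative:

```python
# The logistic growth model, dN/dt = r * N * (1 - N/K), simulated with a
# small Euler step. r, K and the initial population are illustrative values,
# not taken from the text.
r, K = 0.8, 1000.0
N, dt = 10.0, 0.01
for _ in range(2000):              # 20 time units in steps of 0.01
    N += r * N * (1.0 - N / K) * dt

# Early growth is near-exponential; the population then saturates near the
# carrying capacity K.
print(round(N))
```

That handiness – two parameters, one line of dynamics – is exactly what makes it easy to bolt onto models of selection or competition, as the quote says.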
“The thorough study of simple models includes pressing them to their extreme limits. This is especially useful at the second step of development, where simple models of basic processes are combined into a candidate generalized model of an interesting question. There are two related purposes in this exercise. First, it is helpful to have all the implications of a given simple model exposed for comparative purposes, if nothing else. A well-understood simple sample theory serves as a useful point of comparison for the results of more complex alternatives, even when some conclusions are utterly ridiculous. Second, models do not usually just fail; they fail for particular reasons that are often very informative. Just what kinds of modiﬁcations are required to make the initially ridiculous results more nearly reasonable? […] The exhaustive analysis of many sample models in various combinations is also the main means of seeking robust results (Wimsatt, 1981). One way to gain conﬁdence in simple models is to build several models embodying different characterizations of the problem of interest and different simplifying assumptions. If the results of a model are robust, the same qualitative results ought to obtain for a whole family of related models in which the supposedly extraneous details differ. […] Similarly, as more complex considerations are introduced into the family of models, simple model results can be considered robust only if it seems that the qualitative conclusion holds for some reasonable range of plausible conditions.”
“A plausibility argument is a hypothetical explanation having three features in common with a traditional hypothesis: (1) a claim of deductive soundness, of in-principle logical sufﬁciency to explain a body of data; (2) sufﬁcient support from the existing body of empirical data to suggest that it might actually be able to explain a body of data as well as or better than competing plausibility arguments; and (3) a program of research that might distinguish between the claims of competing plausibility arguments. The differences are that competing plausibility arguments (1) are seldom mutually exclusive, (2) can seldom be rejected by a single sharp experimental test (or small set of them), and (3) often end up being revised, limited in their generality or domain of applicability, or combined with competing arguments rather than being rejected. In other words, competing plausibility arguments are based on the claims that a different set of submodels is needed to achieve a given degree of realism and generality, that different parameter values of common submodels are required, or that a given model is correct as far as it goes, but applies with less generality, realism, or predictive power than its proponents claim. […] Human sociobiology provides a good example of a plausibility argument. The basic premise of human sociobiology is that ﬁtness-optimizing models drawn from evolutionary biology can be used to understand human behavior. […] We think that the clearest way to address the controversial questions raised by competing plausibility arguments is to try to formulate models with parameters such that for some values of the critical parameters the results approximate one of the polar positions in such debates, while for others the model approximates the other position.”
“A well-developed plausibility argument differs sharply from another common type of argument that we call a programmatic claim. Most generally, a programmatic claim advocates a plan of research for addressing some outstanding problem without, however, attempting to construct a full plausibility argument. […] An attack on an existing, often widely accepted, plausibility argument on the grounds that the plausibility argument is incomplete is a kind of programmatic claim. Critiques of human sociobiology are commonly of this type. […] The criticism of human sociobiology has far too frequently depended on mere programmatic claims (often invalid ones at that, as when sociobiologists are said to ignore the importance of culture and to depend on genetic variation to explain human differences). These claims are generally accompanied by dubious burden-of-proof arguments. […] We have argued that theory about complex-diverse phenomena is necessarily made up of simple models that omit many details of the phenomena under study. It is very easy to criticize theory of this kind on the grounds that it is incomplete (or defend it on the grounds that it one day will be much more complete). Such criticism and defense is not really very useful because all such models are incomplete in many ways and may be ﬂawed because of it. What is required is a plausibility argument that shows that some factor that is omitted could be sufﬁciently important to require inclusion in the theory of the phenomenon under consideration, or a plausible case that it really can be neglected for most purposes. […] It seems to us that until very recently, ‘‘nature-nurture’’ debates have been badly confused because plausibility arguments have often been taken to have been successfully countered by programmatic claims. It has proved relatively easy to construct reasonable and increasingly sophisticated Darwinian plausibility arguments about human behavior from the prevailing general theory. 
It is also relatively easy to spot the programmatic ﬂaws in such arguments […] The problem is that programmatic objections have not been taken to imply a promise to deliver a full plausibility claim. Rather, they have been taken as a kind of declaration of independence of the social sciences from biology. Having shown that the biological theory is in principle incomplete, the conclusion is drawn that it can safely be ignored.”
“Scientists should be encouraged to take a sophisticated attitude toward empirical testing of plausibility arguments […] Folk Popperism among scientists has had the very desirable result of reducing the amount of theory-free descriptive empiricism in many complex-diverse disciplines, but it has had the undesirable effect of encouraging a search for simple mutually exclusive hypotheses that can be accepted or rejected by single experiments. By our argument, very few important problems in evolutionary biology or the social sciences can be resolved in this way. Rather, individual empirical investigations should be viewed as weighing marginally for or against plausibility arguments. Often, empirical studies may themselves discover or suggest new plausibility arguments or reconcile old ones.”
“We suspect that most evolutionary biologists and philosophers of biology on both sides of the dispute would pretty much agree with the defense of the simple models strategy presented here. To reject the strategy of building evolutionary theory from collections of simple models is to embrace a kind of scientiﬁc nihilism in which there is no hope of achieving an understanding of how evolution works. On the other hand, there is reason to treat any given model skeptically. […] It may be possible to defend the proposition that the complexity and diversity of evolutionary phenomena make any scientiﬁc understanding of evolutionary processes impossible. Or, even if we can obtain a satisfactory understanding of particular cases of evolution, any attempt at a general, uniﬁed theory may be impossible. Some critics of adaptationism seem to invoke these arguments against adaptationism without fully embracing them. The problem is that alternatives to adaptationism must face the same problem of diversity and complexity that Darwinians use the simple model strategy to ﬁnesse. The critics, when they come to construct plausibility arguments, will also have to use relatively simple models that are vulnerable to the same attack. If there is a vulgar sociobiology, there is also a vulgar criticism of sociobiology.”