Econstudentlog

Infectious Disease Surveillance (III)

I have added some more observations from the book below.

“Zoonotic diseases are infections transmitted between animals and humans […]. A recent survey identified more than 1,400 species of human disease–causing agents, over half (58%) of which were zoonotic [2]. Moreover, nearly three-quarters (73%) of infectious diseases considered to be emerging or reemerging were zoonotic [2]. […] In many countries there is minimal surveillance for live animal imports or imported wildlife products. Minimal surveillance prevents the identification of wildlife trade–related health risks to the public, agricultural industry, and native wildlife [36] and has led to outbreaks of zoonotic diseases […] Southeast Asia [is] a hotspot for emerging zoonotic diseases because of rapid population growth, high population density, and high biodiversity […] influenza virus in particular is of zoonotic importance as multiple human infections have resulted from animal exposure [77–79].”

“[R]abies is an important cause of death in many countries, particularly in Africa and Asia [85]. Rabies is still underreported throughout the developing world, and 100-fold underreporting of human rabies is estimated for most of Africa [44]. Reasons for underreporting include lack of public health personnel, difficulties in identifying suspect animals, and limited laboratory capacity for rabies testing. […] Brucellosis […] is transmissible to humans primarily through consumption of unpasteurized milk or dairy products […] Brucella is classified as a category B bioterrorism agent [90] because of its potential for aerosolization [I should perhaps here mention that the book coverage does overlap a bit with that of Fong & Alibek’s book – which I covered here – but that I decided against covering those topics in much detail here – US] […] The key to preventing brucellosis in humans is to control or eliminate infections in animals [91–93]; therefore, veterinarians are crucial to the identification, prevention, and control of brucellosis [89]. […] Since 1954 [there has been] an ongoing eradication program involving surveillance testing of cattle at slaughter, testing at livestock markets, and whole-herd testing on the farm [in the US] […] Except for endemic brucellosis in wildlife in the Greater Yellowstone Area, all 50 states and territories in the United States are free of bovine brucellosis [94].”

“Because of its high mortality rate in humans in the absence of early treatment, Y. pestis is viewed as one of the most pathogenic human bacteria [101]. In the United States, plague is most often found in the Southwest where it is transmitted by fleas and maintained in rodent populations [102]. Deer mice and voles typically serve as maintenance hosts [and] these animals are often resistant to plague [102]. In contrast, in amplifying host species such as prairie dogs, ground squirrels, chipmunks, and wood rats, plague spreads rapidly and results in high mortality [103]. […] Human infections with Y. pestis can result in bubonic, pneumonic, or septicemic plague, depending on the route of exposure. Bubonic plague is most common; however, pneumonic plague poses a more serious public health risk since it can be easily transmitted person-to-person through inhalation of aerosolized bacteria […] Septicemic plague is characterized by bloodstream infection with Y. pestis and can occur secondary to pneumonic or bubonic forms of infection or as a primary infection [6,60].
Plague outbreaks are often correlated with animal die-offs in the area [104], and rodent control near human residences is important to prevent disease [103]. […] household pets can be an important route of plague transmission and flea control in dogs and cats is an important prevention measure [105]. Plague surveillance involves monitoring three populations for infection: vectors (e.g., fleas), humans, and rodents [106]. In the past 20 years, the numbers of human cases of plague reported in the United States have varied from 1 to 17 cases per year [90]. […]
Since rodent species are the main reservoirs of the bacteria, these animals can be used for sentinel surveillance to provide an early warning of the public health risk to humans [106]. […] Rodent die-offs can often be an early indicator of a plague outbreak”.

“Zoonotic disease surveillance is crucial for protection of human and animal health. An integrated, sustainable system that collects data on incidence of disease in both animals and humans is necessary to ensure prompt detection of zoonotic disease outbreaks and a timely and focused response [34]. Currently, surveillance systems for animals and humans [operate] largely independently [34]. This results in an inability to rapidly detect zoonotic diseases, particularly novel emerging diseases, that are detected in the human population only after an outbreak occurs [109]. While most industrialized countries have robust disease surveillance systems, many developing countries currently lack the resources to conduct both ongoing and real-time surveillance [34,43].”

“Acute hepatitis of any cause has similar, usually indistinguishable, signs and symptoms. Acute illness is associated with fever, fatigue, nausea, abdominal pain, followed by signs of liver dysfunction, including jaundice, light to clay-colored stool, dark urine, and easy bruising. The jaundice, dark urine, and abnormal stool are because of the diminished capacity of the inflamed liver to handle the metabolism of bilirubin, which is a breakdown product of hemoglobin released as red blood cells are normally replaced. In severe hepatitis that is associated with fulminant liver disease, the liver’s capacity to produce clotting factors and to clear potential toxic metabolic products is severely impaired, with resultant bleeding and hepatic encephalopathy. […] An effective vaccine to prevent hepatitis A has been available for more than 15 years, and incidence rates of hepatitis A are dropping wherever it is used in routine childhood immunization programs. […] Currently, hepatitis A vaccine is part of the U.S. childhood immunization schedule recommended by the Advisory Committee on Immunization Practices (ACIP) [31].”

“Chronic hepatitis — persistent and ongoing inflammation that can result from chronic infection — usually has minimal to no signs or symptoms […] Hepatitis B and C viruses cause acute hepatitis as well as chronic hepatitis. The acute component is often not recognized as an episode of acute hepatitis, and the chronic infection may have little or no symptoms for many years. With hepatitis B, clearance of infection is age related, as is presentation with symptoms. Over 90% of infants exposed to HBV develop chronic infection, while <1% have symptoms; 5–10% of adults develop chronic infection, but 50% or more have symptoms associated with acute infection. Among those who acquire hepatitis C, 15–45% clear the infection; the remainder have lifelong infection unless treated specifically for hepatitis C.”

“[D]ata are only received on individuals accessing care. Asymptomatic acute infection and poor or unavailable measurements for high risk populations […] have resulted in questionable estimates of the prevalence and incidence of hepatitis B and C. Further, a lack of understanding of the different types of viral hepatitis by many medical providers [18] has led to many undiagnosed individuals living with chronic infection, who are not captured in disease surveillance systems. […] Evaluation of acute HBV and HCV surveillance has demonstrated a lack of sensitivity for identifying acute infection in injection drug users; it is likely that most cases in this population go undetected, even if they receive medical care [36]. […] Best practices for conducting surveillance for chronic hepatitis B and C are not well established. […] The role of health departments in responding to infectious diseases is typically responding to acute disease. Response to chronic HBV infection is targeted to prevention of transmission to contacts of those infected, especially in high risk situations. Because of the high risk of vertical transmission and likely development of chronic disease in exposed newborns, identification and case management of HBV-infected pregnant women and their infants is a high priority. […] For a number of reasons, states do not conduct uniform surveillance for chronic hepatitis C. There is not agreement as to the utility of surveillance for chronic HCV infection, as it is a measurement of prevalent rather than incident cases.”

“Among all nationally notifiable diseases, three STDs (chlamydia, gonorrhea, and syphilis) are consistently in the top five most commonly reported diseases annually. These three STDs made up more than 86% of all reported diseases in the United States in 2010 [2]. […] The true burden of STDs is likely to be higher, as most infections are asymptomatic [4] and are never diagnosed or reported. A synthesis of a variety of data sources estimated that in 2008 there were over 100 million prevalent STDs and nearly 20 million incident STDs in the United States [5]. […] Nationally, 72% of all reported STDs are among persons aged 15–24 years [3], and it is estimated that 1 in 4 females aged 14–19 has an STD [7]. […] In 2011, the rates of chlamydia, gonorrhea, and primary and secondary syphilis among African-Americans were, respectively, 7.5, 16.9, and 6.7 times the rates among whites [3]. Additionally, men who have sex with men (MSM) are disproportionately infected with STDs. […] several analyses have shown risk ratios above 100 for the associations between being an MSM and having syphilis or HIV [9,10]. […] Many STDs can be transmitted congenitally during pregnancy or birth. In 2008, over 400,000 neonatal deaths and stillbirths were associated with syphilis worldwide […] untreated chlamydia and gonorrhea can cause ophthalmia neonatorum in newborns, which can result in blindness [13]. The medical and societal costs for STDs are high. […] One estimate in 2008 put national costs at $15.6 billion [15].”

“A significant challenge in STD surveillance is that the term “STD” encompasses a variety of infections. Currently, there are over 35 pathogens that can be transmitted sexually, including bacteria […] protozoa […] and ectoparasites […]. Some infections can cause clinical syndromes shortly after exposure, whereas others result in no symptoms or have a long latency period. Some STDs can be easily diagnosed using self-collected swabs, while others require a sample of blood or a physical examination by a clinician. Consequently, no one particular surveillance strategy works for all STDs. […] The asymptomatic nature of most STDs limits inferences from case-based surveillance, since in order to be counted in this system an infection must be diagnosed and reported. Additionally, many infections never result in disease. For example, an estimated 90% of human papillomavirus (HPV) infections resolve on their own without sequelae [24]. As such, simply counting infections may not be appropriate, and sequelae must also be monitored. […] Strategies for STD surveillance include case reporting; sentinel surveillance; opportunistic surveillance, including use of administrative data and positivity in screened populations; and population-based studies […] the choice of strategy depends on the type of STD and the population of interest.”

“Determining which diseases and conditions should be included in mandatory case reporting requires balancing the benefits to the public health system (e.g., utility of the data) with the costs and burdens of case reporting. While many epidemiologists and public health practitioners follow the mantra “the more data, the better,” the costs (in both dollars and human resources) of developing and maintaining a robust case-based reporting system can be large. Case-based surveillance has been mandated for chlamydia, gonorrhea, syphilis, and chancroid nationally; but expansion of state-initiated mandatory reporting for other STDs is controversial.”

August 18, 2017 Posted by | Books, Epidemiology, Immunology, Infectious disease, Medicine

Depression and Heart Disease (II)

Below I have added some more observations from the book, which I gave four stars on goodreads.

“A meta-analysis of twin (and family) studies estimated the heritability of adult MDD around 40% [16] and this estimate is strikingly stable across different countries [17, 18]. If measurement error due to unreliability is taken into account by analysing MDD assessed on two occasions, heritability estimates increase to 66% [19]. Twin studies in children further show that there is already a large genetic contribution to depressive symptoms in youth, with heritability estimates varying between 50% and 80% [20–22]. […] Cardiovascular research in twin samples has suggested a clear-cut genetic contribution to hypertension (h² = 61%) [30], fatal stroke (h² = 32%) [31] and CAD (h² = 57% in males and 38% in females) [32]. […] A very important, and perhaps underestimated, source of pleiotropy in the association of MDD and CAD are the major behavioural risk factors for CAD: smoking and physical inactivity. These factors are sometimes considered ‘environmental’, but twin studies have shown that such behaviours have a strong genetic component [33–35]. Heritability estimates for [many] established risk factors [for CAD – e.g. BMI, smoking, physical inactivity – US] are 50% or higher in most adult twin samples and these estimates remain remarkably similar across the adult life span [41–43].”
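
The heritability estimates quoted above come from comparing how strongly monozygotic (MZ) and dizygotic (DZ) twins resemble each other. As a rough illustration of where such numbers come from, here is a small Python sketch using Falconer's approximations; the twin correlations in it are invented, and the chapter's own estimates rest on full structural-equation (ACE) modeling rather than this shortcut.

# Rough illustration of how twin correlations translate into heritability
# estimates of the kind quoted above, via Falconer's approximations.
# The correlations below are invented for the example.

def falconer_ace(r_mz, r_dz):
    """Return (A, C, E) variance components from MZ/DZ twin correlations."""
    a2 = 2 * (r_mz - r_dz)   # additive genetic variance (heritability)
    c2 = 2 * r_dz - r_mz     # shared (family) environment
    e2 = 1 - r_mz            # unique environment plus measurement error
    return a2, c2, e2

# hypothetical twin correlations for a depression-symptom score
a2, c2, e2 = falconer_ace(r_mz=0.45, r_dz=0.25)
print(f"h2 = {a2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")  # h2 = 0.40, c2 = 0.05, e2 = 0.55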

“The crucial question is whether the genetic factors underlying MDD also play a role in CAD and CAD risk factors. To test for an overlap in the genetic factors, a bivariate extension of the structural equation model for twin data can be used [57]. […] If the depressive symptoms in a twin predict the IL-6 level in his/her co-twin, this can only be explained by an underlying factor that affects both depression and IL-6 levels and is shared by members of a family. If the prediction is much stronger in MZ than in DZ twins, this signals that the underlying factor is their shared genetic make-up, rather than their shared (family) environment. […] It is important to note clearly here that genetic correlations do not prove the existence of pleiotropy, because genes that influence MDD may, through causal effects of MDD on CAD risk, also become ‘CAD genes’. The absence of a genetic correlation, however, can be used to falsify the existence of genetic pleiotropy. For instance, the hypothesis that genetic pleiotropy explains part of the association between depressive symptoms and IL-6 requires the genetic correlation between these traits to be significantly different from zero. [Furthermore,] the genetic correlation should have a positive value. A negative genetic correlation would signal that genes that increase the risk for depression decrease the risk for higher IL-6 levels, which would go against the genetic pleiotropy hypothesis. […] Su et al. [26] […] tested pleiotropy as a possible source of the association of depressive symptoms with IL-6 in 188 twin pairs of the Vietnam Era Twin (VET) Registry. The genetic correlation between depressive symptoms and IL-6 was found to be positive and significant (RA = 0.22, p = 0.046)”
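
The cross-twin cross-trait logic described above (does one twin's depression score predict the co-twin's IL-6 level, and is that prediction stronger in MZ than in DZ pairs?) boils down to comparing two correlations. Below is a minimal Python sketch of that comparison; the data points are invented, and the published analyses use bivariate structural-equation models rather than raw Pearson correlations.

# Sketch of the cross-twin cross-trait comparison described above: correlate
# twin 1's depression score with twin 2's IL-6 level, separately for MZ and DZ
# pairs. A clearly higher cross-correlation in MZ pairs points to shared genes
# (pleiotropy and/or causal paths) rather than shared family environment.
# All data below are invented.

from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cross_twin_cross_trait(pairs):
    """pairs: list of (depression score of twin 1, IL-6 level of twin 2)."""
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])

mz_pairs = [(4, 1.6), (8, 1.0), (12, 2.3), (16, 1.2), (20, 2.7), (24, 1.9)]
dz_pairs = [(4, 2.0), (8, 1.4), (12, 2.4), (16, 1.2), (20, 1.8), (24, 2.1)]

print("MZ cross-correlation:", round(cross_twin_cross_trait(mz_pairs), 2))  # ~0.45
print("DZ cross-correlation:", round(cross_twin_cross_trait(dz_pairs), 2))  # ~0.06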

“For the association between MDD and physical inactivity, the dominant hypothesis has not been that MDD causes a reduction in regular exercise, but instead that regular exercise may act as a protective factor against mood disorders. […] we used the twin method to perform a rigorous test of this popular hypothesis [on] 8558 twins and their family members using their longitudinal data across 2-, 4-, 7-, 9- and 11-year follow-up periods. In spite of sufficient statistical power, we found only the genetic correlation to be significant (ranging between −0.16 and −0.44 for different symptom scales and different time-lags). The environmental correlations were essentially zero. This means that the environmental factors that cause a person to take up exercise do not cause lower anxiety or depressive symptoms in that person, currently or at any future time point. In contrast, the genetic factors that cause a person to take up exercise also cause lower anxiety or depressive symptoms in that person, at the present and all future time points. This pattern of results falsifies the causal hypothesis and leaves genetic pleiotropy as the most likely source for the association between exercise and lower levels of anxiety and depressive symptoms in the population at large. […] Taken together, [the] studies support the idea that genetic pleiotropy may be a factor contributing to the increased risk for CAD in subjects suffering from MDD or reporting high counts of depressive symptoms. The absence of environmental correlations in the presence of significant genetic correlations for a number of the CAD risk factors (CFR, cholesterol, inflammation and regular exercise) suggests that pleiotropy is the sole reason for the association between MDD and these CAD risk factors, whereas for other CAD risk factors (e.g. smoking) and CAD incidence itself, pleiotropy may coexist with causal effects.”

“By far the most tested polymorphism in psychiatric genetics is a 43-base pair insertion or deletion in the promoter region of the serotonin transporter gene (5HTT, renamed SLC6A4). About 55% of Caucasians carry a long allele (L) with 16 repeat units. The short allele (S, with 14 repeat units) of this length polymorphism repeat (LPR) reduces transcriptional efficiency, resulting in decreased serotonin transporter expression and function [83]. Because serotonin plays a key role in one of the major theories of MDD [84], and because the most prescribed antidepressants act directly on this transporter, 5HTT is an obvious candidate gene for this disorder. […] The wealth of studies attempting to associate the 5HTTLPR to MDD or related personality traits tells a revealing story about the fate of most candidate genes in psychiatric genetics. Many conflicting findings have been reported, and the two largest studies failed to link the 5HTTLPR to depressive symptoms or clinical MDD [85, 86]. Even at the level of reviews and meta-analyses, conflicting conclusions have been drawn about the role of this polymorphism in the development of MDD [87, 88]. The initially promising explanation for discrepant findings – potential interactive effects of the 5HTTLPR and stressful life events [89] – did not survive meta-analysis [90].”

“Across the board, overlooking the wealth of candidate gene studies on MDD, one is inclined to conclude that this approach has failed to unambiguously identify genetic variants involved in MDD […]. Hope is now focused on the newer GWA [genome wide association] approach. […] At the time of writing, only two GWA studies had been published on MDD [81, 95]. […] In theory, the strategy to identify potential pleiotropic genes in the MDD–CAD relationship is extremely straightforward. We simply select the genes that occur in the lists of confirmed genes from the GWA studies for both traits. In practice, this is hard to do, because genetics in psychiatry is clearly lagging behind genetics in cardiology and diabetes medicine. […] What is shown by the reviewed twin studies is that some genetic variants may influence MDD and CAD risk factors. This can occur through one of three mechanisms: (a) the genetic variants that increase the risk for MDD become part of the heritability of CAD through a causal effect of MDD on CAD risk factors (causality); (b) the genetic variants that increase the risk for CAD become part of the heritability of MDD through a direct causal effect of CAD on MDD (reverse causality); (c) the genetic variants influence shared risk factors that independently increase the risk for MDD as well as CAD (pleiotropy). I suggest that to fully explain the MDD–CAD association we need to be willing to be open to the possibility that these three mechanisms co-exist. Even in the presence of true pleiotropic effects, MDD may influence CAD risk factors, and having CAD in turn may worsen the course of MDD.”

“Patients with depression are more likely to exhibit several unhealthy behaviours or avoid other health-promoting ones than those without depression. […] Patients with depression are more likely to have sleep disturbances [6]. […] sleep deprivation has been linked with obesity, diabetes and the metabolic syndrome [13]. […] Physical inactivity and depression display a complex, bidirectional relationship. Depression leads to physical inactivity and physical inactivity exacerbates depression [19]. […] smoking rates among those with depression are about twice that of the general population [29]. […] Poor attention to self-care is often a problem among those with major depressive disorder. In the most severe cases, those with depression may become inattentive to their personal hygiene. One aspect of this relationship that deserves special attention with respect to cardiovascular disease is the association of depression and periodontal disease. […] depression is associated with poor adherence to medical treatment regimens in many chronic illnesses, including heart disease. […] There is some evidence that among patients with an acute coronary syndrome, improvement in depression is associated with improvement in adherence. […] Individuals with depression are often socially withdrawn or isolated. It has been shown that patients with heart disease who are depressed have less social support [64], and that social isolation or poor social support is associated with increased mortality in heart disease patients [65–68]. […] [C]linicians who make recommendations to patients recovering from a heart attack should be aware that low levels of social support and social isolation are particularly common among depressed individuals and that high levels of social support appear to protect patients from some of the negative effects of depression [78].”

“Self-efficacy describes an individual’s self-confidence in his/her ability to accomplish a particular task or behaviour. Self-efficacy is an important construct to consider when one examines the psychological mechanisms linking depression and heart disease, since it influences an individual’s engagement in behaviour and lifestyle changes that may be critical to improving cardiovascular risk. Many studies on individuals with chronic illness show that depression is often associated with low self-efficacy [95–97]. […] Low self-efficacy is associated with poor adherence behaviour in patients with heart failure [101]. […] Much of the interest in self-efficacy comes from the fact that it is modifiable. Self-efficacy-enhancing interventions have been shown to improve cardiac patients’ self-efficacy and thereby improve cardiac health outcomes [102]. […] One problem with targeting self-efficacy in depressed heart disease patients is [however] that depressive symptoms reduce the effects of self-efficacy-enhancing interventions [105, 106].”

“Taken together, [the] SADHART and ENRICHD [studies] suggest, but do not prove, that antidepressant drug therapy in general, and SSRI treatment in particular, improve cardiovascular outcomes in depressed post-acute coronary syndrome (ACS) patients. […] even large epidemiological studies of depression and antidepressant treatment are not usually informative, because they confound the effects of depression and antidepressant treatment. […] However, there is one Finnish cohort study in which all subjects […] were followed up through a nationwide computerised database [17]. The purpose of this study was not to examine the relationship between depression and cardiac mortality, but rather to look at the relationship between antidepressant use and suicide. […] unexpectedly, ‘antidepressant use, and especially SSRI use, was associated with a marked reduction in total mortality (−49%, p < 0.001), mostly attributable to a decrease in cardiovascular deaths’. The study involved 15 390 patients with a mean follow-up of 3.4 years […] One of the marked differences between the SSRIs and the earlier tricyclic antidepressants is that the SSRIs do not cause cardiac death in overdose as the tricyclics do [41]. There has been literature that suggested that tricyclics even at therapeutic doses could be cardiotoxic and more problematic than SSRIs [42, 43]. What has been surprising is that both in the clinical trial data from ENRICHD and the epidemiological data from Finland, tricyclic treatment has also been associated with a decreased risk of mortality. […] Given that SSRI treatment of depression in the post-ACS period is safe, effective in reducing depressed mood, able to improve health behaviours and may reduce subsequent cardiac morbidity and mortality, it would seem obvious that treating depression is strongly indicated. However, the vast majority of post-ACS patients will not see a psychiatrically trained professional and many cases are not identified [33].”

“That depression is associated with cardiovascular morbidity and mortality is no longer open to question. Similarly, there is no question that the risk of morbidity and mortality increases with increasing severity of depression. Questions remain about the mechanisms that underlie this association, whether all types of depression carry the same degree of risk and to what degree treating depression reduces that risk. There is no question that the benefits of treating depression associated with coronary artery disease far outweigh the risks.”

“Two competing trends are emerging in research on psychotherapy for depression in cardiac patients. First, the few rigorous RCTs that have been conducted so far have shown that even the most efficacious of the current generation of interventions produce relatively modest outcomes. […] Second, there is a growing recognition that, even if an intervention is highly efficacious, it may be difficult to translate into clinical practice if it requires intensive or extensive contacts with a highly trained, experienced, clinically sophisticated psychotherapist. It can even be difficult to implement such interventions in the setting of carefully controlled, randomised efficacy trials. Consequently, there are efforts to develop simpler, more efficient interventions that can be delivered by a wider variety of interventionists. […] Although much more work remains to be done in this area, enough is already known about psychotherapy for comorbid depression in heart disease to suggest that a higher priority should be placed on translation of this research into clinical practice. In many cases, cardiac patients do not receive any treatment for their depression.”

August 14, 2017 Posted by | Books, Cardiology, Diabetes, Genetics, Medicine, Pharmacology, Psychiatry, Psychology

Depression and Heart Disease (I)

I’m currently reading this book. It’s a great book, with lots of interesting observations.

Below I’ve added some quotes from the book.

“Frasure-Smith et al. [1] demonstrated that patients diagnosed with depression post MI [myocardial infarction, US] were more than five times more likely to die from cardiac causes by 6 months than those without major depression. At 18 months, cardiac mortality had reached 20% in patients with major depression, compared with only 3% in non-depressed patients [5]. Recent work has confirmed and extended these findings. A meta-analysis of 22 studies of post-MI subjects found that post-MI depression was associated with a 2.0–2.5 increased risk of negative cardiovascular outcomes [6]. Another meta-analysis examining 20 studies of subjects with MI, coronary artery bypass graft (CABG), angioplasty or angiographically documented CAD found a twofold increased risk of death among depressed compared with non-depressed patients [7]. Though studies included in these meta-analyses had substantial methodological variability, the overall results were quite similar [8].”

“Blumenthal et al. [31] published the largest cohort study (N = 817) to date on depression in patients undergoing CABG and measured depression scores, using the CES-D, before and at 6 months after CABG. Of those patients, 26% had minor depression (CES-D score 16–26) and 12% had moderate to severe depression (CES-D score ≥27). Over a mean follow-up of 5.2 years, the risk of death, compared with those without depression, was 2.4 (HR adjusted; 95% CI 1.4, 4.0) in patients with moderate to severe depression and 2.2 (95% CI 1.2, 4.2) in those whose depression persisted from baseline to follow-up at 6 months. This is one of the few studies that found a dose response (in terms of severity and duration) between depression and death in CABG in particular and in CAD in general.”

“Of the patients with known CAD but no recent MI, 12–23% have major depressive disorder by DSM-III or DSM-IV criteria [20, 21]. Two studies have examined the prognostic association of depression in patients whose CAD was confirmed by angiography. […] In [Carney et al.], a diagnosis of major depression by DSM-III criteria was the best predictor of cardiac events (MI, bypass surgery or death) at 1 year, more potent than other clinical risk factors such as impaired left ventricular function, severity of coronary disease and smoking among the 52 patients. The relative risk of a cardiac event was 2.2 times higher in patients with major depression than those with no depression.[…] Barefoot et al. [23] provided a larger sample size and longer follow-up duration in their study of 1250 patients who had undergone their first angiogram. […] Compared with non-depressed patients, those who were moderately to severely depressed had 69% higher odds of cardiac death and 78% higher odds of all-cause mortality. The mildly depressed had a 38% higher risk of cardiac death and a 57% higher risk of all-cause mortality than non-depressed patients.”

“Ford et al. [43] prospectively followed all male medical students who entered the Johns Hopkins Medical School from 1948 to 1964. At entry, the participants completed questionnaires about their personal and family history, health status and health behaviour, and underwent a standard medical examination. The cohort was then followed after graduation by mailed, annual questionnaires. The incidence of depression in this study was based on the mailed surveys […] 1190 participants [were included in the] analysis. The cumulative incidence of clinical depression in this population at 40 years of follow-up was 12%, with no evidence of a temporal change in the incidence. […] In unadjusted analysis, clinical depression was associated with an almost twofold higher risk of subsequent CAD. This association remained after adjustment for time-dependent covariates […]. The relative risk ratio for CAD development with versus without clinical depression was 2.12 (95% CI 1.24, 3.63), as was their relative risk ratio for future MI (95% CI 1.11, 4.06), after adjustment for age, baseline serum cholesterol level, parental MI, physical activity, time-dependent smoking, hypertension and diabetes. The median time from the first episode of clinical depression to first CAD event was 15 years, with a range of 1–44 years.”

“In the Women’s Ischaemia Syndrome Evaluation (WISE) study, 505 women referred for coronary angiography were followed for a mean of 4.9 years and completed the BDI [46]. Significantly increased mortality and cardiovascular events were found among women with elevated BDI scores, even after adjustment for age, cholesterol, stenosis score on angiography, smoking, diabetes, education, hypertension and body mass index (RR 3.1; 95% CI 1.5, 6.3). […] Further compelling evidence comes from a meta-analysis of 28 studies comprising almost 80 000 subjects [47], which demonstrated that, despite heterogeneity and differences in study quality, depression was consistently associated with increased risk of cardiovascular diseases in general, including stroke.”

“The preponderance of evidence strongly suggests that depression is a risk factor for CAD [coronary artery disease, US] development. […] In summary, it is fair to conclude that depression plays a significant role in CAD development, independent of conventional risk factors, and its adverse impact endures over time. The impact of depression on the risk of MI is probably similar to that of smoking [52]. […] Results of longitudinal cohort studies suggest that depression occurs before the onset of clinically significant CAD […] Recent brain imaging studies have indicated that lesions resulting from cerebrovascular insufficiency may lead to clinical depression [54, 55]. Depression may be a clinical manifestation of atherosclerotic lesions in certain areas of the brain that cause circulatory deficits. The depression then exacerbates the onset of CAD. The exact aetiological mechanism of depression and CAD development remains to be clarified.”

“Rutledge et al. [65] conducted a meta-analysis in 2006 in order to better understand the prevalence of depression among patients with CHF and the magnitude of the relationship between depression and clinical outcomes in the CHF population. They found that clinically significant depression was present in 21.5% of CHF patients, varying by the use of questionnaires versus diagnostic interview (33.6% and 19.3%, respectively). The combined results suggested higher rates of death and secondary events (RR 2.1; 95% CI 1.7, 2.6), and trends toward increased health care use and higher rates of hospitalisation and emergency room visits among depressed patients.”

“In the past 15 years, evidence has been provided that physically healthy subjects who suffer from depression are at increased risk for cardiovascular morbidity and mortality [1, 2], and that the occurrence of depression in patients with either unstable angina [3] or myocardial infarction (MI) [4] increases the risk for subsequent cardiac death. Moreover, epidemiological studies have proved that cardiovascular disease is a risk factor for depression, since the prevalence of depression in individuals with a recent MI or with coronary artery disease (CAD) or congestive heart failure has been found to be significantly higher than in the general population [5, 6]. […] findings suggest a bidirectional association between depression and cardiovascular disease. The pathophysiological mechanisms underlying this association are, at present, largely unclear, but several candidate mechanisms have been proposed.”

“Autonomic nervous system dysregulation is one of the most plausible candidate mechanisms underlying the relationship between depression and ischaemic heart disease, since changes of autonomic tone have been detected in both depression and cardiovascular disease [7], and autonomic imbalance […] has been found to lower the threshold for ventricular tachycardia, ventricular fibrillation and sudden cardiac death in patients with CAD [8, 9]. […] Imbalance between prothrombotic and antithrombotic mechanisms and endothelial dysfunction have [also] been suggested to contribute to the increased risk of cardiac events in both medically well patients with depression and depressed patients with CAD. Depression has been consistently associated with enhanced platelet activation […] evidence has accumulated that selective serotonin reuptake inhibitors (SSRIs) reduce platelet hyperreactivity and hyperaggregation of depressed patients [39, 40] and reduce the release of the platelet/endothelial biomarkers β-thromboglobulin, P-selectin and E-selectin in depressed patients with acute CAD [41]. This may explain the efficacy of SSRIs in reducing the risk of mortality in depressed patients with CAD [42–44].”

“[S]everal studies have shown that reduced endothelium-dependent flow-mediated vasodilatation […] occurs in depressed adults with or without CAD [48–50]. Atherosclerosis with subsequent plaque rupture and thrombosis is the main determinant of ischaemic cardiovascular events, and atherosclerosis itself is now recognised to be fundamentally an inflammatory disease [56]. Since activation of inflammatory processes is common to both depression and cardiovascular disease, it would be reasonable to argue that the link between depression and ischaemic heart disease might be mediated by inflammation. Evidence has been provided that major depression is associated with a significant increase in circulating levels of both pro-inflammatory cytokines, such as IL-6 and TNF-α, and inflammatory acute phase proteins, especially the C-reactive protein (CRP) [57, 58], and that antidepressant treatment is able to normalise CRP levels irrespective of whether or not patients are clinically improved [59]. […] Vaccarino et al. [79] assessed specifically whether inflammation is the mechanism linking depression to ischaemic cardiac events and found that, in women with suspected coronary ischaemia, depression was associated with increased circulating levels of CRP and IL-6 and was a strong predictor of ischaemic cardiac events”

“Major depression has been consistently associated with hyperactivity of the HPA axis, with a consequent overstimulation of the sympathetic nervous system, which in turn results in increased circulating catecholamine levels and enhanced serum cortisol concentrations [68–70]. This may cause an imbalance in sympathetic and parasympathetic activity, which results in elevated heart rate and blood pressure, reduced HRV [heart rate variability], disruption of ventricular electrophysiology with increased risk of ventricular arrhythmias as well as an increased risk of atherosclerotic plaque rupture and acute coronary thrombosis. […] In addition, glucocorticoids mobilise free fatty acids, causing endothelial inflammation and excessive clotting, and are associated with hypertension, hypercholesterolaemia and glucose dysregulation [88, 89], which are risk factors for CAD.”

“Most of the literature on [the] comorbidity [between major depressive disorder (MDD) and coronary artery disease (CAD), US] has tended to favour the hypothesis of a causal effect of MDD on CAD, but reversed causality has also been suggested to contribute. Patients with severe CAD at baseline, and consequently a worse prognosis, may simply be more prone to report mood disturbances than less severely ill patients. Furthermore, in pre-morbid populations, incipient atherosclerosis in cerebral vessels may cause depressive symptoms before the onset of actual cardiac or cerebrovascular events, a variant of reverse causality known as the ‘vascular depression’ hypothesis [2]. To resolve causality, comorbidity between MDD and CAD has been addressed in longitudinal designs. Most prospective studies reported that clinical depression or depressive symptoms at baseline predicted higher incidence of heart disease at follow-up [1], which seems to favour the hypothesis of causal effects of MDD. We need to remind ourselves, however […] [that] [p]rospective associations do not necessarily equate causation. Higher incidence of CAD in depressed individuals may reflect the operation of common underlying factors on MDD and CAD that become manifest in mental health at an earlier stage than in cardiac health. […] [T]he association between MDD and CAD may be due to underlying genetic factors that lead to increased symptoms of anxiety and depression, but may also independently influence the atherosclerotic process. This phenomenon, where low-level biological variation has effects on multiple complex traits at the organ and behavioural level, is called genetic ‘pleiotropy’. If present in a time-lagged form, that is if genetic effects on MDD risk precede effects of the same genetic variants on CAD risk, this phenomenon can cause longitudinal correlations that mimic a causal effect of MDD.”


August 12, 2017 Posted by | Books, Cardiology, Genetics, Medicine, Neurology, Pharmacology, Psychiatry, Psychology

Infectious Disease Surveillance (II)

Some more observations from the book below.

“There are three types of influenza viruses — A, B, and C — of which only types A and B cause widespread outbreaks in humans. Influenza A viruses are classified into subtypes based on antigenic differences between their two surface glycoproteins, hemagglutinin and neuraminidase. Seventeen hemagglutinin subtypes (H1–H17) and nine neuraminidase subtypes (N1–N9) have been identified. […] The internationally accepted naming convention for influenza viruses contains the following elements: the type (e.g., A, B, C), geographical origin (e.g., Perth, Victoria), strain number (e.g., 361), year of isolation (e.g., 2011), for influenza A the hemagglutinin and neuraminidase antigen description (e.g., H1N1), and for nonhuman origin viruses the host of origin (e.g., swine) [4].”
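
The naming convention is easier to see assembled into an actual strain name. The small Python helper below just strings together the elements listed in the quote; the slash-separated layout with the subtype in parentheses (e.g. A/Perth/361/2011(H1N1)) follows common usage, since the quote lists the elements but not the punctuation, so treat the exact formatting as an assumption.

# Assemble an influenza strain name from the elements listed in the quoted
# naming convention: type, host of origin (nonhuman viruses only), geographical
# origin, strain number, year, and for influenza A the HA/NA subtype.
# The slash/parenthesis formatting follows common usage and is an assumption.

def strain_name(virus_type, origin, strain_no, year, subtype=None, host=None):
    parts = [virus_type]
    if host:                 # host of origin is only given for nonhuman viruses
        parts.append(host)
    parts += [origin, str(strain_no), str(year)]
    name = "/".join(parts)
    if subtype:              # HA/NA antigen description applies to influenza A
        name += f"({subtype})"
    return name

print(strain_name("A", "Perth", 361, 2011, subtype="H1N1"))
# -> A/Perth/361/2011(H1N1)
print(strain_name("A", "Iowa", 15, 1930, subtype="H1N1", host="swine"))
# -> A/swine/Iowa/15/1930(H1N1)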

“Only two antiviral drug classes are licensed for chemoprophylaxis and treatment of influenza—the adamantanes (amantadine and rimantadine) and the neuraminidase inhibitors (oseltamivir and zanamivir). […] Antiviral resistant strains arise through selection pressure in individual patients during treatment [which can lead to treatment failure]. […] they usually do not transmit further (because of impaired virus fitness) and have limited public health implications. On the other hand, primarily resistant viruses have emerged in the past decade and in some cases have completely replaced the susceptible strains. […] Surveillance of severe influenza illness is challenging because most cases remain undiagnosed. […] In addition, most of the influenza burden on the healthcare system is because of complications such as secondary bacterial infections and exacerbations of pre-existing chronic diseases, and often influenza is not suspected as an underlying cause. Even if suspected, the virus could have been already cleared from the respiratory secretions when the testing is performed, making diagnostic confirmation impossible. […] Only a small proportion of all deaths caused by influenza are classified as influenza-related on death certificates. […] mortality surveillance based only on death certificates is not useful for the rapid assessment of an influenza epidemic or pandemic severity. Detection of excess mortality in real time can be done by establishing specific monitoring systems that overcome these delays [such as sentinel surveillance systems, US].”

“Influenza vaccination programs are extremely complex and costly. More than half a billion doses of influenza vaccines are produced annually in two separate vaccine production cycles, one for the Northern Hemisphere and one for the Southern Hemisphere [54]. Because the influenza virus evolves constantly and vaccines are reformulated yearly, both vaccine effectiveness and safety need to be monitored routinely. Vaccination campaigns are also organized annually and require continuous public health efforts to maintain an acceptable level of vaccination coverage in the targeted population. […] huge efforts are made and resources spent to produce and distribute influenza vaccines annually. Despite these efforts, vaccination coverage among those at risk in many parts of the world remains low.”

“The Active Bacterial Core surveillance (ABCs) network and its predecessor have been examples of using surveillance as information for action for over 20 years. ABCs has been used to measure disease burden, to provide data for vaccine composition and recommended-use policies, and to monitor the impact of interventions. […] sites represent wide geographic diversity and approximately reflect the race and urban-to-rural mix of the U.S. population [37]. Currently, the population under surveillance is 19–42 million and varies by pathogen and project. […] ABCs has continuously evolved to address challenging questions posed by the six pathogens (H. influenzae; GAS [Group A Streptococcus], GBS [Group B Streptococcus], S.  pneumoniae, N. meningitidis, and MRSA) and other emerging infections. […] For the six core pathogens, the objectives are (1) to determine the incidence and epidemiologic characteristics of invasive disease in geographically diverse populations in the United States through active, laboratory, and population-based surveillance; (2) to determine molecular epidemiologic patterns and microbiologic characteristics of isolates collected as part of routine surveillance in order to track antimicrobial resistance; (3) to detect the emergence of new strains with new resistance patterns and/or virulence and contribute to development and evaluation of new vaccines; and (4) to provide an infrastructure for surveillance of other emerging pathogens and for conducting studies aimed at identifying risk factors for disease and evaluating prevention policies.”

“Food may become contaminated by over 250 bacterial, viral, and parasitic pathogens. Many of these agents cause diarrhea and vomiting, but there is no single clinical syndrome common to all foodborne diseases. Most of these agents can also be transmitted by nonfoodborne routes, including contact with animals or contaminated water. Therefore, for a given illness, it is often unclear whether the source of infection is foodborne or not. […] Surveillance systems for foodborne diseases provide extremely important information for prevention and control.”

“Since 1995, the Centers for Disease Control and Prevention (CDC) has routinely used an automated statistical outbreak detection algorithm that compares current reports of each Salmonella serotype with the preceding 5-year mean number of cases for the same geographic area and week of the year to look for unusual clusters of infection [5]. The sensitivity of Salmonella serotyping to detect outbreaks is greatest for rare serotypes, because a small increase is more noticeable against a rare background. The utility of serotyping has led to its widespread adoption in surveillance for food pathogens in many countries around the world [6]. […] Today, a new generation of subtyping methods […] is increasing the specificity of laboratory-based surveillance and its power to detect outbreaks […] Molecular subtyping allows comparison of the molecular “fingerprint” of bacterial strains. In the United States, the CDC coordinates a network called PulseNet that captures data from standardized molecular subtyping by PFGE [pulsed field gel electrophoresis]. By comparing new submissions and past data, public health officials can rapidly identify geographically dispersed clusters of disease that would otherwise not be apparent and evaluate them as possible foodborne-disease outbreaks [8]. The ability to identify geographically dispersed outbreaks has become increasingly important as more foods are mass-produced and widely distributed. […] Similar networks have been developed in Canada, Europe, the Asia Pacific region, Latin America and the Caribbean region, the Middle Eastern region and, most recently, the African region”.
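
The serotype-based detection algorithm described at the start of the quote is conceptually simple: compare this week's count for a given serotype and area with the counts for the same week of the year over the preceding five years. Here is a minimal Python sketch of that idea; the "mean plus two standard deviations" threshold and all counts are illustrative assumptions on my part, not the CDC's actual historical-limits method.

# Minimal sketch of the outbreak-detection idea described above: flag the
# current week if its count clearly exceeds the baseline formed by the same
# week of the year in the preceding five years. The threshold of two standard
# deviations above the baseline mean is an illustrative assumption, and all
# counts are invented.

from statistics import mean, stdev

def flag_unusual(current_count, baseline_counts, z=2.0):
    """Return True if the current count exceeds baseline mean + z * SD."""
    m = mean(baseline_counts)
    s = stdev(baseline_counts) if len(baseline_counts) > 1 else 0.0
    return current_count > m + z * s

# counts of one serotype, same week of the year and same area, previous 5 years
baseline = [3, 1, 4, 2, 2]
print(flag_unusual(current_count=3, baseline_counts=baseline))   # False
print(flag_unusual(current_count=11, baseline_counts=baseline))  # True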

“Food consumption and practices have changed during the past 20 years in the United States, resulting in a shift from readily detectable, point-source outbreaks (e.g., attendance at a wedding dinner), to widespread outbreaks that occur over many communities with only a few illnesses in each community. One of the changes has been establishment of large food-producing facilities that disseminate products throughout the country. If a food product is contaminated with a low level of pathogen, contaminated food products are distributed across many states; and only a few illnesses may occur in each community. This type of outbreak is often difficult to detect. PulseNet has been critical for the detection of widely dispersed outbreaks in the United States [17]. […] The growth of the PulseNet database […] and the use of increasingly sophisticated epidemiological approaches have led to a dramatic increase in the number of multistate outbreaks detected and investigated.”

“Each year, approximately 35 million people are hospitalized in the United States, accounting for 170 million inpatient days [1,2]. There are no recent estimates of the numbers of healthcare-associated infections (HAI). However, two decades ago, HAI were estimated to affect more than 2 million hospital patients annually […] The mortality attributed to these HAI was estimated at about 100,000 deaths annually. […] Almost 85% of HAI in the United States are associated with bacterial pathogens, and 33% are thought to be preventable [4]. […] The primary purpose of surveillance [in the context of HAI] is to alert clinicians, epidemiologists, and laboratories of the need for targeted prevention activities required to reduce HAI rates. HAI surveillance data help to establish baseline rates that may be used to determine the potential need to change public health policy, to act and intervene in clinical settings, and to assess the effectiveness of microbiology methods, appropriateness of tests, and allocation of resources. […] As less than 10% of HAI in the United States occur as recognized epidemics [18], HAI surveillance should not be embarked on merely for the detection of outbreaks.”

“There are two types of rate comparisons — intrahospital and interhospital. The primary goals of intrahospital comparison are to identify areas within the hospital where HAI are more likely to occur and to measure the efficacy of interventional efforts. […] Without external comparisons, hospital infection control departments may [however] not know if the endemic rates in their respective facilities are relatively high or where to focus the limited financial and human resources of the infection control program. […] The CDC has been the central aggregating institution for active HAI surveillance in the United States since the 1960s.”

“Low sensitivity (i.e., missed infections) in a surveillance system is usually more common than low specificity (i.e., patients reported to have infections who did not actually have infections).”
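
A toy example makes the asymmetry concrete: in a hypothetical validation of surveillance case-finding against full chart review (all counts invented), missed infections drag sensitivity down while the comparatively few false reports barely dent specificity.

# Toy illustration of the sensitivity/specificity point above, treating a full
# chart review as the truth. All counts are invented.

true_positives  = 60   # infections found by both surveillance and chart review
false_negatives = 40   # infections missed by surveillance (lowers sensitivity)
true_negatives  = 880  # patients without infection, correctly not reported
false_positives = 20   # reported "infections" not confirmed (lowers specificity)

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.2f}")  # 0.60, i.e. many infections missed
print(f"specificity = {specificity:.2f}")  # 0.98, i.e. few false reports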

“Among the numerous analyses of CDC hospital data carried out over the years, characteristics consistently found to be associated with higher HAI rates include affiliation with a medical school (i.e., teaching vs. nonteaching), size of the hospital and ICU categorized by the number of beds (large hospitals and larger ICUs generally had higher infection rates), type of control or ownership of the hospital (municipal, nonprofit, investor owned), and region of the country [43,44]. […] Various analyses of SENIC and NNIS/NHSN data have shown that differences in patient risk factors are largely responsible for interhospital differences in HAI rates. After controlling for patients’ risk factors, average lengths of stay, and measures of the completeness of diagnostic workups for infection (e.g., culturing rates), the differences in the average HAI rates of the various hospital groups virtually disappeared. […] For all of these reasons, an overall HAI rate, per se, gives little insight into whether the facility’s infection control efforts are effective.”
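
One common way to operationalize the kind of risk adjustment described above is indirect standardization: build an expected number of infections from stratum-specific reference rates and the facility's own exposure, then compare observed with expected (current NHSN reporting expresses this as a standardized infection ratio). The sketch below uses invented strata, reference rates and counts; the analyses discussed in the quote relied on more elaborate regression-based adjustment, so take this only as an illustration of the general idea.

# Sketch of indirect standardization as one way to risk-adjust HAI comparisons:
# sum stratum-specific reference rates times local exposure to get an expected
# count, then take observed / expected. Strata, rates and counts are invented.

# stratum: (device-days, reference infections per 1,000 device-days)
strata = {
    "medical ICU":  (4000, 2.0),
    "surgical ICU": (2500, 3.0),
    "cardiac ICU":  (1500, 1.5),
}

observed_infections = 21
expected = sum(days * rate / 1000 for days, rate in strata.values())
ratio = observed_infections / expected   # standardized infection ratio

print(f"expected = {expected:.1f}, observed/expected = {ratio:.2f}")
# a ratio near 1 suggests the facility is roughly in line with the reference
# population once its patient mix is taken into account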

“Although a hospital’s surveillance system might aggregate accurate data and generate appropriate risk-adjusted HAI rates for both internal and external comparison, comparison may be misleading for several reasons. First, the rates may not adjust for patients’ unmeasured intrinsic risks for infection, which vary from hospital to hospital. […] Second, if surveillance techniques are not uniform among hospitals or are used inconsistently over time, variations will occur in sensitivity and specificity for HAI case finding. Third, the sample size […] must be sufficient. This issue is of concern for hospitals with fewer than 200 beds, which represent about 10% of hospital admissions in the United States. In most CDC analyses, rates from hospitals with very small denominators tend to be excluded [37,46,49]. […] Although many healthcare facilities around the country aggregate HAI surveillance data for baseline establishment and interhospital comparison, the comparison of HAI rates is complex, and the value of the aggregated data must be balanced against the burden of their collection. […] If a hospital does not devote sufficient resources to data collection, the data will be of limited value, because they will be replete with inaccuracies. No national database has successfully dealt with all the problems in collecting HAI data and each varies in its ability to address these problems. […] While comparative data can be useful as a tool for the prevention of HAI, in some instances no data might be better than bad data.”

August 10, 2017 Posted by | Books, Data, Epidemiology, Infectious disease, Medicine, Statistics

A few diabetes papers of interest

i. Long-term Glycemic Variability and Risk of Adverse Outcomes: A Systematic Review and Meta-analysis.

“This systematic review and meta-analysis evaluates the association between HbA1c variability and micro- and macrovascular complications and mortality in type 1 and type 2 diabetes. […] Seven studies evaluated HbA1c variability among patients with type 1 diabetes and showed an association of HbA1c variability with renal disease (risk ratio 1.56 [95% CI 1.08–2.25], two studies), cardiovascular events (1.98 [1.39–2.82]), and retinopathy (2.11 [1.54–2.89]). Thirteen studies evaluated HbA1c variability among patients with type 2 diabetes. Higher HbA1c variability was associated with higher risk of renal disease (1.34 [1.15–1.57], two studies), macrovascular events (1.21 [1.06–1.38]), ulceration/gangrene (1.50 [1.06–2.12]), cardiovascular disease (1.27 [1.15–1.40]), and mortality (1.34 [1.18–1.53]). Most studies were retrospective with lack of adjustment for potential confounders, and inconsistency existed in the definition of HbA1c variability.

CONCLUSIONS HbA1c variability was positively associated with micro- and macrovascular complications and mortality independently of the HbA1c level and might play a future role in clinical risk assessment.”

Two observations related to the paper: One, although only a relatively small number of studies were included in the review, the number of patients included in some of those included studies was rather large – the 7 type 1 studies thus included 44,021 participants, and the 13 type 2 studies included in total 43,620 participants. Two, it’s noteworthy that some of the associations already look at least reasonably strong, despite interest in HbA1c variability being a relatively recent phenomenon. Confounding might be an issue, but then again it almost always might be, and to give an example, out of 11 studies analyzing the association between renal disease and HbA1c variability included in the review, ten of them support a link and the only one which does not was a small study on pediatric patients which was almost certainly underpowered to investigate such a link in the first place (the base rate of renal complications is, as mentioned before here on this blog quite recently (link 3), quite low in pediatric samples).
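
As an aside, the pooled risk ratios reported in a review like this are typically obtained by combining the study-level estimates on the log scale, weighting each study by the inverse of its variance, which can be reconstructed from the reported confidence interval. The Python sketch below shows a fixed-effect version of that calculation on two invented study results; the actual review would also involve random-effects modeling and heterogeneity assessment, so this is only meant to show the mechanics.

# Minimal sketch of inverse-variance pooling of risk ratios on the log scale.
# The two input "studies" are invented; a real meta-analysis would also use
# random-effects methods and assess heterogeneity.

from math import log, exp

def pooled_rr(studies):
    """studies: list of (rr, ci_low, ci_high) tuples from individual studies."""
    num = den = 0.0
    for rr, lo, hi in studies:
        se = (log(hi) - log(lo)) / (2 * 1.96)   # SE of log RR from the 95% CI
        w = 1.0 / se ** 2                       # inverse-variance weight
        num += w * log(rr)
        den += w
    est = num / den
    se_pooled = (1.0 / den) ** 0.5
    ci = (exp(est - 1.96 * se_pooled), exp(est + 1.96 * se_pooled))
    return exp(est), ci

example_studies = [(1.45, 1.05, 2.00), (1.70, 1.10, 2.63)]   # hypothetical
rr, (lo, hi) = pooled_rr(example_studies)
print(f"pooled RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")     # about 1.53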

ii. Risk of Severe Hypoglycemia in Type 1 Diabetes Over 30 Years of Follow-up in the DCCT/EDIC Study.

(I should perhaps note here that I’m already quite familiar with the context of the DCCT/EDIC study/studies, and although readers may not be, and although background details are included in the paper, I decided not to cover such details here although they would make my coverage of the paper easier to understand. I instead decided to limit my coverage of the paper to a few observations which I myself found to be of interest.)

“During the DCCT, the rates of SH [Severe Hypoglycemia, US], including episodes with seizure or coma, were approximately threefold greater in the intensive treatment group than in the conventional treatment group […] During EDIC, the frequency of SH increased in the former conventional group and decreased in the former intensive group so that the difference in SH event rates between the two groups was no longer significant (36.6 vs. 40.8 episodes per 100 patient-years, respectively […] By the end of DCCT, with an average of 6.5 years of follow-up, 65% of the intensive group versus 35% of the conventional group experienced at least one episode of SH. In contrast, ∼50% of participants within each group reported an episode of SH during the 20 years of EDIC.”

“Of [the] participants reporting episodes of SH, during the DCCT, 54% of the intensive group and 30% of the conventional group experienced four or more episodes, whereas in EDIC, 37% of the intensive group and 33% of the conventional group experienced four or more events […]. Moreover, a subset of participants (14% [99 of 714]) experienced nearly one-half of all SH episodes (1,765 of 3,788) in DCCT, and a subset of 7% (52 of 709) in EDIC experienced almost one-third of all SH episodes (888 of 2,813) […] Fifty-one major accidents occurred during the 6.5 years of DCCT and 143 during the 20 years of EDIC […] The most frequent type of major accident was that involving a motor vehicle […] Hypoglycemia played a role as a possible, probable, or principal cause in 18 of 28 operator-caused motor vehicle accidents (MVAs) during DCCT […] and in 23 of 54 operator-caused MVAs during EDIC”.

“The T1D Exchange Clinic Registry recently reported that 8% of 4,831 adults with T1D living in the U.S. had a seizure or coma event during the 3 months before their most recent annual visit (11). During EDIC, we observed that 27% of the cohort experienced a coma or seizure event over the 20 years of 3-month reporting intervals (∼1.4% per year), a much lower annual risk than in the T1D Exchange Clinic Registry. In part, the open enrollment of patients into the T1D Exchange may be reflected without the exclusion of participants with a history of SH as in the DCCT and other clinical trials. The current data support the clinical perception that a small subset of individuals is more susceptible to SH (7% of patients with 11 or more SH episodes during EDIC, which represents 32% of all SH episodes in EDIC) […] a history of SH during DCCT and lower current HbA1c levels were the two major factors associated with an increased risk of SH during EDIC. Safety concerns were the reason why a history of frequent SH events was an exclusion criterion for enrollment in DCCT. […] Of note, we found that participants who entered the DCCT as adolescents were more likely to experience SH during EDIC.”

“In summary, although event rates in the DCCT/EDIC cohort seem to have fallen and stabilized over time, SH remains an ever-present threat for patients with T1D who use current technology, occurring at a rate of ∼36–41 episodes per 100 patient-years, even among those with longer diabetes duration. Having experienced one or more such prior events is the strongest predictor of a future SH episode.”

I didn’t actually like that summary. If a history of severe hypoglycemia was an exclusion criterion in the DCCT trial, which it was, then the event rate you’d get from this data set is highly likely to provide a biased estimator of the true event rate, as the Exchange Clinic Registry data illustrate. The true population event rate in unselected samples is higher.

Another note which may be important to add is that many diabetics who do not have a ‘severe event’ during a specific time period might still experience a substantial number of hypoglycemic episodes; the ‘severe event’ category (episodes requiring the assistance of another individual) is a somewhat blunt instrument, in particular for assessing the quality-of-life aspects of hypoglycemia.

iii. The Presence and Consequence of Nonalbuminuric Chronic Kidney Disease in Patients With Type 1 Diabetes.

“This study investigated the prevalence of nonalbuminuric chronic kidney disease in type 1 diabetes to assess whether it increases the risk of cardiovascular and renal outcomes as well as all-cause mortality. […] This was an observational follow-up of 3,809 patients with type 1 diabetes from the Finnish Diabetic Nephropathy Study. […] mean age was 37.6 ± 11.8 years and duration of diabetes 21.2 ± 12.1 years. […] During 13 years of median follow-up, 378 developed end-stage renal disease, 415 suffered an incident cardiovascular event, and 406 died. […] At baseline, 78 (2.0%) had nonalbuminuric chronic kidney disease. […] Nonalbuminuric chronic kidney disease did not increase the risk of albuminuria (hazard ratio [HR] 2.0 [95% CI 0.9–4.4]) or end-stage renal disease (HR 6.4 [0.8–53.0]) but did increase the risk of cardiovascular events (HR 2.0 [1.4–3.5]) and all-cause mortality (HR 2.4 [1.4–3.9]). […] ESRD [End-Stage Renal Disease] developed during follow-up in 0.3% of patients with nonalbuminuric non-CKD [CKD: Chronic Kidney Disease], in 1.3% of patients with nonalbuminuric CKD, in 13.9% of patients with albuminuric non-CKD, and in 63.0% of patients with albuminuric CKD (P < 0.001).”

CONCLUSIONS Nonalbuminuric chronic kidney disease is not a frequent finding in patients with type 1 diabetes, but when present, it is associated with an increased risk of cardiovascular morbidity and all-cause mortality but not with renal outcomes.”

iv. Use of an α-Glucosidase Inhibitor and the Risk of Colorectal Cancer in Patients With Diabetes: A Nationwide, Population-Based Cohort Study.

This one relates closely to stuff covered in Horowitz & Samsom’s book about Gastrointestinal Function in Diabetes Mellitus, which I just finished (and which I liked very much). Here’s a relevant quote from chapter 7 of that book (which is about ‘Hepato-biliary and Pancreatic Function’):

“Several studies have provided evidence that the risk of pancreatic cancer is increased in patients with type 1 and type 2 diabetes mellitus [136,137]. In fact, diabetes has been associated with an increased risk of several cancers, including those of the pancreas, liver, endometrium and kidney [136]. The pooled relative risk of pancreatic cancer for diabetics vs. non-diabetics in a meta-analysis was 2.1 (95% confidence interval 1.6–2.8). Patients presenting with diabetes mellitus within a period of 12 months of the diagnosis of pancreatic cancer were excluded because in these cases diabetes may be an early presenting sign of pancreatic cancer rather than a risk factor [137]”.

They don’t mention colon cancer there, but it’s obvious from the research which has been done – and which is covered extensively in that book – that diabetes can cause functional changes in a large number of components of the digestive system (I hope to cover this kind of stuff in a lot more detail later on), so it should hardly be surprising that some of these changes may lead to neoplastic changes. However, evaluating causal pathways is more complicated here than it might otherwise have been, because e.g. pancreatic diseases may themselves cause secondary diabetes in some patients. Liver pathologies like hepatitis B and C also display positive associations with diabetes, although again the causal pathways are not completely clear; the treatments used may be a contributing factor (interferon treatment may induce diabetes), but there are also suggestions that diabetes should be considered one of the extrahepatic manifestations of hepatitis. This stuff is complicated.

The drug mentioned in the paper, acarbose, is incidentally a drug also discussed in some detail in the book. It belongs to a group of drugs called alpha glucosidase inhibitors, and it is ‘the first antidiabetic medication designed to act through an influence on intestinal functions.’ Anyway, some quotes from the paper:

“We conducted a nationwide, population-based study using a large cohort with diabetes in the Taiwan National Health Insurance Research Database. Patients with newly diagnosed diabetes (n = 1,343,484) were enrolled between 1998 and 2010. One control subject not using acarbose was randomly selected for each subject using acarbose after matching for age, sex, diabetes onset, and comorbidities. […] There were 1,332 incident cases of colorectal cancer in the cohort with diabetes during the follow-up period of 1,487,136 person-years. The overall incidence rate was 89.6 cases per 100,000 person-years. Patients treated with acarbose had a 27% reduction in the risk of colorectal cancer compared with control subjects. The adjusted HRs were 0.73 (95% CI 0.63–0.83), 0.69 (0.59–0.82), and 0.46 (0.37–0.58) for patients using >0 to <90, 90 to 364, and ≥365 cumulative defined daily doses of acarbose, respectively, compared with subjects who did not use acarbose (P for trend < 0.001).

CONCLUSIONS Acarbose use reduced the risk of incident colorectal cancer in patients with diabetes in a dose-dependent manner.”

It’s perhaps worth mentioning that the prevalence of type 1 is relatively low in East Asian populations and that most of the patients included were type 2 (this is also clearly indicated by this observation from the paper: “The median age at the time of the initial diabetes diagnosis was 54.1 years, and the median diabetes duration was 8.9 years.”). Another thing worth mentioning is that colon cancer is a very common type of cancer, so even moderate risk reductions at the individual level may translate into a substantial risk reduction at the population level. A third thing, noted in Horowitz & Samsom’s coverage, is that the side effects of acarbose are relatively benign, so widespread use of the drug is not out of the question – at least, poor tolerance is not likely to be an obstacle. The drug may cause e.g. excessive flatulence, and something like 10% of patients may have to stop treatment because of gastrointestinal side effects, but although these side effects are annoying and may be unacceptable to some patients, they are not dangerous; it’s a safe drug which can be used even in patients with renal failure (a context in which some of the other available oral antidiabetic treatments are contraindicated).
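
(To make the relative-vs-absolute point concrete, here is a back-of-the-envelope calculation of my own, treating the reported overall HR as if it were a risk ratio and applying it to the overall incidence rate from the paper; a rough illustration, not something from the paper itself.)

```python
baseline = 89.6 / 100_000   # overall colorectal cancer incidence per person-year (from the paper)
hr = 0.73                   # overall adjusted HR for acarbose users vs. non-users

# Treating the HR as approximately a risk ratio over one person-year:
reduction = baseline * (1 - hr)
print(f"~{reduction * 100_000:.0f} fewer cases per 100,000 person-years")
print(f"roughly 1 case prevented per {1 / reduction:,.0f} treated person-years")

# Scaled to a large population with diabetes, the absolute numbers add up:
print(f"~{reduction * 1_000_000:.0f} fewer cases per year per 1,000,000 treated patients")
```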

v. Diabetes, Lower-Extremity Amputation, and Death.

“Worldwide, every 30 s, a limb is lost to diabetes (1,2). Nearly 2 million people living in the U.S. are living with limb loss (1). According to the World Health Organization, lower-extremity amputations (LEAs) are 10 times more common in people with diabetes than in persons who do not have diabetes. In the U.S. Medicare population, the incidence of diabetic foot ulcers is ∼6 per 100 individuals with diabetes per year and the incidence of LEA is 4 per 1,000 persons with diabetes per year (3). LEA in those with diabetes generally carries yearly costs between $30,000 and $60,000 and lifetime costs of half a million dollars (4). In 2012, it was estimated that those with diabetes and lower-extremity wounds in the U.S. Medicare program accounted for $41 billion in cost, which is ∼1.6% of all Medicare health care spending (47). In 2012, in the U.K., it was estimated that the National Health Service spent between £639 and 662 million on foot ulcers and LEA, which was approximately £1 in every £150 spent by the National Health Service (8).”

“LEA does not represent a traditional medical complication of diabetes like myocardial infarction (MI), renal failure, or retinopathy in which organ failure is directly associated with diabetes (2). An LEA occurs because of a disease complication, usually a foot ulcer that is not healing (e.g., organ failure of the skin, failure of the biomechanics of the foot as a unit, nerve sensory loss, and/or impaired arterial vascular supply), but it also occurs at least in part as a consequence of a medical plan to amputate based on a decision between health care providers and patients (9,10). […] 30-day postoperative mortality can approach 10% […]. Previous reports have estimated that the 1-year post-LEA mortality rate in people with diabetes is between 10 and 50%, and the 5-year mortality rate post-LEA is between 30 and 80% (4,13–15). More specifically, in the U.S. Medicare population mortality within a year after an incident LEA was 23.1% in 2006, 21.8% in 2007, and 20.6% in 2008 (4). In the U.K., up to 80% will die within 5 years of an LEA (8). In general, those with diabetes with an LEA are two to three times more likely to die at any given time point than those with diabetes who have not had an LEA (5). For perspective, the 5-year death rate after diagnosis of malignancy in the U.S. was 32% in 2010 (16).”

“Evidence on why individuals with diabetes and an LEA die is based on a few mainly small (e.g., <300 subjects) and often single center–based (13,17–20) studies or <1 year duration of evaluation (11). In these studies, death is primarily associated with a previous history of cardiovascular disease and renal insufficiency, which are also major complications of diabetes; these complications are also associated with an increased risk of LEA. The goal of our study was to determine whether complications of diabetes well-known to be associated with death in those with diabetes such as cardiovascular disease and renal failure fully explain the higher rate of death in those who have undergone an LEA.”

“This is the largest and longest evaluation of the risk of death among those with diabetes and LEA […] Between 2003 and 2012, 416,434 individuals met the entrance criteria for the study. This cohort accrued an average of 9.0 years of follow-up and a total of 3.7 million diabetes person-years of follow-up. During this period of time, 6,566 (1.6%) patients had an LEA and 77,215 patients died (18.5%). […] The percentage of individuals who died within 30 days, 1 year, and by year 5 of their initial code for an LEA was 1.0%, 9.9%, and 27.2%, respectively. For those >65 years of age, the rates were 12.2% and 31.7%, respectively. For the full cohort of those with diabetes, the rate of death was 2.0% after 1 year of follow up and 7.3% after 5 years of follow up. In general, those with an LEA were more than three times more likely to die during a year of follow-up than an individual with diabetes who had not had an LEA. […] In any given year, >5% of those with diabetes and an LEA will die.”

“From 2003 to 2012, the HR [hazard ratio, US] for death after an LEA was 3.02 (95% CI 2.90, 3.14). […] our a priori assumption was that the HR associating LEA with death would be fully diminished (i.e., it would become 1) when adjusted for the other risk factor variables. However, the fully adjusted LEA HR was diminished only ∼22% to 2.37 (95% CI 2.27, 2.48). With the exception of age >65 years, individual risk factors, in general, had minimal effect (<10%) on the HR of the association between LEA and death […] We conducted sensitivity analyses to determine the general statistical parameters of an unmeasured risk factor that could remove the association of LEA with death. We found that even if there existed a very strong risk factor with an HR of death of three, a prevalence of 10% in the general diabetes population, and a prevalence of 60% in those who had an LEA, LEA would still be associated with a statistically significant and clinically important risk of 1.30. These findings are describing a variable that would seem to be so common and so highly associated with death that it should already be clinically apparent. […] In summary, individuals with diabetes and an LEA are more likely to die at any given point in time than those who have diabetes but no LEA. While some of this variation can be explained by other known complications of diabetes, the amount that can be explained is small. Based on the results of this study, including a sensitivity analysis, it is highly unlikely that a “new” major risk factor for death exists. […] LEA is often performed because of an end-stage disease process like chronic nonhealing foot ulcer. By the time a patient has a foot ulcer and an LEA is offered, they are likely suffering from the end-stage consequence of diabetes. […] We would […] suggest that patients who have had an LEA require […] vigilant follow-up and evaluation to assure that their medical care is optimized. It is also important that GPs communicate to their patients about the risk of death to assure that patients have proper expectations about the severity of their disease.”
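
(The reported sensitivity analysis can be reproduced, at least approximately, with the standard external-adjustment formula for a single binary unmeasured confounder. I do not know exactly which method the authors used, so treat the sketch below as an illustration of the logic rather than their actual calculation.)

```python
def bias_factor(rr_u, p_exposed, p_unexposed):
    """Bross-type bias factor for a binary unmeasured risk factor U:
    how much U's uneven distribution could inflate the observed association."""
    return (p_exposed * (rr_u - 1) + 1) / (p_unexposed * (rr_u - 1) + 1)

observed_hr = 2.37                 # fully adjusted HR for LEA vs. no LEA (from the paper)
b = bias_factor(rr_u=3.0,          # hypothetical confounder tripling mortality
                p_exposed=0.60,    # prevalence among those with an LEA
                p_unexposed=0.10)  # prevalence in the general diabetes population
print(f"bias factor = {b:.2f}; HR after removing such a confounder ~ {observed_hr / b:.2f}")
# -> roughly 1.3, in line with the 1.30 the authors report
```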

vi. Trends in Health Care Expenditure in U.S. Adults With Diabetes: 2002–2011.

Before quoting from the paper, I’ll remind people reading along here that ‘total medical expenditures’ != ‘total medical costs’. Lots of relevant medical costs are not included when you focus only on direct medical expenditures (sick days, early retirement, premature mortality and productivity losses associated therewith, etc., etc.). With that out of the way…

“This study examines trends in health care expenditures by expenditure category in U.S. adults with diabetes between 2002 and 2011. […] We analyzed 10 years of data representing a weighted population of 189,013,514 U.S. adults aged ≥18 years from the Medical Expenditure Panel Survey. […] Relative to individuals without diabetes ($5,058 [95% CI 4,949–5,166]), individuals with diabetes ($12,180 [11,775–12,586]) had more than double the unadjusted mean direct expenditures over the 10-year period. After adjustment for confounders, individuals with diabetes had $2,558 (2,266–2,849) significantly higher direct incremental expenditures compared with those without diabetes. For individuals with diabetes, inpatient expenditures rose initially from $4,014 in 2002/2003 to $4,183 in 2004/2005 and then decreased continuously to $3,443 in 2010/2011, while rising steadily for individuals without diabetes. The estimated unadjusted total direct expenditures for individuals with diabetes were $218.6 billion/year and adjusted total incremental expenditures were approximately $46 billion/year. […] in the U.S., direct medical costs associated with diabetes were $176 billion in 2012 (1,3). This is almost double to eight times the direct medical cost of other chronic diseases: $32 billion for COPD in 2010 (10), $93 billion for all cancers in 2008 (11), $21 billion for heart failure in 2012 (12), and $43 billion for hypertension in 2010 (13). In the U.S., total economic cost of diabetes rose by 41% from 2007 to 2012 (2). […] Our findings show that compared with individuals without diabetes, individuals with diabetes had significantly higher health expenditures from 2002 to 2011 and the bulk of the expenditures came from hospital inpatient and prescription expenditures.”

 

August 10, 2017 Posted by | Books, Cancer/oncology, Cardiology, Diabetes, Economics, Epidemiology, Gastroenterology, Medicine, Nephrology, Pharmacology | Leave a comment

Infectious Disease Surveillance (I)

Concepts and Methods in Infectious Disease Surveillance […] familiarizes the reader with basic surveillance concepts; the legal basis for surveillance in the United States and abroad; and the purposes, structures, and intended uses of surveillance at the local, state, national, and international level. […] A desire for a readily accessible, concise resource that detailed current methods and challenges in disease surveillance inspired the collaborations that resulted in this volume. […] The book covers major topics at an introductory-to-intermediate level and was designed to serve as a resource or class text for instructors. It can be used in graduate level courses in public health, human and veterinary medicine, as well as in undergraduate programs in public health–oriented disciplines. We hope that the book will be a useful primer for frontline public health practitioners, hospital epidemiologists, infection-control practitioners, laboratorians in public health settings, infectious disease researchers, and medical informatics specialists interested in a concise overview of infectious disease surveillance.”

I thought the book was sort of okay, but not really all that great. Part of the reason I didn’t like it as much as I might have is probably that someone like me doesn’t really need to know all the details about, say, the issues Florida encountered while trying to implement electronic patient records, or whether the mandated reporting requirements for brucellosis in Texas differ from those in Florida – but the book has a lot of that kind of information. That is useful knowledge if you work with this stuff, but if you don’t and you’re just curious about the topic ‘in a general way’, those kinds of details can subtract a bit from the experience. A lot of chapters cover similar topics and don’t seem all that well coordinated, in the sense that details which could easily have been left out of specific chapters without any significant information loss (because they are covered elsewhere in the publication) are included anyway; we are probably told at least ten times what the difference between active and passive surveillance is. This probably means that the various chapters can be read more or less independently (you don’t need to read chapter 5 to understand the coverage in chapter 11), but if you’re reading the book from cover to cover the way I was, that sort of approach is not ideal. That said, in terms of the coverage in the individual chapters and the content in general, I feel reasonably confident that if you’re actually working in public health or a related field, so that a lot of this stuff is ‘work-relevant’ (especially if you’re in the US), it’s probably a very useful book to keep around and know about. I didn’t need to know how many ‘NBS-states’ there are, or whether or not South Carolina is such a state, but some people might.

As I’ve pointed out before, a two star goodreads rating on my part (which is the rating I gave this publication) is not an indication that I think a book is terrible, it’s an indication that the book is ‘okay’.

Below I’ve added some quotes and observations from the book. The book is an academic publication but it is not a ‘classic textbook’ with key items in bold etc.; I decided to use bold to highlight key concepts and observations below, to make the post easier to navigate later on (none of the bolded words below were in bold in the original text), but aside from that I have made no changes to the quotes included in this post. I would note that given that many of the chapters included in the book are not covered by copyright (many chapters include this observation: “Materials appearing in this chapter are prepared by individuals as part of their official duties as United States government employees and are not covered by the copyright of the book, and any views expressed herein do not necessarily represent the views of the United States government.”) I may decide to cover the book in a bit more detail than I otherwise would have.

“The methods used for infectious disease surveillance depend on the type of disease. Part of the rationale for this is that there are fundamental differences in etiology, mode of transmission, and control measures between different types of infections. […] Despite the fact that much of surveillance is practiced on a disease-specific basis, it is worth remembering that surveillance is a general tool used across all types of infectious and noninfectious conditions, and, as such, all surveillance methods share certain core elements. We advocate the view that surveillance should not be regarded as a public health “specialty,” but rather that all public health practitioners should understand the general principles underlying surveillance.”

“Control of disease spread is achieved through public health actions. Public health actions resulting from information gained during the investigation usually go beyond what an individual physician can provide to his or her patients presenting in a clinical setting. Examples of public health actions include identifying the source of infection […] identifying persons who were in contact with the index case or any infected person who may need vaccines or antiinfectives to prevent them from developing the infection; closure of facilities implicated in disease spread; or isolation of sick individuals or, in rare circumstances, quarantining those exposed to an infected person. […] Monitoring surveillance data enables public health authorities to detect sudden changes in disease occurrence and distribution, identify changes in agents or host factors, and detect changes in healthcare practices […] The primary use of surveillance data at the local and state public health level is to identify cases or outbreaks in order to implement immediate disease control and prevention activities. […] Surveillance data are also used by states and CDC to monitor disease trends, demonstrate the need for public health interventions such as vaccines and vaccine policy, evaluate public health activities, and identify future research priorities. […] The final and most-important link in the surveillance chain is the application of […] data to disease prevention and control. A surveillance system includes a functional capacity for data collection, analysis, and dissemination linked to public health programs [6].

“The majority of reportable disease surveillance is conducted through passive surveillance methods. Passive surveillance means that public health agencies inform healthcare providers and other entities of their reporting requirements, but they do not usually conduct intensive efforts to solicit all cases; instead, the public health agency waits for the healthcare entities to submit case reports. Because passive surveillance is often incomplete, public health agencies may use hospital discharge data, laboratory testing records, mortality data, or other sources of information as checks on completeness of reporting and to identify additional cases. This is called active surveillance. Active surveillance usually includes intensive activities on the part of the public health agency to identify all cases of a specific reportable disease or group of diseases. […] Because it can be very labor intensive, active surveillance is usually conducted for a subset of reportable conditions, in a defined geographic locale and for a defined period of time.”

“Active surveillance may be conducted on a routine basis or in response to an outbreak […]. When an outbreak is suspected or identified, another type of surveillance known as enhanced passive surveillance may also be initiated. In enhanced passive surveillance methods, public health may improve communication with the healthcare community, schools, daycare centers, and other facilities and request that all suspected cases be reported to public health. […] Case-based surveillance is supplemented through laboratory-based surveillance activities. As opposed to case-based surveillance, the focus is on laboratory results themselves, independent of whether or not an individual’s result is associated with a “case” of illness meeting the surveillance case definition. Laboratory-based surveillance is conducted by state public health laboratories as well as the healthcare community (e.g., hospital, private medical office, and commercial laboratories). […] State and local public health entities participate in sentinel surveillance activities. With sentinel methods, surveillance is conducted in a sample of reporting entities, such as healthcare providers or hospitals, or in a specific population known to be an early indicator of disease activity (e.g., pediatric). However, because the goal of sentinel surveillance is not to identify every case, it is not necessarily representative of the underlying population of interest; and results should be interpreted accordingly.”

Syndromic surveillance identifies unexpected changes in prediagnostic information from a variety of sources to detect potential outbreaks [56]. Sources include work- or school-absenteeism records, pharmacy sales for over-the-counter pharmaceuticals, or emergency room admission data [51]. During the 2009 H1N1 pandemic, syndromic surveillance of emergency room visits for influenza-like illness correlated well with laboratory diagnosed cases of influenza [57]. […] According to a 2008 survey of U.S. health departments, 88% of respondents reported that they employ syndromic-based approaches as part of routine surveillance [21].

“Public health operated for many decades (and still does to some extent) using stand-alone, case-based information systems for collection of surveillance data that do not allow information sharing between systems and do not permit the ability to track the occurrences of different diseases in a specific person over time. One of the primary objectives of NEDSS [National Electronic Disease Surveillance System] is to promote person-based surveillance and integrated and interoperable surveillance systems. In an integrated person-based system, information is collected to create a public health record for a given person for different diseases over time. This enables public health to track public health conditions associated with a person over time, allowing analyses of public health events and comorbidities, as well as more robust public health interventions. An interoperable system can exchange information with other systems. For example, data are shared between surveillance systems or between other public health or clinical systems, such as an electronic health record or outbreak management system. Achieving the goal of establishing a public health record for an individual over time does not require one monolithic system that supports all needs; this can, instead, be achieved through integration and/or interoperability of systems.

“For over a decade, public health has focused on automation of reporting of laboratory results to public health from clinical laboratories and healthcare providers. Paper-based submission of laboratory results to public health for reportable conditions results in delays in receipt of information, incomplete ascertainment of possible cases, and missing information on individual reports. All of these aspects are improved through automation of the process [39–43].”

“During the pre-vaccine era, rotavirus infected nearly every unvaccinated child before their fifth birthday. In the absence of vaccine, multiple rotavirus infections may occur during infancy and childhood. Rotavirus causes severe diarrhea and vomiting (acute gastroenteritis [AGE]), which can lead to dehydration, electrolyte depletion, complications of viremia, shock, and death. Nearly one-half million children around the world die of rotavirus infections each year […] [In the US] this virus was responsible for 40–50% of hospitalizations because of acute gastroenteritis during the winter months in the era before vaccines were introduced. […] Because first infections have been shown to induce strong immunity against severe rotavirus reinfections [3] and because vaccination mimics such first infections without causing illness, vaccination was identified as the optimal strategy for decreasing the burden associated with severe and fatal rotavirus diarrhea. Any changes that may be later attributed to vaccination effects require knowledge of the pre-licensure (i.e., baseline) rates and trends in the target disease as a reference […] Efforts to obtain baseline data are necessary before a vaccine is licensed and introduced [13]. […] After the first year of widespread rotavirus vaccination coverage in 2008, very large and consistent decreases in rotavirus hospitalizations were noted around the country. Many of the decreases in childhood hospitalizations resulting from rotavirus were 90% or more, compared with the pre-licensure, baseline period.”

There is no single perfect data source for assessing any VPD [Vaccine-Preventable Disease, US]. Meaningful surveillance is achieved by the much broader approach of employing diverse datasets. The true impact of a vaccine or the accurate assessment of disease trends in a population is more likely the result of evaluating many datasets having different strengths and weaknesses. Only by understanding these strengths and weaknesses can a public health practitioner give the appropriate consideration to the findings derived from these data. […] In a Phase III clinical trial, the vaccine is typically administered to large numbers of people who have met certain inclusionary and exclusionary criteria and are then randomly selected to receive either the vaccine or a placebo. […] Phase III trials represent the “best case scenario” of vaccine protection […] Once the Phase III trials show adequate protection and safety, the vaccine may be licensed by the FDA […] When the vaccine is used in routine clinical practice, Phase IV trials (called post-licensure studies or post-marketing studies) are initiated. These are the evaluations conducted during the course of VPD surveillance that delineate additional performance information in settings where strict controls on who receives the vaccine are not present. […] Often, measuring vaccine performance in the broader population yields slightly lower protective results compared to Phase III clinical trials […] During these post-licensure Phase IV studies, it is not the vaccine’s efficacy but its effectiveness that is assessed. […] Administrative datasets may be created by research institutions, managed-care organizations, or national healthcare utilization repositories. They are not specifically created for VPD surveillance and may contain coded data […] on health events. They often do not provide laboratory confirmation of specific diseases, unlike passive and active VPD surveillance. […] administrative datasets offer huge sample sizes, which allow for powerful inferences within the confines of any data limitations.”
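
(The efficacy/effectiveness distinction boils down to a simple calculation – one minus the relative risk in the vaccinated – applied to data gathered under very different conditions. The attack rates below are made up for illustration; only the formula is standard.)

```python
def vaccine_efficacy(attack_rate_vaccinated, attack_rate_unvaccinated):
    """1 - RR: proportional reduction in risk among the vaccinated."""
    return 1 - attack_rate_vaccinated / attack_rate_unvaccinated

# Phase III trial: selected volunteers, strict protocols (hypothetical attack rates)
print(f"efficacy:      {vaccine_efficacy(0.005, 0.050):.0%}")
# Post-licensure field conditions: broader population, imperfect handling, etc.
print(f"effectiveness: {vaccine_efficacy(0.012, 0.050):.0%}")  # typically somewhat lower
```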

August 6, 2017 Posted by | Books, Epidemiology, Infectious disease, Medicine, Pharmacology | Leave a comment

Beyond Significance Testing (IV)

Below I have added some quotes from chapters 5, 6, and 7 of the book.

“There are two broad classes of standardized effect sizes for analysis at the group or variable level, the d family, also known as group difference indexes, and the r family, or relationship indexes […] Both families are metric- (unit-) free effect sizes that can compare results across studies or variables measured in different original metrics. Effect sizes in the d family are standardized mean differences that describe mean contrasts in standard deviation units, which can exceed 1.0 in absolute value. Standardized mean differences are signed effect sizes, where the sign of the statistic indicates the direction of the corresponding contrast. Effect sizes in the r family are scaled in correlation units that generally range from −1.0 to +1.0, where the sign indicates the direction of the relation […] Measures of association are unsigned effect sizes and thus do not indicate directionality.”

“The correlation rpb is for designs with two unrelated samples. […] rpb […] is affected by base rate, or the proportion of cases in one group versus the other, p and q. It tends to be highest in balanced designs. As the design becomes more unbalanced holding all else constant, rpb approaches zero. […] rpb is not directly comparable across studies with dissimilar relative group sizes […]. The correlation rpb is also affected by the total variability (i.e., ST). If this variation is not constant over samples, values of rpb may not be directly comparable.”
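
To make the base-rate point concrete, here is a small simulation of my own (not from the book) which holds the standardized mean difference fixed at d = 0.5 and only varies how unbalanced the two groups are:

```python
import numpy as np

rng = np.random.default_rng(0)

def rpb_for_split(p_group1, n=100_000, d=0.5):
    """Point-biserial r for a fixed standardized mean difference d,
    varying only the proportion of cases falling in group 1."""
    n1 = int(n * p_group1)
    scores = np.concatenate([rng.normal(d, 1, n1),        # group 1, shifted by d SDs
                             rng.normal(0, 1, n - n1)])   # group 0
    group = np.concatenate([np.ones(n1), np.zeros(n - n1)])
    return np.corrcoef(group, scores)[0, 1]

for p in (0.50, 0.75, 0.90, 0.99):
    print(f"group split {p:.2f}/{1 - p:.2f}: r_pb ~ {rpb_for_split(p):.3f}")
# The same d = 0.5 gives r_pb of about .24 with a 50/50 split, but the correlation
# shrinks toward zero as the design becomes more and more unbalanced.
```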

“Too many researchers neglect to report reliability coefficients for scores analyzed. This is regrettable because effect sizes cannot be properly interpreted without knowing whether the scores are precise. The general effect of measurement error in comparative studies is to attenuate absolute standardized effect sizes and reduce the power of statistical tests. Measurement error also contributes to variation in observed results over studies. Of special concern is when both score reliabilities and sample sizes vary from study to study. If so, effects of sampling error are confounded with those due to measurement error. […] There are ways to correct some effect sizes for measurement error (e.g., Baguley, 2009), but corrected effect sizes are rarely reported. It is more surprising that measurement error is ignored in most meta-analyses, too. F. L. Schmidt (2010) found that corrected effect sizes were analyzed in only about 10% of the 199 meta-analytic articles published in Psychological Bulletin from 1978 to 2006. This implies that (a) estimates of mean effect sizes may be too low and (b) the wrong statistical model may be selected when attempting to explain between-studies variation in results. If a fixed effects model is mistakenly chosen over a random effects model, confidence intervals based on average effect sizes tend to be too narrow, which can make those results look more precise than they really are. Underestimating mean effect sizes while simultaneously overstating their precision is a potentially serious error.”
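
A standard way to think about the attenuation described here is Spearman’s correction: dividing the observed effect size by the square root of the relevant score reliabilities. A quick sketch of my own with made-up numbers (the corrections Baguley discusses are more nuanced than this):

```python
import math

r_observed = 0.30
rel_x, rel_y = 0.70, 0.80                              # hypothetical score reliabilities
r_corrected = r_observed / math.sqrt(rel_x * rel_y)    # classical correction for attenuation

d_observed = 0.45
d_corrected = d_observed / math.sqrt(rel_y)  # for d, only the outcome's reliability attenuates

print(f"r: {r_observed:.2f} observed -> {r_corrected:.2f} disattenuated")
print(f"d: {d_observed:.2f} observed -> {d_corrected:.2f} disattenuated")
```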

“[D]emonstration of an effect’s significance — whether theoretical, practical, or clinical — calls for more discipline-specific expertise than the estimation of its magnitude”.

“Some outcomes are categorical instead of continuous. The levels of a categorical outcome are mutually exclusive, and each case is classified into just one level. […] The risk difference (RD) is defined as pC − pT, and it estimates the parameter πC − πT. [Those ‘n-resembling letters’ are how wordpress displays pi; this is one of an almost infinite number of reasons why I detest blogging equations on this blog and usually do not do this – US] […] The risk ratio (RR) is the ratio of the risk rates […] which rate appears in the numerator versus the denominator is arbitrary, so one should always explain how RR is computed. […] The odds ratio (OR) is the ratio of the within-groups odds for the undesirable event. […] A convenient property of OR is that it can be converted to a kind of standardized mean difference known as logit d (Chinn, 2000). […] Reporting logit d may be of interest when the hypothetical variable that underlies the observed dichotomy is continuous.”

“The risk difference RD is easy to interpret but has a drawback: Its range depends on the values of the population proportions πC and πT. That is, the range of RD is greater when both πC and πT are closer to .50 than when they are closer to either 0 or 1.00. The implication is that RD values may not be comparable across different studies when the corresponding parameters πC and πT are quite different. The risk ratio RR is also easy to interpret. It has the shortcoming that only the finite interval from 0 to < 1.0 indicates lower risk in the group represented in the numerator, but the interval from > 1.00 to infinity is theoretically available for describing higher risk in the same group. The range of RR varies according to its denominator. This property limits the value of RR for comparing results across different studies. […] The odds ratio OR shares the limitation that the finite interval from 0 to < 1.0 indicates lower risk in the group represented in the numerator, but the interval from > 1.0 to infinity describes higher risk for the same group. Analyzing natural log transformations of OR and then taking antilogs of the results deals with this problem, just as for RR. The odds ratio may be the least intuitive of the comparative risk effect sizes, but it probably has the best overall statistical properties. This is because OR can be estimated in prospective studies, in studies that randomly sample from exposed and unexposed populations, and in retrospective studies where groups are first formed based on the presence or absence of a disease before their exposure to a putative risk factor is determined […]. Other effect sizes may not be valid in retrospective studies (RR) or in studies without random sampling ([Pearson correlations between dichotomous variables, US]).”
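
The various indexes are easy to compute from a 2×2 table, which may make the definitions above easier to digest. Here is a small sketch with made-up counts (for the logit d conversion I am using Chinn’s ln(OR)/1.81, i.e. dividing by π/√3):

```python
import math

# Hypothetical 2x2 table: rows = treatment vs. control, columns = event vs. no event.
events_t, n_t = 30, 200     # treatment group
events_c, n_c = 50, 200     # control group

p_t, p_c = events_t / n_t, events_c / n_c
rd = p_c - p_t                                   # risk difference
rr = p_t / p_c                                   # risk ratio (say which group is the numerator!)
odds_ratio = (p_t / (1 - p_t)) / (p_c / (1 - p_c))
logit_d = math.log(odds_ratio) * math.sqrt(3) / math.pi   # Chinn (2000)

print(f"RD = {rd:.3f}, RR = {rr:.2f}, OR = {odds_ratio:.2f}, logit d = {logit_d:.2f}")
# The log transform makes the ratio indexes symmetric: swapping the groups flips the
# sign of ln(OR) (and ln(RR)) but not its magnitude.
print(f"ln(OR) = {math.log(odds_ratio):+.2f} vs. ln(1/OR) = {math.log(1 / odds_ratio):+.2f}")
```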

“Sensitivity and specificity are determined by the threshold on a screening test. This means that different thresholds on the same test will generate different sets of sensitivity and specificity values in the same sample. But both sensitivity and specificity are independent of population base rate and sample size. […] Sensitivity and specificity affect predictive value, the proportion of test results that are correct […] In general, predictive values increase as sensitivity and specificity increase. […] Predictive value is also influenced by the base rate (BR), the proportion of all cases with the disorder […] In general, PPV [positive predictive value] decreases and NPV [negative…] increases as BR approaches zero. This means that screening tests tend to be more useful for ruling out rare disorders than correctly predicting their presence. It also means that most positive results may be false positives under low base rate conditions. This is why it is difficult for researchers or social policy makers to screen large populations for rare conditions without many false positives. […] The effect of BR on predictive values is striking but often overlooked, even by professionals […]. One misunderstanding involves confusing sensitivity and specificity, which are invariant to BR, with PPV and NPV, which are not. This means that diagnosticians fail to adjust their estimates of test accuracy for changes in base rates, which exemplifies the base rate fallacy. […] In general, test results have greater impact on changing the pretest odds when the base rate is moderate, neither extremely low (close to 0) nor extremely high (close to 1.0). But if the target disorder is either very rare or very common, only a result from a highly accurate screening test will change things much.”
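
The base-rate effect is easy to verify with Bayes’ rule; here is a short illustration of my own with a fairly accurate (hypothetical) screening test:

```python
def predictive_values(sensitivity, specificity, base_rate):
    """PPV and NPV for a given disorder base rate, via Bayes' rule."""
    tp = sensitivity * base_rate
    fp = (1 - specificity) * (1 - base_rate)
    fn = (1 - sensitivity) * base_rate
    tn = specificity * (1 - base_rate)
    return tp / (tp + fp), tn / (tn + fn)

sens, spec = 0.90, 0.90          # the test itself never changes below
for br in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(sens, spec, br)
    print(f"base rate {br:.2f}: PPV = {ppv:.2f}, NPV = {npv:.3f}")
# At a 1% base rate the PPV drops to ~.08 (most positives are false positives),
# while the NPV stays near 1.0: the test is far better at ruling the disorder out.
```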

“The technique of ANCOVA [ANalysis of COVAriance, US] has two more assumptions than ANOVA does. One is homogeneity of regression, which requires equal within-populations unstandardized regression coefficients for predicting outcome from the covariate. In nonexperimental designs where groups differ systematically on the covariate […] the homogeneity of regression assumption is rather likely to be violated. The second assumption is that the covariate is measured without error […] Violation of either assumption may lead to inaccurate results. For example, an unreliable covariate in experimental designs causes loss of statistical power and in nonexperimental designs may also cause inaccurate adjustment of the means […]. In nonexperimental designs where groups differ systematically, these two extra assumptions are especially likely to be violated. An alternative to ANCOVA is propensity score analysis (PSA). It involves the use of logistic regression to estimate the probability for each case of belonging to different groups, such as treatment versus control, in designs without randomization, given the covariate(s). These probabilities are the propensities, and they can be used to match cases from nonequivalent groups.”
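
To give a rough idea of what propensity score analysis does in practice, here is a small simulated sketch of my own. The book describes matching on the propensities; below I use the closely related inverse-probability-weighting variant instead, because it fits in a few lines, and the data and effect sizes are invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Nonequivalent groups: the covariate x drives both group membership and the outcome,
# mimicking the nonexperimental situation where ANCOVA's assumptions are shaky.
n = 2_000
x = rng.normal(size=n)
treated = rng.binomial(1, 1 / (1 + np.exp(-x)))           # selection depends on x
outcome = 0.5 * treated + 0.8 * x + rng.normal(size=n)    # true treatment effect = 0.5

# Step 1: estimate propensity scores with logistic regression.
ps = LogisticRegression().fit(x.reshape(-1, 1), treated).predict_proba(x.reshape(-1, 1))[:, 1]

# Step 2: weight each case by the inverse probability of the group it actually ended up in.
w = np.where(treated == 1, 1 / ps, 1 / (1 - ps))
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()
adjusted = (np.average(outcome[treated == 1], weights=w[treated == 1])
            - np.average(outcome[treated == 0], weights=w[treated == 0]))
print(f"naive difference: {naive:.2f}; PS-weighted difference: {adjusted:.2f} (truth: 0.50)")
```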

August 5, 2017 Posted by | Books, Epidemiology, Papers, Statistics | Leave a comment

How Species Interact

There are multiple reasons why I have not covered Arditi and Ginzburg’s book before, but none of them are related to the quality of the book’s coverage. It’s a really nice book. However, the coverage is somewhat technical and model-focused, which makes it harder to blog than other kinds of books. Also, the version of the book I read was a hardcover ‘paper book’ version, and ‘paper books’ take a lot more work for me to cover than e-books do.

I should probably get it out of the way here at the start of the post that if you’re interested in ecology, predator-prey dynamics, etc., this is a book you would be well advised to read; or, if you don’t read the book, you should at least familiarize yourself with the ideas in it, e.g. by having a look at some of Arditi & Ginzburg’s articles on these topics. I should however note that I don’t think skipping the book and having a look at some articles instead will necessarily be a labour-saving strategy; the book is not particularly long and it is to the point, so although it’s not a particularly easy read, their case for ratio dependence is actually fairly easy to follow – if you make the effort – because how the different related ideas and observations are linked is quite likely better expounded upon in the book than in the articles. They presumably wrote the book precisely in order to provide a concise yet coherent overview.

I have had some trouble figuring out how to cover this book, and I’m still not quite sure what the best approach might have been. When covering technical books I’ll often skip a lot of detail and math and try to stick to what might be termed ‘the main ideas’ when quoting from such books, but there’s a clear limit to how many of the technical details in a book like this you can skip if you still want to actually talk about the stuff covered in the work, and this sometimes makes blogging such books awkward. These authors spend a lot of effort talking about how different ecological models work and which sorts of conclusions these different models may lead to in different contexts, and this kind of stuff is a very big part of the book. I’m not sure if you strictly need to have read an ecology textbook or two before you read this one in order to be able to follow the coverage, but I know that I personally derived some benefit from having read Gurney & Nisbet’s ecology text in the past, and I did look up stuff in that book a few times along the way, e.g. when reminding myself what a Holling type 2 functional response is and how models with such a functional response pattern behave. In theory you might be able to look up all the relevant concepts along the way without any background knowledge of ecology – assuming you have a decent understanding of basic calculus/differential equations, linear algebra, equilibrium dynamics, and the like (systems analysis? It’s hard for me to know and outline exactly which sources I’ve read in the past helped make this book easier to read than it otherwise would have been, but suffice it to say that if you look at the page count and think that this will be a quick/easy read, it will be that only if you’ve read more than a few books on ‘related topics’, broadly defined, in the past) – but I wouldn’t advise reading the book if all you know is high school math; the book will be incomprehensible to you, and you won’t make it. I ended up concluding that it would simply be too much work to try to make this post ‘easy’ to read for people who are unfamiliar with these topics and have not read the book, so although I’ve hardly gone out of my way to make the coverage hard to follow, the blog coverage that is to follow is mainly for my own benefit.

First a few relevant links, then some quotes and comments.

Lotka–Volterra equations.
Ecosystem model.
Arditi–Ginzburg equations. (Yep, these equations are named after the authors of this book).
Nicholson–Bailey model.
Functional response.
Monod equation.
Rosenzweig-MacArthur predator-prey model.
Trophic cascade.
Underestimation of mutual interference of predators.
Coupling in predator-prey dynamics: Ratio Dependence.
Michaelis–Menten kinetics.
Trophic level.
Advection–diffusion equation.
Paradox of enrichment. [Two quotes from the book: “actual systems do not behave as Rosenzweig’s model predicts” + “When ecologists have looked for evidence of the paradox of enrichment in natural and laboratory systems, they often find none and typically present arguments about why it was not observed”]
Predator interference emerging from trophotaxis in predator–prey systems: An individual-based approach.
Directed movement of predators and the emergence of density dependence in predator-prey models.

“Ratio-dependent predation is now covered in major textbooks as an alternative to the standard prey-dependent view […]. One of this book’s messages is that the two simple extreme theories, prey dependence and ratio dependence, are not the only alternatives: they are the ends of a spectrum. There are ecological domains in which one view works better than the other, with an intermediate view also being a possible case. […] Our years of work spent on the subject have led us to the conclusion that, although prey dependence might conceivably be obtained in laboratory settings, the common case occurring in nature lies close to the ratio-dependent end. We believe that the latter, instead of the prey-dependent end, can be viewed as the “null model of predation.” […] we propose the gradual interference model, a specific form of predator-dependent functional response that is approximately prey dependent (as in the standard theory) at low consumer abundances and approximately ratio dependent at high abundances. […] When density is low, consumers do not interfere and prey dependence works (as in the standard theory). When consumer density is sufficiently high, interference causes ratio dependence to emerge. In the intermediate densities, predator-dependent models describe partial interference.”

“Studies of food chains are on the edge of two domains of ecology: population and community ecology. The properties of food chains are determined by the nature of their basic link, the interaction of two species, a consumer and its resource, a predator and its prey. The study of this basic link of the chain is part of population ecology while the more complex food webs belong to community ecology. This is one of the main reasons why understanding the dynamics of predation is important for many ecologists working at different scales.”

“We have named predator-dependent the functional responses of the form g = g(N,P), where the predator density P acts (in addition to N [prey abundance, US]) as an independent variable to determine the per capita kill rate […] predator-dependent functional response models have one more parameter than the prey-dependent or the ratio-dependent models. […] The main interest that we see in these intermediate models is that the additional parameter can provide a way to quantify the position of a specific predator-prey pair of species along a spectrum with prey dependence at one end and ratio dependence at the other end:

g(N) ← g(N,P) → g(N/P)   (1.21)

In the Hassell-Varley and Arditi-Akçakaya models […] the mutual interference parameter m plays the role of a cursor along this spectrum, from m = 0 for prey dependence to m = 1 for ratio dependence. Note that this theory does not exclude that strong interference goes “beyond ratio dependence,” with m > 1. This is also called overcompensation. […] In this book, rather than being interested in the interference parameters per se, we use predator-dependent models to determine, either parametrically or nonparametrically, which of the ends of the spectrum (1.21) better describes predator-prey systems in general.”
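
To get a feel for what the interference parameter m does, here is a small numerical sketch of my own using the Hassell-Varley/Arditi-Akçakaya form as I understand it; the parameter values are arbitrary and chosen only for illustration:

```python
def functional_response(N, P, a=1.0, h=0.1, m=0.0):
    """Per-predator kill rate g(N, P) with interference parameter m:
    m = 0 gives the prey-dependent Holling type 2 response, m = 1 the
    ratio-dependent response, intermediate m partial interference."""
    return a * N / (P**m + a * h * N)

N = 50.0
for m in (0.0, 0.5, 1.0):
    rates = ", ".join(f"{functional_response(N, P, m=m):.2f}" for P in (1, 5, 25))
    print(f"m = {m}: g at P = 1, 5, 25 -> {rates}")
# With m = 0 the per-predator kill rate ignores predator density entirely; with m = 1
# it depends only on the ratio N/P; in between, interference is partial.
```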

“[T]he fundamental problem of the Lotka-Volterra and the Rosenzweig-MacArthur dynamic models lies in the functional response and in the fact that this mathematical function is assumed not to depend on consumer density. Since this function measures the number of prey captured per consumer per unit time, it is a quantity that should be accessible to observation. This variable could be apprehended either on the fast behavioral time scale or on the slow demographic time scale. These two approaches need not necessarily reveal the same properties: […] a given species could display a prey-dependent response on the fast scale and a predator-dependent response on the slow scale. The reason is that, on a very short scale, each predator individually may “feel” virtually alone in the environment and react only to the prey that it encounters. On the long scale, the predators are more likely to be affected by the presence of conspecifics, even without direct encounters. In the demographic context of this book, it is the long time scale that is relevant. […] if predator dependence is detected on the fast scale, then it can be inferred that it must be present on the slow scale; if predator dependence is not detected on the fast scale, it cannot be inferred that it is absent on the slow scale.”

Some related thoughts. A different way to think about this – one they don’t mention in the book, but which sprang to mind as I was reading it – is to think in terms of a formal model of predator territorial overlap and then ask yourself this question: assume there’s zero territorial overlap – does that mean the existence of conspecifics does not matter? The answer is of course no. The sizes of the individual patches/territories may be greatly influenced by predator density even in such a context. Also, the territorial area available to potential offspring (certainly a fitness-relevant parameter) may be greatly influenced by the number of competitors inhabiting the surrounding territories. In relation to the last part of the quote, it’s easy to see that in a model with significant territorial overlap you don’t need direct behavioural interaction among predators for the overlap to be relevant: even if two bears never meet, if one of them eats a fawn the other one would have come across two days later, such indirect influences may be important for prey availability. And of course, as prey tend to be mobile, even if predator territories are static and non-overlapping in a geographic sense, they might not be in a functional sense. Moving on…

“In [chapter 2 we] attempted to assess the presence and the intensity of interference in all functional response data sets that we could gather in the literature. Each set must be trivariate, with estimates of the prey consumed at different values of prey density and different values of predator densities. Such data sets are not very abundant because most functional response experiments present in the literature are simply bivariate, with variations of the prey density only, often with a single predator individual, ignoring the fact that predator density can have an influence. This results from the usual presentation of functional responses in textbooks, which […] focus only on the influence of prey density.
Among the data sets that we analyzed, we did not find a single one in which the predator density did not have a significant effect. This is a powerful empirical argument against prey dependence. Most systems lie somewhere on the continuum between prey dependence (m=0) and ratio dependence (m=1). However, they do not appear to be equally distributed. The empirical evidence provided in this chapter suggests that they tend to accumulate closer to the ratio-dependent end than to the prey-dependent end.”

“Equilibrium properties result from the balanced predator-prey equations and contain elements of the underlying dynamic model. For this reason, the response of equilibria to a change in model parameters can inform us about the structure of the underlying equations. To check the appropriateness of the ratio-dependent versus prey-dependent views, we consider the theoretical equilibrium consequences of the two contrasting assumptions and compare them with the evidence from nature. […] According to the standard prey-dependent theory, in reference to [an] increase in primary production, the responses of the populations strongly depend on their level and on the total number of trophic levels. The last, top level always responds proportionally to F [primary input]. The next to the last level always remains constant: it is insensitive to enrichment at the bottom because it is perfectly controled [sic] by the last level. The first, primary producer level increases if the chain length has an odd number of levels, but declines (or stays constant with a Lotka-Volterra model) in the case of an even number of levels. According to the ratio-dependent theory, all levels increase proportionally, independently of how many levels are present. The present purpose of this chapter is to show that the second alternative is confirmed by natural data and that the strange predictions of the prey-dependent theory are unsupported.”
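
The contrast is easy to see in a minimal two-level example. Below I compute the interior equilibria of a toy prey-dependent (Rosenzweig-MacArthur) chain and a toy ratio-dependent chain by setting both growth rates to zero, and then raise the carrying capacity K to mimic enrichment; the parameter values are my own and purely illustrative:

```python
r, a, h, e, mu = 1.0, 1.0, 0.5, 0.5, 0.2   # arbitrary illustrative parameters

def prey_dependent_eq(K):   # dN/dt = rN(1 - N/K) - aNP/(1 + ahN), dP/dt = eaNP/(1 + ahN) - mu*P
    N = mu / (a * (e - h * mu))                 # prey level pinned by the predator equation
    P = r * (1 - N / K) * (1 + a * h * N) / a
    return N, P

def ratio_dependent_eq(K):  # same structure, but with g = aN/(P + ahN)
    q = a * (e - h * mu) / mu                   # equilibrium predator:prey ratio
    N = K * (1 - a * q / (r * (q + a * h)))
    return N, q * N

for K in (2.0, 4.0, 8.0):
    Npd, Ppd = prey_dependent_eq(K)
    Nrd, Prd = ratio_dependent_eq(K)
    print(f"K = {K}: prey-dependent (N*, P*) = ({Npd:.2f}, {Ppd:.2f}), "
          f"ratio-dependent (N*, P*) = ({Nrd:.2f}, {Prd:.2f})")
# Prey dependence: N* never moves, only the top level responds to enrichment.
# Ratio dependence: both levels rise in proportion to K.
```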

“If top predators are eliminated or reduced in abundance, models predict that the sequential lower trophic levels must respond by changes of alternating signs. For example, in a three-level system of plants-herbivores-predators, the reduction of predators leads to the increase of herbivores and the consequential reduction in plant abundance. This response is commonly called the trophic cascade. In a four-level system, the bottom level will increase in response to harvesting at the top. These predicted responses are quite intuitive and are, in fact, true for both short-term and long-term responses, irrespective of the theory one employs. […] A number of excellent reviews have summarized and meta-analyzed large amounts of data on trophic cascades in food chains […] In general, the cascading reaction is strongest in lakes, followed by marine systems, and weakest in terrestrial systems. […] Any theory that claims to describe the trophic chain equilibria has to produce such cascading when top predators are reduced or eliminated. It is well known that the standard prey-dependent theory supports this view of top-down cascading. It is not widely appreciated that top-down cascading is likewise a property of ratio-dependent trophic chains. […] It is [only] for equilibrial responses to enrichment at the bottom that predictions are strikingly different according to the two theories”.

As the book does spend a little time on this, I should perhaps briefly interject that the above paragraph should not be taken to indicate that the two types of models provide identical predictions in the top-down cascading context in all cases; both predict cascading, but there are nonetheless some subtle differences between the models here as well. Some of these differences are, however, quite hard to test.

“[T]he traditional Lotka-Volterra interaction term […] is nothing other than the law of mass action of chemistry. It assumes that predator and prey individuals encounter each other randomly in the same way that molecules interact in a chemical solution. Other prey-dependent models, like Holling’s, derive from the same idea. […] an ecological system can only be described by such a model if conspecifics do not interfere with each other and if the system is sufficiently homogeneous […] we will demonstrate that spatial heterogeneity, be it in the form of a prey refuge or in the form of predator clusters, leads to emergence of gradual interference or of ratio dependence when the functional response is observed at the population level. […] We present two mechanistic individual-based models that illustrate how, with gradually increasing predator density and gradually increasing predator clustering, interference can become gradually stronger. Thus, a given biological system, prey dependent at low predator density, can gradually become ratio dependent at high predator density. […] ratio dependence is a simple way of summarizing the effects induced by spatial heterogeneity, while the prey dependent [models] (e.g., Lotka-Volterra) is more appropriate in homogeneous environments.”

“[W]e consider that a good model of interacting species must be fundamentally invariant to a proportional change of all abundances in the system. […] Allowing interacting populations to expand in balanced exponential growth makes the laws of ecology invariant with respect to multiplying interacting abundances by the same constant, so that only ratios matter. […] scaling invariance is required if we wish to preserve the possibility of joint exponential growth of an interacting pair. […] a ratio-dependent model allows for joint exponential growth. […] Neither the standard prey-dependent models nor the more general predator-dependent models allow for balanced growth. […] In our view, communities must be expected to expand exponentially in the presence of unlimited resources. Of course, limiting factors ultimately stop this expansion just as they do for a single species. With our view, it is the limiting resources that stop the joint expansion of the interacting populations; it is not directly due to the interactions themselves. This partitioning of the causes is a major simplification that traditional theory implies only in the case of a single species.”
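
The invariance argument is easy to illustrate with a few lines of code (again my own sketch, not the book's; the functional forms are the simplest linear prey-dependent and ratio-dependent ones, with invented parameter values). Multiplying both abundances by the same constant leaves the per-capita growth rates of the ratio-dependent pair unchanged, because at most the ratio N/P enters them, whereas the Lotka-Volterra rates change with the scaling, which is why that model cannot sustain balanced joint exponential growth.

```python
# Per-capita growth rates with unlimited resources; all parameter values are invented.
r, a, e, m, alpha = 1.0, 0.5, 0.5, 0.2, 0.4

def lotka_volterra_rates(N, P):
    # prey-dependent (mass action): per-capita rates depend on the absolute abundances
    return r - a * P, e * a * N - m

def ratio_dependent_rates(N, P):
    # ratio-dependent: per-capita rates depend (at most) on the ratio N/P
    return r - alpha, e * alpha * (N / P) - m

for c in (1, 10, 100):
    N, P = 5.0 * c, 2.0 * c   # scale both abundances by the same constant c
    print(f"c={c:4d} | Lotka-Volterra: {lotka_volterra_rates(N, P)}"
          f" | ratio-dependent: {ratio_dependent_rates(N, P)}")
```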

August 1, 2017 Posted by | Biology, Books, Chemistry, Ecology, Mathematics, Studies | Leave a comment

Words

Most of these words are words I encountered while reading Rex Stout novels. To be more specific, perhaps 70 out of these 80 words are words I encountered while reading the Stout novels: And Be a Villain, Trouble in Triplicate, The Second Confession, Three Doors to Death, In the Best Families, Curtains for Three, Murder by the Book, Triple Jeopardy, Prisoner’s Base, and The Golden Spiders.

A few of the words are words which I have also included in previous posts of this kind, but the great majority of the words included are words which I have not previously blogged.

Percipient. Mantlet. Crick. Sepal. Shad. Lam. Gruff. Desist. Arachnology. Raffia. Electroplate. Runt. Temerarious. Temerity. Grump. Chousing. Gyp. Percale. Piddling. Dubiety.

Consommé. Pentathlon. Glower. Divvy. Styptic. Pattycake. Sagacity. Folderol. Glisten. Tassel. Bruit. Petiole. Zwieback. Hock. Flub. Shamus. Concessionaire. Pleat. Echelon. Colleen.

Apodictical. Glisten. Tortfeasor. Arytenoid. Cricoid. Splenetic. Zany. Tint. Boorish. Shuttlecock. Rangy. Gangly. Kilter. Caracul. Adventitious. Malefic. Rancor. Seersucker. Stooge. Frontispiece.

Flange. Avocation. Kobold. Platen. Forlorn. Sourpuss. Celadon. Griddle. Malum. Moot. Albacore. Gaff. Exigency. Cartado. Witling. Flounce. Glom. Pennant. Vernier. Blat.

July 28, 2017 Posted by | Books, language | 2 Comments

Beyond Significance Testing (III)

There are many ways to misinterpret significance tests, and this book spends quite a bit of time and effort on these kinds of issues. I decided to include in this post quite a few quotes from chapter 4 of the book, which deals with these topics in some detail. I also included some notes on effect sizes.

“[P] < .05 means that the likelihood of the data or results even more extreme given random sampling under the null hypothesis is < .05, assuming that all distributional requirements of the test statistic are satisfied and there are no other sources of error variance. […] the odds-against-chance fallacy […] [is] the false belief that p indicates the probability that a result happened by sampling error; thus, p < .05 says that there is less than a 5% likelihood that a particular finding is due to chance. There is a related misconception I call the filter myth, which says that p values sort results into two categories, those that are a result of “chance” (H0 not rejected) and others that are due to “real” effects (H0 rejected). These beliefs are wrong […] When p is calculated, it is already assumed that H0 is true, so the probability that sampling error is the only explanation is already taken to be 1.00. It is thus illogical to view p as measuring the likelihood of sampling error. […] There is no such thing as a statistical technique that determines the probability that various causal factors, including sampling error, acted on a particular result.

Most psychology students and professors may endorse the local Type I error fallacy [which is] the mistaken belief that p < .05 given α = .05 means that the likelihood that the decision just taken to reject H0 is a type I error is less than 5%. […] p values from statistical tests are conditional probabilities of data, so they do not apply to any specific decision to reject H0. This is because any particular decision to do so is either right or wrong, so no probability is associated with it (other than 0 or 1.0). Only with sufficient replication could one determine whether a decision to reject H0 in a particular study was correct. […] the valid research hypothesis fallacy […] refers to the false belief that the probability that H1 is true is > .95, given p < .05. The complement of p is a probability, but 1 – p is just the probability of getting a result even less extreme under H0 than the one actually found. This fallacy is endorsed by most psychology students and professors”.

“[S]everal different false conclusions may be reached after deciding to reject or fail to reject H0. […] the magnitude fallacy is the false belief that low p values indicate large effects. […] p values are confounded measures of effect size and sample size […]. Thus, effects of trivial magnitude need only a large enough sample to be statistically significant. […] the zero fallacy […] is the mistaken belief that the failure to reject a nil hypothesis means that the population effect size is zero. Maybe it is, but you cannot tell based on a result in one sample, especially if power is low. […] The equivalence fallacy occurs when the failure to reject H0: µ1 = µ2 is interpreted as saying that the populations are equivalent. This is wrong because even if µ1 = µ2, distributions can differ in other ways, such as variability or distribution shape.”
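
Both the magnitude fallacy and the zero fallacy are easy to demonstrate numerically (a quick sketch of my own, not from the book; the effect sizes and sample sizes are invented): a trivial population difference combined with a very large sample will routinely come out “significant”, while a substantial difference in a small, low-powered sample often will not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Trivial effect (d = 0.02) in a huge sample: p will essentially always be < .05
big_a = rng.normal(0.00, 1, 200_000)
big_b = rng.normal(0.02, 1, 200_000)
print("trivial effect, n = 200,000 per group: p =", stats.ttest_ind(big_a, big_b).pvalue)

# Substantial effect (d = 0.5) in a tiny sample: power is only about .25,
# so most replications of this comparison will fail to reject H0
small_a = rng.normal(0.0, 1, 15)
small_b = rng.normal(0.5, 1, 15)
print("d = 0.5, n = 15 per group:             p =", stats.ttest_ind(small_a, small_b).pvalue)
```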

“[T]he reification fallacy is the faulty belief that failure to replicate a result is the failure to make the same decision about H0 across studies […]. In this view, a result is not considered replicated if H0 is rejected in the first study but not in the second study. This sophism ignores sample size, effect size, and power across different studies. […] The sanctification fallacy refers to dichotomous thinking about continuous p values. […] Differences between results that are “significant” versus “not significant” by close margins, such as p = .03 versus p = .07 when α = .05, are themselves often not statistically significant. That is, relatively large changes in p can correspond to small, nonsignificant changes in the underlying variable (Gelman & Stern, 2006). […] Classical parametric statistical tests are not robust against outliers or violations of distributional assumptions, especially in small, unrepresentative samples. But many researchers believe just the opposite, which is the robustness fallacy. […] most researchers do not provide evidence about whether distributional or other assumptions are met”.

“Many [of the above] fallacies involve wishful thinking about things that researchers really want to know. These include the probability that H0 or H1 is true, the likelihood of replication, and the chance that a particular decision to reject H0 is wrong. Alas, statistical tests tell us only the conditional probability of the data. […] But there is [however] a method that can tell us what we want to know. It is not a statistical technique; rather, it is good, old-fashioned replication, which is also the best way to deal with the problem of sampling error. […] Statistical significance provides even in the best case nothing more than low-level support for the existence of an effect, relation, or difference. That best case occurs when researchers estimate a priori power, specify the correct construct definitions and operationalizations, work with random or at least representative samples, analyze highly reliable scores in distributions that respect test assumptions, control other major sources of imprecision besides sampling error, and test plausible null hypotheses. In this idyllic scenario, p values from statistical tests may be reasonably accurate and potentially meaningful, if they are not misinterpreted. […] The capability of significance tests to address the dichotomous question of whether effects, relations, or differences are greater than expected levels of sampling error may be useful in some new research areas. Due to the many limitations of statistical tests, this period of usefulness should be brief. Given evidence that an effect exists, the next steps should involve estimation of its magnitude and evaluation of its substantive significance, both of which are beyond what significance testing can tell us. […] It should be a hallmark of a maturing research area that significance testing is not the primary inference method.”

“[An] effect size [is] a quantitative reflection of the magnitude of some phenomenon used for the sake of addressing a specific research question. In this sense, an effect size is a statistic (in samples) or parameter (in populations) with a purpose, that of quantifying a phenomenon of interest. More specific definitions may depend on study design. […] cause size refers to the independent variable and specifically to the amount of change in it that produces a given effect on the dependent variable. A related idea is that of causal efficacy, or the ratio of effect size to the size of its cause. The greater the causal efficacy, the more that a given change on an independent variable results in proportionally bigger changes on the dependent variable. The idea of cause size is most relevant when the factor is experimental and its levels are quantitative. […] An effect size measure […] is a named expression that maps data, statistics, or parameters onto a quantity that represents the magnitude of the phenomenon of interest. This expression connects dimensions or generalized units that are abstractions of variables of interest with a specific operationalization of those units.”

“A good effect size measure has the [following properties:] […] 1. Its scale (metric) should be appropriate for the research question. […] 2. It should be independent of sample size. […] 3. As a point estimate, an effect size should have good statistical properties; that is, it should be unbiased, consistent […], and efficient […]. 4. The effect size [should be] reported with a confidence interval. […] Not all effect size measures […] have all the properties just listed. But it is possible to report multiple effect sizes that address the same question in order to improve the communication of the results.” 

“Examples of outcomes with meaningful metrics include salaries in dollars and post-treatment survival time in years. Means or contrasts for variables with meaningful units are unstandardized effect sizes that can be directly interpreted. […] In medical research, physical measurements with meaningful metrics are often available. […] But in psychological research there are typically no “natural” units for abstract, nonphysical constructs such as intelligence, scholastic achievement, or self-concept. […] Therefore, metrics in psychological research are often arbitrary instead of meaningful. An example is the total score for a set of true-false items. Because responses can be coded with any two different numbers, the total is arbitrary. Standard scores such as percentiles and normal deviates are arbitrary, too […] Standardized effect sizes can be computed for results expressed in arbitrary metrics. Such effect sizes can also be directly compared across studies where outcomes have different scales. This is because standardized effect sizes are based on units that have a common meaning regardless of the original metric.”

“1. It is better to report unstandardized effect sizes for outcomes with meaningful metrics. This is because the original scale is lost when results are standardized. 2. Unstandardized effect sizes are best for comparing results across different samples measured on the same outcomes. […] 3. Standardized effect sizes are better for comparing conceptually similar results based on different units of measure. […] 4. Standardized effect sizes are affected by the corresponding unstandardized effect sizes plus characteristics of the study, including its design […], whether factors are fixed or random, the extent of error variance, and sample base rates. This means that standardized effect sizes are less directly comparable over studies that differ in their designs or samples. […] 5. There is no such thing as T-shirt effect sizes (Lenth, 2006–2009) that classify standardized effect sizes as “small,” “medium,” or “large” and apply over all research areas. This is because what is considered a large effect in one area may be seen as small or trivial in another. […] 6. There is usually no way to directly translate standardized effect sizes into implications for substantive significance. […] It is standardized effect sizes from sets of related studies that are analyzed in most meta-analyses.”
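
To illustrate the distinction between the two kinds of effect size (my own sketch; the outcome scale, group sizes, and scores are invented, and the confidence interval uses a common large-sample approximation to the standard error of d rather than an exact method), here is the same comparison reported both as an unstandardized mean difference in the outcome's own units and as a standardized mean difference (Cohen's d based on the pooled standard deviation):

```python
import numpy as np

def cohens_d(x, y):
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

def d_confidence_interval(d, nx, ny, z=1.96):
    # large-sample approximation to the standard error of d
    se = np.sqrt((nx + ny) / (nx * ny) + d**2 / (2 * (nx + ny)))
    return d - z * se, d + z * se

rng = np.random.default_rng(1)
treated = rng.normal(105, 15, 60)   # invented scores on some arbitrary outcome scale
control = rng.normal(100, 15, 60)

raw_diff = treated.mean() - control.mean()   # unstandardized, in scale points
d = cohens_d(treated, control)               # standardized, scale-free
lo, hi = d_confidence_interval(d, 60, 60)
print(f"raw difference = {raw_diff:.1f} scale points, d = {d:.2f}, approx. 95% CI [{lo:.2f}, {hi:.2f}]")
```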

July 16, 2017 Posted by | Books, Psychology, Statistics | Leave a comment

Gravity

“The purpose of this book is to give the reader a very brief introduction to various different aspects of gravity. We start by looking at the way in which the theory of gravity developed historically, before moving on to an outline of how it is understood by scientists today. We will then consider the consequences of gravitational physics on the Earth, in the Solar System, and in the Universe as a whole. The final chapter describes some of the frontiers of current research in theoretical gravitational physics.”

I was not super impressed by this book, mainly because the level of coverage was not quite as high as that of some of the other physics books in the OUP – A Brief Introduction series. But it’s definitely an okay book about this topic; I was much closer to a three-star rating on goodreads than a one-star rating, and I did learn some new things from it. I might still change my mind about my two-star rating of the book.

I’ll cover the book the same way I’ve covered some of the other books in the series; I’ll post some quotes with some observations of interest, and then I’ll add some supplementary links towards the end of the post. ‘As usual’ (see e.g. also the introductory remarks to this post) I’ll add links to topics even if I have previously, perhaps on multiple occasions, added the same links when covering other books – the idea behind the links is to remind me – and indicate to you – which kinds of topics are covered in the book.

“[O]ver large distances it is gravity that dominates. This is because gravity is only ever attractive and because it can never be screened. So while most large objects are electrically neutral, they can never be gravitationally neutral. The gravitational force between objects with mass always acts to pull those objects together, and always increases as they become more massive.”

“The challenges involved in testing Newton’s law of gravity in the laboratory arise principally due to the weakness of the gravitational force compared to the other forces of nature. This weakness means that even the smallest residual electric charges on a piece of experimental equipment can totally overwhelm the gravitational force, making it impossible to measure. All experimental equipment therefore needs to be prepared with the greatest of care, and the inevitable electric charges that sneak through have to be screened by introducing metal shields that reduce their influence. This makes the construction of laboratory experiments to test gravity extremely difficult, and explains why we have so far only probed gravity down to scales a little below 1mm (this can be compared to around a billionth of a billionth of a millimetre for the electric force).”
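
The relative weakness referred to in that last parenthesis is easy to put a number on (my own back-of-the-envelope calculation, not the book's): for two protons the ratio of the Newtonian gravitational attraction to the Coulomb repulsion is independent of their separation, and it comes out at roughly 10^-36.

```python
# Ratio of gravitational to electrostatic force between two protons
# (the separation r cancels out): F_grav / F_elec = G * m_p^2 / (k * e^2)
G   = 6.674e-11    # gravitational constant, N m^2 kg^-2
k   = 8.988e9      # Coulomb constant, N m^2 C^-2
m_p = 1.673e-27    # proton mass, kg
e   = 1.602e-19    # elementary charge, C

ratio = (G * m_p**2) / (k * e**2)
print(f"F_grav / F_elec for two protons = {ratio:.1e}")   # roughly 8e-37
```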

“There are a large number of effects that result from Einstein’s theory. […] [T]he anomalous orbit of the planet Mercury; the bending of starlight around the Sun; the time delay of radio signals as they pass by the Sun; and the behaviour of gyroscopes in orbit around the Earth […] are four of the most prominent relativistic gravitational effects that can be observed in the Solar System.” [As an aside, I only yesterday watched the first ~20 minutes of the first of Nima Arkani-Hamed’s lectures on the topic of ‘Robustness of GR. Attempts to Modify Gravity’, which was recently uploaded on the IAS youtube channel, before I concluded that I was probably not going to be able to follow the lecture – I would, on account of having read this book, have been able to tell Arkani-Hamed that the ‘American’ astronomer whose name eluded him early on in the lecture (5 minutes in or so) was John Couch Adams (who was in fact British, not American)].

“[T]he overall picture we are left with is very encouraging for Einstein’s theory of gravity. The foundational assumptions of this theory, such as the constancy of mass and the Universality of Free Fall, have been tested to extremely high accuracy. The inverse square law that formed the basis of Newton’s theory, and which is a good first approximation to Einstein’s theory, has been tested from the sub-millimetre scale all the way up to astrophysical scales. […] We […] have very good evidence that Newton’s inverse square law is a good approximation to gravity over a wide range of distance scales. These scales range from a fraction of a millimetre, to hundreds of millions of metres. […] We are also now in possession of a number of accurate experimental results that probe the tiny, subtle effects that result from Einstein’s theory specifically. This data allows us direct experimental insight into the relationship between matter and the curvature of space-time, and all of it is so far in good agreement with Einstein’s predictions.”

“[A]ll of the objects in the Solar System are, relatively speaking, rather slow moving and not very dense. […] If we set our sights a little further though, we can find objects that are much more extreme than anything we have available nearby. […] observations of them have allowed us to explore gravity in ways that are simply impossible in our own Solar System. The extreme nature of these objects amplifies the effects of Einstein’s theory […] Just as the orbit of Mercury precesses around the Sun so too the neutron stars in the Hulse–Taylor binary system precess around each other. To compare with similar effects in our Solar System, the orbit of the Hulse–Taylor pulsar precesses as much in a day as Mercury does in a century.”
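
That last comparison can be checked with two commonly quoted figures (my own arithmetic, using numbers that are not given in the book: a general-relativistic perihelion advance of about 43 arcseconds per century for Mercury, and a periastron advance of about 4.2 degrees per year for the Hulse–Taylor binary):

```python
mercury_arcsec_per_century = 43.0   # GR contribution to Mercury's perihelion advance
hulse_taylor_deg_per_year  = 4.2    # periastron advance of the Hulse-Taylor binary (PSR B1913+16)

hulse_taylor_arcsec_per_day = hulse_taylor_deg_per_year * 3600 / 365.25
print(f"Hulse-Taylor binary: ~{hulse_taylor_arcsec_per_day:.0f} arcsec per day")
print(f"Mercury (GR part):   ~{mercury_arcsec_per_century:.0f} arcsec per century")
# ~41 arcsec/day versus ~43 arcsec/century: about as much in a day as Mercury manages in a century.
```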

“[I]n Einstein’s theory, gravity is due to the curvature of space-time. Massive objects like stars and planets deform the shape of the space-time in which they exist, so that other bodies that move through it appear to have their trajectories bent. It is the mistaken interpretation of the motion of these bodies as occurring in a flat space that leads us to infer that there is a force called gravity. In fact, it is just the curvature of space-time that is at work. […] The relevance of this for gravitational waves is that if a group of massive bodies are in relative motion […], then the curvature of the space-time in which they exist is not usually fixed in time. The curvature of the space-time is set by the massive bodies, so if the bodies are in motion, the curvature of space-time should be expected to be constantly changing. […] in Einstein’s theory, space-time is a dynamical entity. As an example of this, consider the supernovae […] Before their cores collapse, leading to catastrophic explosion, they are relatively stable objects […] After they explode they settle down to a neutron star or a black hole, and once again return to a relatively stable state, with a gravitational field that doesn’t change much with time. During the explosion, however, they eject huge amounts of mass and energy. Their gravitational field changes rapidly throughout this process, and therefore so does the curvature of the space-time around them.

Like any system that is pushed out of equilibrium and made to change rapidly, this causes disturbances in the form of waves. A more down-to-earth example of a wave is what happens when you throw a stone into a previously still pond. The water in the pond was initially in a steady state, but the stone causes a rapid change in the amount of water at one point. The water in the pond tries to return to its tranquil initial state, which results in the propagation of the disturbance, in the form of ripples that move away from the point where the stone landed. Likewise, a loud noise in a previously quiet room originates from a change in air pressure at a point (e.g. a stereo speaker). The disturbance in the air pressure propagates outwards as a pressure wave as the air tries to return to a stable state, and we perceive these pressure waves as sound. So it is with gravity. If the curvature of space-time is pushed out of equilibrium, by the motion of mass or energy, then this disturbance travels outwards as waves. This is exactly what occurs when a star collapses and its outer envelope is ejected by the subsequent explosion. […] The speed with which waves propagate usually depends on the medium through which they travel. […] The medium for gravitational waves is space-time itself, and according to Einstein’s theory, they propagate at exactly the same speed as light. […] [If a gravitational wave passes through a cloud of gas,] the gravitational wave is not a wave in the gas, but rather a propagating disturbance in the space-time in which the gas exists. […] although the atoms in the gas might be closer together (or further apart) than they were before the wave passed through them, it is not because the atoms have moved, but because the amount of space between them has been decreased (or increased) by the wave. The gravitational wave changes the distance between objects by altering how much space there is in between them, not by moving them within a fixed space.”

“If we look at the right galaxies, or collect enough data, […] we can use it to determine the gravitational fields that exist in space. […] we find that there is more gravity than we expected there to be, from the astrophysical bodies that we can see directly. There appears to be a lot of mass, which bends light via its gravitational field, but that does not interact with the light in any other way. […] Moving to even smaller scales, we can look at how individual galaxies behave. It has been known since the 1970s that the rate at which galaxies rotate is too high. What I mean is that if the only source of gravity in a galaxy was the visible matter within it (mostly stars and gas), then any galaxy that rotated as fast as those we see around us would tear itself apart. […] That they do not fly apart, despite their rapid rotation, strongly suggests that the gravitational fields within them are larger than we initially suspected. Again, the logical conclusion is that there appears to be matter in galaxies that we cannot see but which contributes to the gravitational field. […] Many of the different physical processes that occur in the Universe lead to the same surprising conclusion: the gravitational fields we infer, by looking at the Universe around us, require there to be more matter than we can see with our telescopes. Beyond this, in order for the largest structures in the Universe to have evolved into their current state, and in order for the seeds of these structures to look the way they do in the CMB, this new matter cannot be allowed to interact with light at all (or, at most, interact only very weakly). This means that not only do we not see this matter, but that it cannot be seen at all using light, because light is required to pass straight through it. […] The substance that gravitates in this way but cannot be seen is referred to as dark matter. […] There needs to be approximately five times as much dark matter as there is ordinary matter. […] the evidence for the existence of dark matter comes from so many different sources that it is hard to argue with it.”

“[T]here seems to be a type of anti-gravity at work when we look at how the Universe expands. This anti-gravity is required in order to force matter apart, rather than pull it together, so that the expansion of the Universe can accelerate. […] The source of this repulsive gravity is referred to by scientists as dark energy […] our current overall picture of the Universe is as follows: only around 5 per cent of the energy in the Universe is in the form of normal matter; about 25 per cent is thought to be in the form of the gravitationally attractive dark matter; and the remaining 70 per cent is thought to be in the form of the gravitationally repulsive dark energy. These proportions, give or take a few percentage points here and there, seem sufficient to explain all astronomical observations that have been made to date. The total of all three of these types of energy, added together, also seems to be just the right amount to make space flat […] The flat Universe, filled with mostly dark energy and dark matter, is usually referred to as the Concordance Model of the Universe. Among astronomers, it is now the consensus view that this is the model of the Universe that best fits their data.”


The universality of free fall.
Galileo’s Leaning Tower of Pisa experiment.
Isaac Newton/Philosophiæ Naturalis Principia Mathematica/Newton’s law of universal gravitation.
Kepler’s laws of planetary motion.
Luminiferous aether.
Special relativity.
Spacetime.
General relativity.
Spacetime curvature.
Pound–Rebka experiment.
Gravitational time dilation.
Gravitational redshift space-probe experiment (Vessot & Levine).
Michelson–Morley experiment.
Hughes–Drever experiment.
Tests of special relativity.
Eötvös experiment.
Torsion balance.
Cavendish experiment.
LAGEOS.
Interferometry.
Geodetic precession.
Frame-dragging.
Gravity Probe B.
White dwarf/neutron star/supernova/gravitational collapse/black hole.
Hulse–Taylor binary.
Arecibo Observatory.
PSR J1738+0333.
Gravitational wave.
Square Kilometre Array.
PSR J0337+1715.
LIGO.
Weber bar.
MiniGrail.
Laser Interferometer Space Antenna.
Edwin Hubble/Hubble’s Law.
Physical cosmology.
Alexander Friedmann/Friedmann equations.
Cosmological constant.
Georges Lemaître.
Ralph Asher Alpher/Robert Herman/CMB/Arno Penzias/Robert Wilson.
Cosmic Background Explorer.
The BOOMERanG experiment.
Millimeter Anisotropy eXperiment IMaging Array.
Wilkinson Microwave Anisotropy Probe.
High-Z Supernova Search Team.
CfA Redshift Survey/CfA2 Great Wall/2dF Galaxy Redshift Survey/Sloan Digital Sky Survey/Sloan Great Wall.
Gravitational lensing.
Inflation (cosmology).
Lambda-CDM model.
BICEP2.
Large Synoptic Survey Telescope.
Grand Unified Theory.
Renormalization (quantum theory).
String theory.
Loop quantum gravity.
Unruh effect.
Hawking radiation.
Anthropic principle.

July 15, 2017 Posted by | Astronomy, Books, cosmology, Physics | Leave a comment

The Personality Puzzle (IV)

Below I have added a few quotes from the last 100 pages of the book. This will be my last post about the book.

“Carol Dweck and her colleagues claim that two […] kinds of goals are […] important […]. One kind she calls judgment goals. Judgment, in this context, refers to seeking to judge or validate an attribute in oneself. For example, you might have the goal of convincing yourself that you are smart, beautiful, or popular. The other kind she calls development goals. A development goal is the desire to actually improve oneself, to become smarter, more beautiful, or more popular. […] From the perspective of Dweck’s theory, these two kinds of goals are important in many areas of life because they produce different reactions to failure, and everybody fails sometimes. A person with a development goal will respond to failure with what Dweck calls a mastery-oriented pattern, in which she tries even harder the next time. […] In contrast, a person with a judgment goal responds to failure with what Dweck calls the helpless pattern: Rather than try harder, this individual simply concludes, “I can’t do it,” and gives up. Of course, that only guarantees more failure in the future. […] Dweck believes [the goals] originate in different kinds of implicit theories about the nature of the world […] Some people hold what Dweck calls entity theories, and believe that personal qualities such as intelligence and ability are unchangeable, leading them to respond helplessly to any indication that they do not have what it takes. Other people hold incremental theories, believing that intelligence and ability can change with time and experience. Their goals, therefore, involve not only proving their competence but increasing it.”

(I should probably add here that any sort of empirical validation of those theories and their consequences is, aside from a brief discussion of the results of a few (likely weak, low-powered) studies, completely absent from the book, but this kind of stuff might even so be worth having in mind, which was why I included this quote in my coverage – US).

“A large amount of research suggests that low self-esteem […] is correlated with outcomes such as dissatisfaction with life, hopelessness, and depression […] as well as loneliness […] Declines in self-esteem also appear to cause outcomes including depression, lower satisfaction with relationships, and lower satisfaction with one’s career […] Your self-esteem tends to suffer when you have failed in the eyes of your social group […] This drop in self-esteem may be a warning about possible rejection or even social ostracism — which, for our distant ancestors, could literally be fatal — and motivate you to restore your reputation. High self-esteem, by contrast, may indicate success and acceptance. Attempts to bolster self-esteem can backfire. […] People who self-enhance — who think they are better than the other people who know them think they are — can run into problems in relations with others, mental health, and adjustment […] Narcissism is associated with high self-esteem that is brittle and unstable because it is unrealistic […], and unstable self-esteem may be worse than low self-esteem […] The bottom line is that promoting psychological health requires something more complex than simply trying to make everybody feel better about themselves […]. The best way to raise self-esteem is through accomplishments that increase it legitimately […]. The most important aspect of your opinion of yourself is not whether it is good or bad, but the degree to which it is accurate.”

“An old theory suggested that if you repeated something over and over in your mind, such rehearsal was sufficient to move the information into long-term memory (LTM), or permanent memory storage. Later research showed that this idea is not quite correct. The best way to get information into LTM, it turns out, is not just to repeat it, but to really think about it (a process called elaboration). The longer and more complex the processing that a piece of information receives, the more likely it is to get transferred into LTM”.

“Concerning mental health, aspects of personality can become so extreme as to cause serious problems. When this happens, psychologists begin to speak of personality disorders […] Personality disorders have five general characteristics. They are (1) unusual and, (2) by definition, tend to cause problems. In addition, most but not quite all personality disorders (3) affect social relations and (4) are stable over time. Finally, (5) in some cases, the person who has a personality disorder may see it not as a disorder at all, but a basic part of who he or she is. […] personality disorders can be ego-syntonic, which means the people who have them do not think anything is wrong. People who suffer from other kinds of mental disorder generally experience their symptoms of confusion, depression, or anxiety as ego-dystonic afflictions of which they would like to be cured. For a surprising number of people with personality disorders, in contrast, their symptoms feel like normal and even valued aspects of who they are. Individuals with the attributes of the antisocial or narcissistic personality disorders, in particular, typically do not think they have a problem.”

[One side-note: It’s important to be aware of the fact that not all people who display unusual behavioral patterns which are causing them problems necessarily suffer from a personality disorder. Other categorization schemes also exist. Autism is for example not categorized as a personality disorder, but is rather considered to be a (neuro)developmental disorder. Funder does not go into this kind of stuff in his book but I thought it might be worth mentioning here – US]

“Some people are more honest than others, but when deceit and manipulation become core aspects of an individual’s way of dealing with the world, he may be diagnosed with antisocial personality disorder. […] People with this disorder are impulsive, and engage in risky behaviors […] They typically are irritable, aggressive, and irresponsible. The damage they do to others bothers them not one whit; they rationalize […] that life is unfair; the world is full of suckers; and if you don’t take what you want whenever you can, then you are a sucker too. […] A wide variety of negative outcomes may accompany this disorder […] Antisocial personality disorder is sometimes confused with the trait of psychopathy […] but it’s importantly different […] Psychopaths are emotionally cold, they disregard social norms, and they are manipulative and often cunning. Most psychopaths meet the criteria for antisocial personality disorder, but the reverse is not true.”

“From day to day with different people, and over time with the same people, most individuals feel and act pretty consistently. […] Predictability makes it possible to deal with others in a reasonable way, and gives each of us a sense of individual identity. But some people are less consistent than others […] borderline personality disorder […] is characterized by unstable and confused behavior, a poor sense of identity, and patterns of self-harm […] Their chaotic thoughts, emotions, and behaviors make persons suffering from this disorder very difficult for others to “read” […] Borderline personality disorder (BPD) entails so many problems for the affected person that nobody doubts that it is, at the very least, on the “borderline” with severe psychopathology. Its hallmark is emotional instability. […] All of the personality disorders are rather mixed bags of indicators, and BPD may be the most mixed of all. It is difficult to find a coherent, common thread among its characteristics […] Some psychologists […] have suggested that this [personality disorder] category is too diffuse and should be abandoned.”

“[T]he modern research literature on personality disorders has come close to consensus about one conclusion: There is no sharp dividing line between psychopathology and normal variation (L. A. Clark & Watson, 1999a; Furr & Funder, 1998; Hong & Paunonen, 2011; Krueger & Eaton, 2010; Krueger & Tackett, 2003; B. P. O’Connor, 2002; Trull & Durrett, 2005).”

“Accurate self-knowledge has long been considered a hallmark of mental health […] The process for gaining accurate self-knowledge is outlined by the Realistic Accuracy Model […] according to RAM, one can gain accurate knowledge of anyone’s personality through a four-stage process. First, the person must do something relevant to the trait being judged; second, the information must be available to the judge; third, the judge must detect this information; and fourth, the judge must utilize the information correctly. This model was initially developed to explain the accuracy of judgments of other people. In an important sense, though, you are just one of the people you happen to know, and, to some degree, you come to know yourself the same way you find out about anybody else — by observing what you do and trying to draw appropriate conclusions”.

“[P]ersonality is not just something you have; it is also something you do. The unique aspects of what you do comprise the procedural self, and your knowledge of this self typically takes the form of procedural knowledge. […] The procedural self is made up of the behaviors through which you express who you think you are, generally without knowing you are doing so […]. Like riding a bicycle, the working of the procedural self is automatic and not very accessible to conscious awareness.”

July 14, 2017 Posted by | Books, Psychology | Leave a comment

Words

Almost all of the words included below are words which I encountered while reading the Rex Stout books: Too Many Cooks, Some Buried Caesar, Over My Dead Body, Where There’s A Will, Black Orchids, Not Quite Dead Enough, The Silent Speaker, and Too Many Women.

Consilience. Plerophory. Livery. Fleshpot. Electioneer. Estop. Gibbosity. Piroshki. Clodhopper. Phlebotomy. Concordat. Clutch. Katydid. Tarpon. Bower. Scoot. Suds. Rotunda. Gab. Floriculture.

Scowl. Commandeer. Apodictically. Blotch. Bauble. Thurl. Wilt. Huff. Clodhopper. Consignee. Épée. Imprecation. Intransigent. Couturier. Quittance. Dingus. Metonymy. Chintzy. Skittish. Natty.

Intrigante. Curlicue. Bedraggled. Rotogravure. Legatee. Caper. Phiz. Derrick. Labellum. Mumblety-peg. Flump. Kerplunk. Portage. Pettish. Darb. Partridge. Cheviot. Jaunty. Accouterment. Obreptitious.

Receptacle. Impetuous. Springe. Toting. Blowsy. Flam. Linnet. Carton. Bollix. Awning. Chiffonier. Sniggle. Toggle. Craw. Simp. Titter. Wren. Endive. Assiduity. Pudgy.

July 12, 2017 Posted by | Books, language | Leave a comment

Beyond Significance Testing (II)

I have added some more quotes and observations from the book below.

“The least squares estimators M and s2 are not robust against the effects of extreme scores. […] Conventional methods to construct confidence intervals rely on sample standard deviations to estimate standard errors. These methods also rely on critical values in central test distributions, such as t and z, that assume normality or homoscedasticity […] Such distributional assumptions are not always plausible. […] One option to deal with outliers is to apply transformations, which convert original scores with a mathematical operation to new ones that may be more normally distributed. The effect of applying a monotonic transformation is to compress one part of the distribution more than another, thereby changing its shape but not the rank order of the scores. […] It can be difficult to find a transformation that works in a particular data set. Some distributions can be so severely nonnormal that basically no transformation will work. […] An alternative that also deals with departures from distributional assumptions is robust estimation. Robust (resistant) estimators are generally less affected than least squares estimators by outliers or nonnormality.”

“An estimator’s quantitative robustness can be described by its finite-sample breakdown point (BP), or the smallest proportion of scores that when made arbitrarily very large or small renders the statistic meaningless. The lower the value of BP, the less robust the estimator. For both M and s2, BP = 0, the lowest possible value. This is because the value of either statistic can be distorted by a single outlier, and the ratio 1/N approaches zero as sample size increases. In contrast, BP = .50 for the median because its value is not distorted by arbitrarily extreme scores unless they make up at least half the sample. But the median is not an optimal estimator because its value is determined by a single score, the one at the 50th percentile. In this sense, all the other scores are discarded by the median. A compromise between the sample mean and the median is the trimmed mean. A trimmed mean Mtr is calculated by (a) ordering the scores from lowest to highest, (b) deleting the same proportion of the most extreme scores from each tail of the distribution, and then (c) calculating the average of the scores that remain. […] A common practice is to trim 20% of the scores from each tail of the distribution when calculating trimmed estimators. This proportion tends to maintain the robustness of trimmed means while minimizing their standard errors when sampling from symmetrical distributions […] For 20% trimmed means, BP = .20, which says they are robust against arbitrarily extreme scores unless such outliers make up at least 20% of the sample.”
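
A quick illustration of the difference in robustness (my own sketch; the scores are invented): the same small sample with one wild outlier, summarized by the mean, the median, and a 20% trimmed mean. scipy's trim_mean deletes the stated proportion of scores from each tail before averaging, as in steps (a)–(c) above.

```python
import numpy as np
from scipy import stats

scores = np.array([4, 5, 5, 6, 6, 7, 7, 8, 9, 120])   # one extreme outlier

print("mean:            ", np.mean(scores))               # dragged upward by the outlier (BP = 0)
print("median:          ", np.median(scores))             # unaffected (BP = .50)
print("20% trimmed mean:", stats.trim_mean(scores, 0.2))  # drops the 2 lowest and 2 highest scores (BP = .20)
```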

“The standard H0 is both a point hypothesis and a nil hypothesis. A point hypothesis specifies the numerical value of a parameter or the difference between two or more parameters, and a nil hypothesis states that this value is zero. The latter is usually a prediction that an effect, difference, or association is zero. […] Nil hypotheses as default explanations may be fine in new research areas when it is unknown whether effects exist at all. But they are less suitable in established areas when it is known that some effect is probably not zero. […] Nil hypotheses are tested much more often than non-nil hypotheses even when the former are implausible. […] If a nil hypothesis is implausible, estimated probabilities of data will be too low. This means that risk for Type I error is basically zero and a Type II error is the only possible kind when H0 is known in advance to be false.”

“Too many researchers treat the conventional levels of α, either .05 or .01, as golden rules. If other levels of α are specified, they tend to be even lower […]. Sanctification of .05 as the highest “acceptable” level is problematic. […] Instead of blindly accepting either .05 or .01, one does better to […] [s]pecify a level of α that reflects the desired relative seriousness (DRS) of Type I error versus Type II error. […] researchers should not rely on a mechanical ritual (i.e., automatically specify .05 or .01) to control risk for Type I error that ignores the consequences of Type II error.”

“Although p and α are derived in the same theoretical sampling distribution, p does not estimate the conditional probability of a Type I error […]. This is because p is based on a range of results under H0, but α has nothing to do with actual results and is supposed to be specified before any data are collected. Confusion between p and α is widespread […] To differentiate the two, Gigerenzer (1993) referred to p as the exact level of significance. If p = .032 and α = .05, H0 is rejected at the .05 level, but .032 is not the long-run probability of Type I error, which is .05 for this example. The exact level of significance is the conditional probability of the data (or any result even more extreme) assuming H0 is true, given all other assumptions about sampling, distributions, and scores. […] Because p values are estimated assuming that H0 is true, they do not somehow measure the likelihood that H0 is correct. […] The false belief that p is the probability that H0 is true, or the inverse probability error […] is widespread.”

“Probabilities from significance tests say little about effect size. This is because essentially any test statistic (TS) can be expressed as the product TS = ES × f(N) […] where ES is an effect size and f(N) is a function of sample size. This equation explains how it is possible that (a) trivial effects can be statistically significant in large samples or (b) large effects may not be statistically significant in small samples. So p is a confounded measure of effect size and sample size.”
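
For the two-sample t test with equal group sizes the factoring takes the simple form t = d·√(n/2), where d is the standardized mean difference and n is the per-group sample size (a standard textbook identity, not a quote from the book). Holding the observed effect size fixed and letting n grow therefore drives t up and p down:

```python
import numpy as np
from scipy import stats

d = 0.1                              # a small, fixed sample effect size
for n in (20, 100, 1000, 10_000):    # per-group sample size
    t = d * np.sqrt(n / 2)           # TS = ES * f(N) for the equal-n two-sample t test
    p = 2 * stats.t.sf(abs(t), df=2 * n - 2)
    print(f"n per group = {n:6d}: t = {t:5.2f}, p = {p:.4f}")
```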

“Power is the probability of getting statistical significance over many random replications when H1 is true. It varies directly with sample size and the magnitude of the population effect size. […] This combination leads to the greatest power: a large population effect size, a large sample, a higher level of α […], a within-subjects design, a parametric test rather than a nonparametric test (e.g., t instead of Mann–Whitney), and very reliable scores. […] Power .80 is generally desirable, but an even higher standard may be needed if consequences of Type II error are severe. […] Reviews from the 1970s and 1980s indicated that the typical power of behavioral science research is only about .50 […] and there is little evidence that power is any higher in more recent studies […] Ellis (2010) estimated that < 10% of studies have samples sufficiently large to detect smaller population effect sizes. Increasing sample size would address low power, but the number of additional cases necessary to reach even nominal power when studying smaller effects may be so great as to be practically impossible […] Too few researchers, generally < 20% (Osborne, 2008), bother to report prospective power despite admonitions to do so […] The concept of power does not stand without significance testing. As statistical tests play a smaller role in the analysis, the relevance of power also declines. If significance tests are not used, power is irrelevant. Cumming (2012) described an alternative called precision for research planning, where the researcher specifies a target margin of error for estimating the parameter of interest. […] The advantage over power analysis is that researchers must consider both effect size and precision in study planning.”
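
A small sketch of how power behaves (my own code, based on the noncentral t distribution rather than on any particular power program; the effect sizes and sample sizes are just illustrative): power of a two-sided, two-sample t test at α = .05, as a function of the population effect size and the per-group sample size.

```python
from scipy import stats

def power_two_sample_t(d, n, alpha=0.05):
    """Power of a two-sided two-sample t test, n cases per group, population effect size d."""
    df = 2 * n - 2
    nc = d * (n / 2) ** 0.5                 # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

for d in (0.2, 0.5, 0.8):
    for n in (25, 64, 200):
        print(f"d = {d}, n per group = {n:3d}: power = {power_two_sample_t(d, n):.2f}")
```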

“Classical nonparametric tests are alternatives to the parametric t and F tests for means (e.g., the Mann–Whitney test is the nonparametric analogue to the t test). Nonparametric tests generally work by converting the original scores to ranks. They also make fewer assumptions about the distributions of those ranks than do parametric tests applied to the original scores. Nonparametric tests date to the 1950s–1960s, and they share some limitations. One is that they are not generally robust against heteroscedasticity, and another is that their application is typically limited to single-factor designs […] Modern robust tests are an alternative. They are generally more flexible than nonparametric tests and can be applied in designs with multiple factors. […] At the end of the day, robust statistical tests are subject to many of the same limitations as other statistical tests. For example, they assume random sampling albeit from population distributions that may be nonnormal or heteroscedastic; they also assume that sampling error is the only source of error variance. Alternative tests, such as the Welch–James and Yuen–Welch versions of a robust t test, do not always yield the same p value for the same data, and it is not always clear which alternative is best (Wilcox, 2003).”
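
The closing point, that different tests need not agree on the same data, is easy to see in practice (my own sketch with made-up skewed, heteroscedastic samples): the classical pooled-variance t test, the Welch t test, and the Mann–Whitney test can return noticeably different p values for the same comparison.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group1 = rng.lognormal(mean=0.0, sigma=0.5, size=25)   # skewed, smaller spread
group2 = rng.lognormal(mean=0.3, sigma=1.0, size=40)   # skewed, larger spread

print("Student t (pooled):", stats.ttest_ind(group1, group2, equal_var=True).pvalue)
print("Welch t:           ", stats.ttest_ind(group1, group2, equal_var=False).pvalue)
print("Mann-Whitney U:    ", stats.mannwhitneyu(group1, group2, alternative="two-sided").pvalue)
```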

July 11, 2017 Posted by | Books, Psychology, Statistics | Leave a comment

The Personality Puzzle (III)

I have added some more quotes and observations from the book below.

“Across many, many traits, the average correlation across MZ twins is about .60, and across DZ twins it is about .40, when adjusted for age and gender […] This means that, according to twin studies, the average heritability of many traits is about .40, which is interpreted to mean that 40 percent of phenotypic (behavioral) variance is accounted for by genetic variance. The heritabilities of the Big Five traits are a bit higher; according to one comprehensive summary they range from .42, for agreeableness, to .57, for openness (Bouchard, 2004). […] behavioral genetic analyses and the statistics they produce refer to groups or populations, not individuals. […] when research concludes that a personality trait is, say, 50 percent heritable, this does not mean that half of the extent to which an individual expresses that trait is determined genetically. Instead, it means that 50 percent of the degree to which the trait varies across the population can be attributed to genetic variation. […] Because heritability is the proportion of variation due to genetic influences, if there is no variation, then the heritability must approach zero. […] Heritability statistics are not the nature-nurture ratio; a biologically determined trait can have a zero heritability.”
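
The arithmetic behind that average figure of .40 is the classical twin comparison. A minimal sketch using Falconer's approximation (the formula is not spelled out in the quote, so treat this as an illustration rather than as the book's method): heritability ≈ 2(rMZ − rDZ), shared environment ≈ 2rDZ − rMZ, and nonshared environment (plus measurement error) ≈ 1 − rMZ.

```python
def falconer_decomposition(r_mz, r_dz):
    """Crude ACE decomposition from MZ and DZ twin correlations (Falconer's formula)."""
    a2 = 2 * (r_mz - r_dz)   # additive genetic variance ("heritability")
    c2 = 2 * r_dz - r_mz     # shared-environment variance
    e2 = 1 - r_mz            # nonshared environment plus measurement error
    return a2, c2, e2

# the average twin correlations quoted above: rMZ = .60, rDZ = .40
a2, c2, e2 = falconer_decomposition(0.60, 0.40)
print(f"heritability = {a2:.2f}, shared environment = {c2:.2f}, nonshared = {e2:.2f}")
```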

“The environment can […] affect heritability […]. For example, when every child receives adequate nutrition, variance in height is genetically controlled. […] But in an environment where some are well fed while others go hungry, variance in height will fall more under the control of the environment. Well-fed children will grow near the maximum of their genetic potential while poorly fed children will grow closer to their genetic minimum, and the height of the parents will not matter so much; the heritability coefficient for height will be much closer to 0. […] A trait that is adaptive in one situation may be harmful in another […] the same environments that promote good outcomes for some people can promote bad outcomes for others, and vice versa […] More generally, the same circumstances might be experienced as stressful, enjoyable, or boring, depending on the genetic predispositions of the individuals involved; these variations in experience can lead to very different behaviors and, over time, to the development of different personality traits.”

“Mihaly Csikszentmihalyi [argued] that the best way a person can spend time is in autotelic activities, those that are enjoyable for their own sake. The subjective experience of an autotelic activity — the enjoyment itself — is what Csikszentmihalyi calls flow.
Flow is not the same thing as joy, happiness, or other, more familiar terms for subjective well-being. Rather, the experience of flow is characterized by tremendous concentration, total lack of distractibility, and thoughts concerning only the activity at hand. […] Losing track of time is one sign of experiencing flow. According to Csikszentmihalyi, flow arises when the challenges an activity presents are well matched with your skills. If an activity is too difficult or too confusing, you will experience anxiety, worry, and frustration. If the activity is too easy, you will experience boredom and (again) anxiety. But when skills and challenges are balanced, you experience flow. […] Csikszentmihalyi thinks that the secret for enhancing your quality of life is to spend as much time in flow as possible. Achieving flow entails becoming good at something you find worthwhile and enjoyable. […] Even in the best of circumstances [however], flow seems to describe a rather solitary kind of happiness. […] The drawback with flow is that somebody experiencing it can be difficult to interact with”. [I really did not like most of the stuff included in the part of the book from which this quote is taken, but I did find Csikszentmihalyi’s flow concept quite interesting.]

“About 80 percent of the participants in psychological research come from countries that are Western, Educated, Industrialized, Rich, and Democratic — ”WEIRD” in other words — although only 12 percent of the world’s population live there (Henrich et al., 2010).”

“If an animal or a person performs a behavior, and the behavior is followed by a good result — a reinforcement — the behavior becomes more likely. If the behavior is followed by a punishment, it becomes less likely. […] the results of operant conditioning are not necessarily logical. It can increase the frequency of any behavior, regardless of its real connection with the consequences that follow.”

“A punishment is an aversive consequence that follows an act in order to stop it and prevent its repetition. […] Many people believe the only way to stop or prevent somebody from doing something is punishment. […] You can [however] use reward for this purpose too. All you have to do is find a response that is incompatible with the one you are trying to get rid of, and reward that incompatible response instead. Reward a child for reading instead of punishing him for watching television. […] punishment works well when it is done right. The only problem is, it is almost never done right. […] One way to see how punishment works, or fails to work, is to examine the rules for applying it correctly. The classic behaviorist analysis says that five principles are most important […] 1. Availability of Alternatives: An alternative response to the behavior that is being punished must be available. This alternative response must not be punished and should be rewarded. […] 2. Behavioral and Situational Specificity: Be clear about exactly what behavior you are punishing and the circumstances under which it will and will not be punished. […] 3. Timing and Consistency: To be effective, a punishment needs to be applied immediately after the behavior you wish to prevent, every time that behavior occurs. Otherwise, the person (or animal) being punished may not understand which behavior is forbidden. […] 4. Conditioning Secondary Punishing Stimuli: One can lessen the actual use of punishment by conditioning secondary stimuli to it [such as e.g.  verbal warnings] […] 5. Avoiding Mixed Messages: […] Sometimes, after punishing a child, the parent feels so guilty that she picks the child up for a cuddle. This is a mistake. The child might start to misbehave just to get the cuddle that follows the punishment. Punish if you must punish, but do not mix your message. A variant on this problem occurs when the child learns to play one parent against the other. For example, after the father punishes the child, the child goes to the mother for sympathy, or vice versa. This can produce the same counterproductive result.”

“Punishment will backfire unless all of the guidelines [above] are followed. Usually, they are not. A punisher has to be extremely careful, for several reasons. […] The first and perhaps most important danger of punishment is that it creates emotion. […] powerful emotions are not conducive to clear thinking. […] Punishment [also] tends to vary with the punisher’s mood, which is one reason it is rarely applied consistently. […] Punishment [furthermore] [m]otivates [c]oncealment: The prospective punishee has good reasons to conceal behavior that might be punished. […] Rewards have the reverse effect. When workers anticipate rewards for good work instead of punishment for bad work, they are naturally motivated to bring to the boss’s attention everything they are doing, in case it merits reward.”

“Gordon Allport observed years ago [that] [“]For some the world is a hostile place where men are evil and dangerous; for others it is a stage for fun and frolic. It may appear as a place to do one’s duty grimly; or a pasture for cultivating friendship and love.[“] […] people with different traits see the world differently. This perception affects how they react to the events in their lives which, in turn, affects what they do. […] People [also] differ in the emotions they experience, the emotions they want to experience, how strongly they experience emotions, how frequently their emotions change, and how well they understand and control their emotions.”

July 9, 2017 Posted by | Books, Genetics, Psychology | Leave a comment

Beyond Significance Testing (I)

“This book introduces readers to the principles and practice of statistics reform in the behavioral sciences. It (a) reviews the now even larger literature about shortcomings of significance testing; (b) explains why these criticisms have sufficient merit to justify major changes in the ways researchers analyze their data and report the results; (c) helps readers acquire new skills concerning interval estimation and effect size estimation; and (d) reviews alternative ways to test hypotheses, including Bayesian estimation. […] I assume that the reader has had undergraduate courses in statistics that covered at least the basics of regression and factorial analysis of variance. […] This book is suitable as a textbook for an introductory course in behavioral science statistics at the graduate level.”

I’m currently reading this book. I have so far read 8 out of the 10 chapters included, and I’m currently sort of hovering between a 3 and 4 star goodreads rating; some parts of the book are really great, but there are also a few aspects I don’t like. Some parts of the coverage are rather technical and I’m still debating to which extent I should cover the technical stuff in detail later here on the blog; there are quite a few equations included in the book and I find it annoying to cover math using the wordpress format of this blog. For now I’ll start out with a reasonably non-technical post with some quotes and key ideas from the first parts of the book.

“In studies of intervention outcomes, a statistically significant difference between treated and untreated cases […] has nothing to do with whether treatment leads to any tangible benefits in the real world. In the context of diagnostic criteria, clinical significance concerns whether treated cases can no longer be distinguished from control cases not meeting the same criteria. For example, does treatment typically prompt a return to normal levels of functioning? A treatment effect can be statistically significant yet trivial in terms of its clinical significance, and clinically meaningful results are not always statistically significant. Accordingly, the proper response to claims of statistical significance in any context should be “so what?” — or, more pointedly, “who cares?” — without more information.”

“There are free computer tools for estimating power, but most researchers — probably at least 80% (e.g., Ellis, 2010) — ignore the power of their analyses. […] Ignoring power is regrettable because the median power of published nonexperimental studies is only about .50 (e.g., Maxwell, 2004). This implies a 50% chance of correctly rejecting the null hypothesis based on the data. In this case the researcher may as well not collect any data but instead just toss a coin to decide whether or not to reject the null hypothesis. […] A consequence of low power is that the research literature is often difficult to interpret. Specifically, if there is a real effect but power is only .50, about half the studies will yield statistically significant results and the rest will yield no statistically significant findings. If all these studies were somehow published, the number of positive and negative results would be roughly equal. In an old-fashioned, narrative review, the research literature would appear to be ambiguous, given this balance. It may be concluded that “more research is needed,” but any new results will just reinforce the original ambiguity, if power remains low.”
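
The power arithmetic referred to above is easy to play with yourself. Below is a minimal sketch using statsmodels; the effect size and group size are my own illustrative choices rather than numbers from the book, but they land close to the .50 power figure cited, and they show roughly what it would take to reach the conventional .80 target.

```python
# Hypothetical example: power of a two-group t-test with a medium effect
# (d = 0.5) at alpha = .05, and the per-group n needed for .80 power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power achieved with 32 cases per group (close to the .50 discussed above).
achieved = analysis.power(effect_size=0.5, nobs1=32, alpha=0.05)

# Per-group sample size needed to reach the conventional .80 target.
needed_n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)

print(f"power with n=32 per group: {achieved:.2f}")   # ~0.50
print(f"n per group for .80 power: {needed_n:.0f}")   # ~64
```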

“Statistical tests of a treatment effect that is actually clinically significant may fail to reject the null hypothesis of no difference when power is low. If the researcher in this case ignored whether the observed effect size is clinically significant, a potentially beneficial treatment may be overlooked. This is exactly what was found by Freiman, Chalmers, Smith, and Kuebler (1978), who reviewed 71 randomized clinical trials of mainly heart- and cancer-related treatments with “negative” results (i.e., not statistically significant). They found that if the authors of 50 of the 71 trials had considered the power of their tests along with the observed effect sizes, those authors should have concluded just the opposite, or that the treatments resulted in clinically meaningful improvements.”

“Even if researchers avoided the kinds of mistakes just described, there are grounds to suspect that p values from statistical tests are simply incorrect in most studies: 1. They (p values) are estimated in theoretical sampling distributions that assume random sampling from known populations. Very few samples in behavioral research are random samples. Instead, most are convenience samples collected under conditions that have little resemblance to true random sampling. […] 2. Results of more quantitative reviews suggest that, due to assumption violations, there are few actual data sets in which significance testing gives accurate results […] 3. Probabilities from statistical tests (p values) generally assume that all other sources of error besides sampling error are nil. This includes measurement error […] Other sources of error arise from failure to control for extraneous sources of variance or from flawed operational definitions of hypothetical constructs. It is absurd to assume in most studies that there is no error variance besides sampling error. Instead it is more practical to expect that sampling error makes up the small part of all possible kinds of error when the number of cases is reasonably large (Ziliak & McCloskey, 2008).”

“The p values from statistical tests do not tell researchers what they want to know, which often concerns whether the data support a particular hypothesis. This is because p values merely estimate the conditional probability of the data under a statistical hypothesis — the null hypothesis — that in most studies is an implausible, straw man argument. In fact, p values do not directly “test” any hypothesis at all, but they are often misinterpreted as though they describe hypotheses instead of data. Although p values ultimately provide a yes-or-no answer (i.e., reject or fail to reject the null hypothesis), the question — p < α?, where α is the criterion level of statistical significance, usually .05 or .01 — is typically uninteresting. The yes-or-no answer to this question says nothing about scientific relevance, clinical significance, or effect size. […] determining clinical significance is not just a matter of statistics; it also requires strong knowledge about the subject matter.”

“[M]any null hypotheses have little if any scientific value. For example, Anderson et al. (2000) reviewed null hypotheses tested in several hundred empirical studies published from 1978 to 1998 in two environmental sciences journals. They found many implausible null hypotheses that specified things such as equal survival probabilities for juvenile and adult members of a species or that growth rates did not differ across species, among other assumptions known to be false before collecting data. I am unaware of a similar survey of null hypotheses in the behavioral sciences, but I would be surprised if the results would be very different.”

“Hoekstra, Finch, Kiers, and Johnson (2006) examined a total of 266 articles published in Psychonomic Bulletin & Review during 2002–2004. Results of significance tests were reported in about 97% of the articles, but confidence intervals were reported in only about 6%. Sadly, p values were misinterpreted in about 60% of surveyed articles. Fidler, Burgman, Cumming, Buttrose, and Thomason (2006) sampled 200 articles published in two different biology journals. Results of significance testing were reported in 92% of articles published during 2001–2002, but this rate dropped to 78% in 2005. There were also corresponding increases in the reporting of confidence intervals, but power was estimated in only 8% and p values were misinterpreted in 63%. […] Sun, Pan, and Wang (2010) reviewed a total of 1,243 works published in 14 different psychology and education journals during 2005–2007. The percentage of articles reporting effect sizes was 49%, and 57% of these authors interpreted their effect sizes.”

“It is a myth that the larger the sample, the more closely it approximates a normal distribution. This idea probably stems from a misunderstanding of the central limit theorem, which applies to certain group statistics such as means. […] This theorem justifies approximating distributions of random means with normal curves, but it does not apply to distributions of scores in individual samples. […] larger samples do not generally have more normal distributions than smaller samples. If the population distribution is, say, positively skewed, this shape will tend to show up in the distributions of random samples that are either smaller or larger.”
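
The point about skewness is easy to check with a quick simulation; the following is my own illustration rather than an example from the book, using the exponential distribution (population skewness 2) as a stand-in for a positively skewed population.

```python
# Samples from a positively skewed population stay skewed however large they
# are; only the distribution of sample *means* approaches normality.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

for n in (20, 200, 20_000):
    sample = rng.exponential(scale=1.0, size=n)
    # Sample skewness does not shrink toward 0 as n grows; it stays near 2.
    print(f"n={n:>6}: sample skewness = {skew(sample):.2f}")

# By contrast, means of many samples of size 50 are far less skewed (~0.3).
means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)
print(f"skewness of 10,000 sample means (n=50): {skew(means):.2f}")
```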

“A standard error is the standard deviation in a sampling distribution, the probability distribution of a statistic across all random samples drawn from the same population(s) and with each sample based on the same number of cases. It estimates the amount of sampling error in standard deviation units. The square of a standard error is the error variance. […] Variability of the sampling distributions […] decreases as the sample size increases. […] The standard error sM, which estimates variability of the group statistic M, is often confused with the standard deviation s, which measures variability at the case level. This confusion is a source of misinterpretation of both statistical tests and confidence intervals […] Note that the standard error sM itself has a standard error (as do standard errors for all other kinds of statistics). This is because the value of sM varies over random samples. This explains why one should not overinterpret a confidence interval or p value from a significance test based on a single sample.”
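
Here is a small sketch of the distinction between s and sM (my own example, not the book's): in repeated samples of the same size, the case-level standard deviation s hovers around the population value, the standard error sM is smaller by a factor of the square root of n, and both fluctuate from sample to sample, which is the point about sM having a standard error of its own.

```python
# Case-level SD (s) versus the standard error of the mean (s_M = s / sqrt(n)),
# and the fact that s_M itself varies across random samples.
import numpy as np

rng = np.random.default_rng(1)
n = 25

for i in range(3):
    x = rng.normal(loc=100, scale=15, size=n)   # IQ-like scores (illustrative)
    s = x.std(ddof=1)                           # variability of individual cases
    s_m = s / np.sqrt(n)                        # estimated sampling error of M
    print(f"sample {i + 1}: M={x.mean():6.2f}  s={s:5.2f}  s_M={s_m:4.2f}")
# s is ~15 in every sample and s_M is ~3, but both fluctuate across samples.
```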

“Standard errors estimate sampling error under random sampling. What they measure when sampling is not random may not be clear. […] Standard errors also ignore […] other sources of error [:] 1. Measurement error [which] refers to the difference between an observed score X and the true score on the underlying construct. […] Measurement error reduces absolute effect sizes and the power of statistical tests. […] 2. Construct definition error [which] involves problems with how hypothetical constructs are defined or operationalized. […] 3. Specification error [which] refers to the omission from a regression equation of at least one predictor that covaries with the measured (included) predictors. […] 4. Treatment implementation error occurs when an intervention does not follow prescribed procedures. […] Gosset used the term real error to refer to all types of error besides sampling error […]. In reasonably large samples, the impact of real error may be greater than that of sampling error.”

“The technique of bootstrapping […] is a computer-based method of resampling that recombines the cases in a data set in different ways to estimate statistical precision, with fewer assumptions than traditional methods about population distributions. Perhaps the best known form is nonparametric bootstrapping, which generally makes no assumptions other than that the distribution in the sample reflects the basic shape of that in the population. It treats your data file as a pseudo-population in that cases are randomly selected with replacement to generate other data sets, usually of the same size as the original. […] The technique of nonparametric bootstrapping seems well suited for interval estimation when the researcher is either unwilling or unable to make a lot of assumptions about population distributions. […] potential limitations of nonparametric bootstrapping: 1. Nonparametric bootstrapping simulates random sampling, but true random sampling is rarely used in practice. […] 2. […] If the shape of the sample distribution is very different compared with that in the population, results of nonparametric bootstrapping may have poor external validity. 3. The “population” from which bootstrapped samples are drawn is merely the original data file. If this data set is small or the observations are not independent, resampling from it will not somehow fix these problems. In fact, resampling can magnify the effects of unusual features in a small data set […] 4. Results of bootstrap analyses are probably quite biased in small samples, but this is true of many traditional methods, too. […] [In] parametric bootstrapping […] the researcher specifies the numerical and distributional properties of a theoretical probability density function, and then the computer randomly samples from that distribution. When repeated many times by the computer, values of statistics in these synthesized samples vary randomly about the parameters specified by the researcher, which simulates sampling error.”
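
Nonparametric bootstrapping is simple enough to sketch in a few lines. The following is my own bare-bones illustration of the percentile method, not code from the book, and the 'data' are simulated stand-ins for an observed sample. Parametric bootstrapping, as described at the end of the quote, would replace the resampling line with draws from a theoretical distribution whose parameters the researcher specifies.

```python
# Percentile bootstrap for the mean: resample the observed data with
# replacement many times and take percentiles of the resampled statistic.
import numpy as np

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=40)   # stand-in for an observed sample

n_boot = 5_000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[b] = resample.mean()

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"observed mean = {data.mean():.2f}")
print(f"95% percentile bootstrap CI: ({lo:.2f}, {hi:.2f})")
```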

July 9, 2017 Posted by | Books, Psychology, Statistics | Leave a comment

The Personality Puzzle (II)

I have added some more quotes and observations from the book below. Some of the stuff covered in this post is very closely related to material I’ve previously covered on the blog, e.g. here and here, but I didn’t mind reviewing this stuff here. If you’re already familiar with Funder’s RAM model of personality judgment you can probably skip the last half of the post without missing out on anything.

“[T]he trait approach [of personality psychology] focuses exclusively on individual differences. It does not attempt to measure how dominant, sociable, or nervous anybody is in an absolute sense; there is no zero point on any dominance scale or on any measure of any other trait. Instead, the trait approach seeks to measure the degree to which a person might be more or less dominant, sociable, or nervous than someone else. (Technically, therefore, trait measurements are made on ordinal rather than ratio scales.) […] Research shows that the stability of the differences between people increases with age […] According to one major summary of the literature, the correlation coefficient reflecting consistency of individual differences in personality is .31 across childhood, .54 during the college years, and .74 between the ages of 50 and 70 […] The main reason personality becomes more stable during the transition from child to adult to senior citizen seems to be that one’s environment also gets more stable with age […] According to one major review, longitudinal data show that, on average, people tend to become more socially dominant, agreeable, conscientious, and emotionally stable (lower on neuroticism) over time […] [However] people differ from each other in the degree to which they have developed a consistent personality […] Several studies suggest that the consistency of personality is associated with maturity and general mental health […] More-consistent people appear to be less neurotic, more controlled, more mature, and more positive in their relations with others (Donnellan, Conger, & Burzette, 2007; Roberts, Caspi, & Moffitt, 2001; Sherman, Nave, & Funder, 2010).”

“Despite the evidence for the malleability of personality […], it would be a mistake to conclude that change is easy. […] most people like their personalities pretty much the way they are, and do not see any reason for drastic change […] Acting in a way contrary to one’s traits takes effort and can be exhausting […] Second, people have a tendency to blame negative experiences and failures on external forces rather than recognizing the role of their own personality. […] Third, people generally like their lives to be consistent and predictable […] Change requires learning new skills, going new places, meeting new people, and acting in unaccustomed ways. That can make it uncomfortable. […] personality change has both a downside and an upside. […] people tend to like others who are “judgeable,” who are easy to understand, predict, and relate to. But when they don’t know what to expect or how to predict what a person will do, they are more likely to avoid that person. […] Moreover, if one’s personality is constantly changing, then it will be difficult to choose consistent goals that can be pursued over the long term.”

“There is no doubt that people change their behavior from one situation to the next. This obvious fact has sometimes led to the misunderstanding that personality consistency somehow means “acting the same way all the time.” But that’s not what it means at all. […] It is individual differences in behavior that are maintained across situations, not how much a behavior is performed. […] as the effect of the situation gets stronger, the effect of the person tends to get weaker, and vice versa. […] any fair reading of the research literature makes one thing abundantly clear: When it comes to personality, one size does not fit all. People really do act differently from each other. Even when they are all in the same situation, some individuals will be more sociable, nervous, talkative, or active than others. And when the situation changes, those differences will still be there […] the evidence is overwhelming that people are psychologically different from one another, that personality traits exist, that people’s impressions of each other’s personalities are based on reality more than cognitive error, and that personality traits affect important life outcomes […] it is […] important to put the relative role of personality traits and situations into perspective. Situational variables are relevant to how people will act under specific circumstances. Personality traits are better for describing how people act in general […] A sad legacy of the person-situation debate is that many psychologists became used to thinking of the person and the situation as opposing forces […] It is much more accurate to see persons and situations as constantly interacting to produce behavior together. […] Persons and situations interact in three major ways […] First, the effect of a personality variable may depend on the situation, or vice versa. […] Certain types of people go to or find themselves in different types of situations. This is the second kind of person-situation interaction. […] The third kind of interaction stems from the way people change situations by virtue of what they do in them”.

“Shy people are often lonely and may deeply wish to have friends and normal social interactions, but are so fearful of the process of social involvement that they become isolated. In some cases, they won’t ask for help when they need it, even when someone who could easily solve their problem is nearby […]. Because shy people spend a lot of time by themselves, they deny themselves the opportunity to develop normal social skills. When they do venture out, they are so out of practice they may not know how to act. […] A particular problem for shy people is that, typically, others do not perceive them as shy. Instead, to most observers, they seem cold and aloof. […] shy people generally are not cold and aloof, or at least they do not mean to be. But that is frequently how they are perceived. That perception, in turn, affects the lives of shy people in important negative ways and is part of a cycle that perpetuates shyness […] the judgments of others are an important part of the social world and can have a significant effect on personality and life. […] Judgments of others can also affect you through “self-fulfilling prophecies,” more technically known as expectancy effects. These effects can affect both intellectual performance and social behavior.”

“Because people constantly make personality judgments, and because these judgments are consequential, it would seem important to know when and to what degree these judgments are accurate. […] [One relevant] method is called convergent validation. […] Convergent validation is achieved by assembling diverse pieces of information […] that “converge” on a common conclusion […] The more items of diverse information that converge, the more confident the conclusion […] For personality judgments, the two primary converging criteria are interjudge agreement and behavioral prediction. […] psychological research can evaluate personality judgments by asking two questions […] (1) Do the judgments agree with one another? (2) Can they predict behavior? To the degree the answers are Yes, the judgments are probably accurate.”

“In general, judges [of personality] will reach more accurate conclusions if the behaviors they observe are closely related to the traits they are judging. […] A moderator of accuracy […] is a variable that changes the correlation between a judgment and its criterion. Research on accuracy has focused primarily on four potential moderators: properties (1) of the judge, (2) of the target (the person who is judged), (3) of the trait that is judged, and (4) of the information on which the judgment is based. […] Do people know whether they are good judges of personality? The answer appears to be both no and yes […]. No, because people who describe themselves as good judges, in general, are no better than those who rate themselves as poorer in judgmental ability. But the answer is yes, in another sense. When asked which among several acquaintances they can judge most accurately, most people are mostly correct. In other words, we can tell the difference between people who we can and cannot judge accurately. […] Does making an extra effort to be accurate help? Research results so far are mixed.”

“When it comes to accurate judgment, who is being judged might be even more important than who is doing the judging. […] People differ quite a lot in how accurately they can be judged. […] “Judgable” people are those about whom others reach agreement most easily, because they are the ones whose behavior is most predictable from judgments of their personalities […] The behavior of judgable people is organized coherently; even acquaintances who know them in separate settings describe essentially the same person. Furthermore, the behavior of such people is consistent; what they do in the future can be predicted from what they have done in the past. […] Theorists have long postulated that it is psychologically healthy to conceal as little as possible from those around you […]. If you exhibit a psychological façade that produces large discrepancies between the person “inside” and the person you display “outside,” you may feel isolated from the people around you, which can lead to unhappiness, hostility, and depression. Acting in a way that is contrary to your real personality takes effort, and can be psychologically tiring […]. Evidence even suggests that concealing your emotions may be harmful to physical health”.

“All traits are not created equal — some are much easier to judge accurately than others. For example, more easily observed traits, such as “talkativeness,” “sociability,” and other traits related to extraversion, are judged with much higher levels of interjudge agreement than are less visible traits, such as cognitive and ruminative styles and habits […] To find out about less visible, more internal traits like beliefs or tendencies to worry, self-reports […] are more informative […] [M]ore information is usually better, especially when judging certain traits. […] Quantity is not the only important variable concerning information. […] it can be far more informative to observe a person in a weak situation, in which different people do different things, than in a strong situation, in which social norms restrict what people do […] The best situation for judging someone’s personality is one that brings out the trait you want to judge. To evaluate a person’s approach toward his work, the best thing to do is to observe him working. To evaluate a person’s sociability, observations at a party would be more informative […] The accurate judgment of personality, then, depends on both the quantity and the quality of the information on which it is based. More information is generally better, but it is just as important for the information to be relevant to the traits that one is trying to judge.”

“In order to get from an attribute of an individual’s personality to an accurate judgment of that trait, four things must happen […]. First, the person being judged must do something relevant; that is, informative about the trait to be judged. Second, this information must be available to a judge. Third, this judge must detect this information. Fourth and finally, the judge must utilize this information correctly. […] If the process fails at any step — the person in question never does something relevant, or does it out of sight of the judge, or the judge doesn’t notice, or the judge makes an incorrect interpretation — accurate personality judgment will fail. […] Traditionally, efforts to improve accuracy have focused on attempts to get judges to think better, to use good logic and avoid inferential errors. These efforts are worthwhile, but they address only one stage — utilization — out of the four stages of accurate personality judgment. Improvement could be sought at the other stages as well […] Becoming a better judge of personality […] involves much more than “thinking better.” You should also try to create an interpersonal environment where other people can be themselves and where they feel free to let you know what is really going on.”

July 5, 2017 Posted by | Books, Psychology | Leave a comment

The Antarctic

“A very poor book with poor coverage, mostly about politics and history (and a long collection of names of treaties and organizations). I would definitely not have finished it if it were much longer than it is.”

That was what I wrote about the book in my goodreads review. I was strongly debating whether or not to blog it at all, but in the end I decided to settle for some very lazy coverage consisting only of links to content covered in the book. I cover it here at all mainly to have at least some chance of remembering later on which kinds of things the book dealt with.

If you’re interested enough in the Antarctic to read a book about it, read Scott’s Last Expedition instead of this one (here’s my goodreads review of Scott).

Links:

Antarctica (featured).
Antarctic Convergence.
Antarctic Circle.
Southern Ocean.
Antarctic Circumpolar Current.
West Antarctic Ice Sheet.
East Antarctic Ice Sheet.
McMurdo Dry Valleys.
Notothenioidei.
Patagonian toothfish.
Antarctic krill.
Fabian Gottlieb von Bellingshausen.
Edward Bransfield.
James Clark Ross.
United States Exploring Expedition.
Heroic Age of Antarctic Exploration (featured).
Nimrod Expedition (featured).
Roald Amundsen.
Wilhelm Filchner.
Japanese Antarctic Expedition.
Terra Nova Expedition (featured).
Lincoln Ellsworth.
British Graham Land expedition.
German Antarctic Expedition (1938–1939).
Operation Highjump.
Operation Windmill.
Operation Deep Freeze.
Commonwealth Trans-Antarctic Expedition.
Caroline Mikkelsen.
International Association of Antarctica Tour Operators.
Territorial claims in Antarctica.
International Geophysical Year.
Antarctic Treaty System.
Operation Tabarin.
Scientific Committee on Antarctic Research.
United Nations Convention on the Law of the Sea.
Convention on the Continental Shelf.
Council of Managers of National Antarctic Programs.
British Antarctic Survey.
International Polar Year.
Antarctic ozone hole.
Gamburtsev Mountain Range.
Pine Island Glacier (‘good article’).
Census of Antarctic Marine Life.
Lake Ellsworth Consortium.
Antarctic fur seal.
Southern elephant seal.
Grytviken (whaling-related).
International Convention for the Regulation of Whaling.
International Whaling Commission.
Ocean Drilling Program.
Convention on the Regulation of Antarctic Mineral Resource Activities.
Agreement on the Conservation of Albatrosses and Petrels.

July 3, 2017 Posted by | Biology, Books, Geography, Geology, History, Wikipedia | Leave a comment

Stars

“Every atom of our bodies has been part of a star, and every informed person should know something of how the stars evolve.”

I gave the book three stars on goodreads. At times it’s a bit too popular-science-y for me, and I think the level of coverage is a little lower than that of some of the other physics books in the ‘A Very Short Introduction’ series by Oxford University Press; on the other hand it did teach me some new things and explained some other things I knew about but did not fully understand before, and I’m well aware that it can be really hard to strike the right balance when writing books like these. I don’t like it when authors employ analogies instead of equations to explain stuff, but I’ve seen some of the relevant equations before, e.g. in the context of IAS lectures, so I was okay with skipping some of the math – I know how the math here can blow up in your face fast. It’s not that the book has no math or equations, but it’s the kind of math most people should be able to deal with. It’s a decent introduction to the topic, and I must admit I have yet to be significantly disappointed in a book from the physics part of this OUP series – they’re good books, readable and interesting.

Below I have added some quotes and observations from the book, as well as some relevant links to material or people covered in it. Some of the links I have also added previously when covering other books in the physics series, but since I try to cover each book separately that does not bother me. The two main ideas behind adding links of this kind are: 1) to remind me which topics were covered in the book (topics I was unable to cover in detail using quotes, because the book contains too much material for that to make sense), and 2) to give people who might be interested in reading the book an idea of which topics are covered in it; if I left out relevant links simply because the same topics also appear in other books I’ve covered here, the link collection would not accomplish what I’d like it to accomplish. The links were gathered while I was reading the book (I bookmarked relevant wiki articles along the way), whereas the quotes were only added to the post after I had finished the link collection. I am aware that some topics covered in the quotes also show up in the link collection, but I did not care enough about this double coverage to remove the overlapping links.

Finding good quotes to include in this post was harder than it has been for some of the other physics books I’ve covered recently, because the author goes into quite some detail explaining specific dynamics of stellar evolution that are not easy to boil down to a short quote which is still meaningful to people who do not know the context. The fact that he does go into those details was of course part of the reason why I liked the book.

“[W]e cannot consider heat energy in isolation from the other large energy store that the Sun has – gravity. Clearly, gravity is an energy source, since if it were not for the resistance of gas pressure, it would make all the Sun’s gas move inwards at high speed. So heat and gravity are both potential sources of energy, and must be related by the need to keep the Sun in equilibrium. As the Sun tries to cool down, energy must be swapped between these two forms to keep the Sun in balance […] the heat energy inside the Sun is not enough to spread all of its contents out over space and destroy it as an identifiable object. The Sun is gravitationally bound – its heat energy is significant, but cannot supply enough energy to loosen gravity’s grip, and unbind the Sun. This means that when pressure balances gravity for any system (as in the Sun), the total heat energy T is always slightly less than that needed (V) to disperse it. In fact, it turns out to be exactly half of what would be needed for this dispersal, so that 2T + V = 0, or V = −2 T. The quantities T and V have opposite signs, because energy has to be supplied to overcome gravity, that is, you have to use T to try to cancel some of V. […] you need to supply energy to a star in order to overcome its gravity and disperse all of its gas to infinity. In line with this, the star’s total energy (thermal plus gravitational) is E = T + V = −T, that is, the total energy is minus its thermal energy, and so is itself negative. That is, a star is a gravitationally bound object. Whenever the system changes slowly enough that pressure always balances gravity, these two energies always have to be in this 1:2 ratio. […] This reasoning shows that cooling, shrinking, and heating up all go together, that is, as the Sun tries to cool down, its interior heats up. […] Because E = –T, when the star loses energy (by radiating), making its total energy E more negative, the thermal energy T gets more positive, that is, losing energy makes the star heat up. […] This result, that stars heat up when they try to cool, is central to understanding why stars evolve.”
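
To keep the bookkeeping in that quote straight, here is the argument in compact form; the notation is mine, the content is just a restatement of the quote.

```latex
% T = thermal energy, V = gravitational energy, E = total energy of the star.
\begin{align}
  2T + V &= 0 \quad\Rightarrow\quad V = -2T, \\
  E &= T + V = T - 2T = -T, \\
  \Delta E < 0 \;(\text{the star radiates}) &\;\Rightarrow\; \Delta T = -\Delta E > 0.
\end{align}
% A bound star that loses total energy must therefore heat up inside.
```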

“[T]he whole of chemistry is simply the science of electromagnetic interaction of atoms with each other. Specifically, chemistry is what happens when electrons stick atoms together to make molecules. The electrons doing the sticking are the outer ones, those furthest from the nucleus. The physical rules governing the arrangement of electrons around the nucleus mean that atoms divide into families characterized by their outer electron configurations. Since the outer electrons specify the chemical properties of the elements, these families have similar chemistry. This is the origin of the periodic table of the elements. In this sense, chemistry is just a specialized branch of physics. […] atoms can combine, or react, in many different ways. A chemical reaction means that the electrons sticking atoms together are rearranging themselves. When this happens, electromagnetic energy may be released, […] or an energy supply may be needed […] Just as we measured gravitational binding energy as the amount of energy needed to disperse a body against the force of its own gravity, molecules have electromagnetic binding energies measured by the energies of the orbiting electrons holding them together. […] changes of electronic binding only produce chemical energy yields, which are far too small to power stars. […] Converting hydrogen into helium is about 15 million times more effective than burning oil. This is because strong nuclear forces are so much more powerful than electromagnetic forces.”
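
The '15 million times' figure is easy to sanity-check with standard textbook numbers; the ones below are mine rather than the author's (roughly 0.7% of the rest mass is released when hydrogen fuses to helium, and burning oil releases on the order of 45 MJ per kilogram).

```python
# Back-of-the-envelope comparison of hydrogen fusion with burning oil.
c = 3.0e8                        # speed of light, m/s
mass_fraction = 0.007            # fraction of mass released as energy in H -> He

fusion_per_kg = mass_fraction * c**2     # J per kg of hydrogen, ~6.3e14
oil_per_kg = 4.5e7                       # J per kg of oil (approximate)

print(f"fusion: {fusion_per_kg:.1e} J/kg")
print(f"oil:    {oil_per_kg:.1e} J/kg")
print(f"ratio:  {fusion_per_kg / oil_per_kg:.1e}")   # ~1.4e7, i.e. ~15 million
```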

“[T]here are two chains of reactions which can convert hydrogen to helium. The rate at which they occur is in both cases quite sensitive to the gas density, varying as its square, but extremely sensitive to the gas temperature […] If the temperature is below a certain threshold value, the total energy output from hydrogen burning is completely negligible. If the temperature rises only slightly above this threshold, the energy output becomes enormous. It becomes so enormous that the effect of all this energy hitting the gas in the star’s centre is life-threatening to it. […] energy is related to mass. So being hit by energy is like being hit by mass: luminous energy exerts a pressure. For a luminosity above a certain limiting value related to the star’s mass, the pressure will blow it apart. […] The central temperature of the Sun, and stars like it, must be almost precisely at the threshold value. It is this temperature sensitivity which fixes the Sun’s central temperature at the value of ten million degrees […] All stars burning hydrogen in their centres must have temperatures close to this value. […] central temperature [is] roughly proportional to the ratio of mass to radius [and this means that] the radius of a hydrogen-burning star is approximately proportional to its mass […] You might wonder how the star ‘knows’ that its radius is supposed to have this value. This is simple: if the radius is too large, the star’s central temperature is too low to produce any nuclear luminosity at all. […] the star will shrink in an attempt to provide the luminosity from its gravitational binding energy. But this shrinking is just what it needs to adjust the temperature in its centre to the right value to start hydrogen burning and produce exactly the right luminosity. Similarly, if the star’s radius is slightly too small, its nuclear luminosity will grow very rapidly. This increases the radiation pressure, and forces the star to expand, again back to the right radius and so the right luminosity. These simple arguments show that the star’s structure is self-adjusting, and therefore extremely stable […] The basis of this stability is the sensitivity of the nuclear luminosity to temperature and so radius, which controls it like a thermostat.”

“Hydrogen burning produces a dense and growing ball of helium at the star’s centre. […] the star has a weight problem to solve – the helium ball feels its own weight, and that of all the rest of the star as well. A similar effect led to the ignition of hydrogen in the first place […] we can see what happens as the core mass grows. Let’s imagine that the core mass has doubled. Then the core radius also doubles, and its volume grows by a factor 2 × 2 × 2 = 8. This is a bigger factor than the mass growth, so the density is 2/(2 × 2 × 2) = 1/4 of its original value. We end with the surprising result that as the helium core mass grows in time, its central number density drops. […] Because pressure is proportional to density, the central pressure of the core drops also […] Since the density of the hydrogen envelope does not change over time, […] the helium core becomes less and less able to cope with its weight problem as its mass increases. […] The end result is that once the helium core contains more than about 10% of the star’s mass, its pressure is too low to support the weight of the star, and things have to change drastically. […] massive stars have much shorter main-sequence lifetimes, decreasing like the inverse square of their masses […] A star near the minimum main-sequence mass of one-tenth of the Sun’s has an unimaginably long lifetime of almost 1013 years, nearly a thousand times the Sun’s. All low-mass stars are still in the first flush of youth. This is the fundamental fact of stellar life: massive stars have short lives, and low-mass stars live almost forever – certainly far longer than the current age of the Universe.”

“We have met all three […] timescales [see links below – US] for the Sun. The nuclear time is ten billion years, the thermal timescale is thirty million years, and the dynamical one […] just half an hour. […] Each timescale says how long the star takes to react to changes of the given type. The dynamical time tells us that if we mess up the hydrostatic balance between pressure and weight, the star will react by moving its mass around for a few dynamical times (in the Sun’s case, a few hours) and then settle down to a new state in which pressure and weight are in balance. And because this time is so short compared with the thermal time, the stellar material will not have lost or gained any significant amount of heat, but simply carried this around […] although the star quickly finds a new hydrostatic equilibrium, this will not correspond to thermal equilibrium, where heat moves smoothly outwards through the star at precisely the rate determined by the nuclear reactions deep in the centre. Instead, some bits of the star will be too cool to pass all this heat on outwards, and some will be too hot to absorb much of it. Over a thermal timescale (a few tens of millions of years in the Sun), the cool parts will absorb the extra heat they need from the stellar radiation field, and the hot parts rid themselves of the excess they have, until we again reach a new state of thermal equilibrium. Finally, the nuclear timescale tells us the time over which the star synthesizes new chemical elements, radiating the released energy into space.”
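
The three timescales can be recovered from standard solar values with one-line estimates. The sketch below is mine, and the prefactors (e.g. assuming roughly 10% of the hydrogen gets burned) are order-of-magnitude choices, so treat the output as a rough check on the numbers in the quote rather than the author's own calculations.

```python
# Rough estimates of the nuclear, thermal (Kelvin-Helmholtz), and dynamical
# timescales of the Sun, using standard constants and solar values.
G, c = 6.674e-11, 3.0e8                  # SI units
M, R, L = 1.99e30, 6.96e8, 3.83e26       # solar mass (kg), radius (m), luminosity (W)
YEAR = 3.156e7                           # seconds per year

t_nuc = 0.007 * 0.1 * M * c**2 / L       # ~10% of the H burns, 0.7% of mass -> energy
t_kh  = G * M**2 / (R * L)               # thermal (Kelvin-Helmholtz) timescale
t_dyn = (R**3 / (G * M)) ** 0.5          # dynamical (free-fall) timescale

print(f"nuclear:   {t_nuc / YEAR:.1e} yr")     # ~1e10 yr (ten billion years)
print(f"thermal:   {t_kh / YEAR:.1e} yr")      # ~3e7 yr (thirty million years)
print(f"dynamical: {t_dyn / 60:.0f} minutes")  # ~half an hour
```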

“[S]tars can end their lives in just one of three possible ways: white dwarf, neutron star, or black hole.”

“Stars live a long time, but must eventually die. Their stores of nuclear energy are finite, so they cannot shine forever. […] they are forced onwards through a succession of evolutionary states because the virial theorem connects gravity with thermodynamics and prevents them from cooling down. So main-sequence dwarfs inexorably become red giants, and then supergiants. What breaks this chain? Its crucial link is that the pressure supporting a star depends on how hot it is. This link would snap if the star was instead held up by a pressure which did not care about its heat content. Finally freed from the demand to stay hot to support itself, a star like this would slowly cool down and die. This would be an endpoint for stellar evolution. […] Electron degeneracy pressure does not depend on temperature, only density. […] one possible endpoint of stellar evolution arises when a star is so compressed that electron degeneracy is its main form of pressure. […] [Once] the star is a supergiant […] a lot of its mass is in a hugely extended envelope, several hundred times the Sun’s radius. Because of this vast size, the gravity tying the envelope to the core is very weak. […] Even quite small outward forces can easily overcome this feeble pull and liberate mass from the envelope, so a lot of the star’s mass is blown out into space. Eventually, almost the entire remaining envelope is ejected as a roughly spherical cloud of gas. The core quickly exhausts the thin shell of nuclear-burning material on its surface. Now gravity makes the core contract in on itself and become denser, increasing the electron degeneracy pressure further. The core ends as an extremely compact star, with a radius similar to the Earth’s, but a mass similar to the Sun, supported by this pressure. This is a white dwarf. […] Even though its surface is at least initially hot, its small surface means that it is faint. […] White dwarfs cannot start nuclear reactions, so eventually they must cool down and become dark, cold, dead objects. But before this happens, they still glow from the heat energy left over from their earlier evolution, slowly getting fainter. Astronomers observe many white dwarfs in the sky, suggesting that this is how a large fraction of all stars end their lives. […] Stars with an initial mass more than about seven times the Sun’s cannot end as white dwarfs.”

“In many ways, a neutron star is a vastly more compact version of a white dwarf, with the fundamental difference that its pressure arises from degenerate neutrons, not degenerate electrons. One can show that the ratio of the two stellar radii, with white dwarfs about one thousand times bigger than the 10 kilometres of a neutron star, is actually just the ratio of neutron to electron mass.”

“Most massive stars are not isolated, but part of a binary system […]. If one is a normal star, and the other a neutron star, and the binary is not very wide, there are ways for gas to fall from the normal star on to the neutron star. […] Accretion on to very compact objects like neutron stars almost always occurs through a disc, since the gas that falls in always has some rotation. […] a star’s luminosity cannot be bigger than the Eddington limit. At this limit, the pressure of the radiation balances the star’s gravity at its surface, so any more luminosity blows matter off the star. The same sort of limit must apply to accretion: if this tries to make too high a luminosity, radiation pressure will tend to blow away the rest of the gas that is trying to fall in, and so reduce the luminosity until it is below the limit. […] a neutron star is only 10 kilometres in radius, compared with the 700,000 kilometres of the Sun. This can only happen if this very small surface gets very hot. The surface of a healthily accreting neutron star reaches about 10 million degrees, compared with the 6,000 or so of the Sun. […] The radiation from such intensely hot surfaces comes out at much shorter wavelengths than the visible emission from the Sun – the surfaces of a neutron star and its accretion disc emit photons that are much more energetic than those of visible light. Accreting neutron stars and black holes make X-rays.”
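
The Eddington-limit argument can be put into numbers with the standard formula L_Edd = 4πGM·m_p·c/σ_T. The sketch below is mine; in particular, the choice of one-tenth of the Eddington luminosity for the accretion luminosity is purely illustrative, picked to show how a 10 km surface ends up at roughly the ten million degrees mentioned in the quote.

```python
# Eddington luminosity for a solar-mass accretor, and the surface temperature
# a 10 km neutron star needs in order to radiate a tenth of that limit.
import math

G, c = 6.674e-11, 3.0e8
m_p, sigma_T = 1.673e-27, 6.652e-29    # proton mass, Thomson cross-section
sigma_SB = 5.670e-8                    # Stefan-Boltzmann constant
M, R_ns = 1.99e30, 1.0e4               # one solar mass, 10 km radius

L_edd = 4 * math.pi * G * M * m_p * c / sigma_T
L_acc = 0.1 * L_edd                    # assumed, illustrative accretion luminosity
T_surf = (L_acc / (4 * math.pi * R_ns**2 * sigma_SB)) ** 0.25

print(f"Eddington luminosity: {L_edd:.2e} W")   # ~1.3e31 W
print(f"surface temperature:  {T_surf:.1e} K")  # ~1e7 K, i.e. X-ray photons
```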

“[S]tar formation […] is harder to understand than any other part of stellar evolution. So we use our knowledge of the later stages of stellar evolution to help us understand star formation. Working backwards in this way is a very common procedure in astronomy […] We know much less about how stars form than we do about any later part of their evolution. […] The cyclic nature of star formation, with stars being born from matter chemically enriched by earlier generations, and expelling still more processed material into space as they die, defines a cosmic epoch – the epoch of stars. The end of this epoch will arrive only when the stars have turned all the normal matter of the Universe into iron, and left it locked in dead remnants such as black holes.”

Stellar evolution.
Gustav Kirchhoff.
Robert Bunsen.
Joseph von Fraunhofer.
Spectrograph.
Absorption spectroscopy.
Emission spectrum.
Doppler effect.
Parallax.
Stellar luminosity.
Cecilia Payne-Gaposchkin.
Ejnar Hertzsprung/Henry Norris Russell/Hertzsprung–Russell diagram.
Red giant.
White dwarf (featured article).
Main sequence (featured article).
Gravity/Electrostatics/Strong nuclear force.
Pressure/Boyle’s law/Charles’s law.
Hermann von Helmholtz.
William Thomson (Kelvin).
Gravitational binding energy.
Thermal energy/Gravitational energy.
Virial theorem.
Kelvin-Helmholtz time scale.
Chemical energy/Bond-dissociation energy.
Nuclear binding energy.
Nuclear fusion.
Heisenberg’s uncertainty principle.
Quantum tunnelling.
Pauli exclusion principle.
Eddington limit.
Convection.
Electron degeneracy pressure.
Nuclear timescale.
Number density.
Dynamical timescale/free-fall time.
Hydrostatic equilibrium/Thermal equilibrium.
Core collapse.
Hertzsprung gap.
Supergiant star.
Chandrasekhar limit.
Core-collapse supernova (‘good article’).
Crab Nebula.
Stellar nucleosynthesis.
Neutron star.
Schwarzschild radius.
Black hole (‘good article’).
Roy Kerr.
Pulsar.
Jocelyn Bell.
Anthony Hewish.
Accretion/Accretion disk.
X-ray binary.
Binary star evolution.
SS 433.
Gamma ray burst.
Hubble’s law/Hubble time.
Cosmic distance ladder/Standard candle/Cepheid variable.
Star formation.
Pillars of Creation.
Jeans instability.
Initial mass function.

July 2, 2017 Posted by | Astronomy, Books, Chemistry, Physics | Leave a comment

The Personality Puzzle (I)

I don’t really like this book, which is a personality psychology introductory textbook by David Funder. I’ve read the first 400 pages (out of 700), but I’m still debating whether or not to finish it, because it just isn’t very good; the level of coverage is low, it’s very fluffy, and the signal-to-noise ratio is nowhere near where I’d like it to be when I’m reading academic texts. Some parts of it frankly read like popular science. However, despite not feeling that the book is all that great, I can’t justify not blogging it; stuff I don’t blog I tend to forget, and if I’m reading a mediocre textbook anyway I should at least try to pick out some of the decent stuff in there which keeps me reading, and try to make it easier for myself to recall that stuff later. Some parts of the book, and some of the arguments and observations included in it, are in my opinion just plain silly or stupid, but I won’t go into those things in this post because I don’t really see what the point of doing that would be.

The main reason I decided to give the book a go was that I liked Funder’s book Personality Judgment, which I read a few years ago and which deals with some topics also covered superficially in this text. As far as I can remember it is a much better book (though I have actually started to wonder whether it really was all that great, given that it was written by the same guy who wrote this book), so if you’re interested in these matters that is the one I would recommend. If you’re interested in a more ‘pure’ personality psychology text, a significantly better alternative is Leary et al.‘s Handbook of Individual Differences in Social Behavior. Because of the multi-author format it also includes some very poor chapters, but those tend to be somewhat easy to identify and skip if you’re so inclined, and the general coverage is at a much higher level than that of this book.

Below I have added some quotes and observations from the first 150 pages of the book.

“A theory that accounts for certain things extremely well will probably not explain everything else so well. And a theory that tries to explain almost everything […] would probably not provide the best explanation for any one thing. […] different [personality psychology] basic approaches address different sets of questions […] each basic approach usually just ignores the topics it is not good at explaining.”

“Personality psychology tends to emphasize how individuals are different from one another. […] Other areas of psychology, by contrast, are more likely to treat people as if they were the same or nearly the same. Not only do the experimental subfields of psychology, such as cognitive and social psychology, tend to ignore how people are different from each other, but also the statistical analyses central to their research literally put individual differences into their “error” terms […] Although the emphasis of personality psychology often entails categorizing and labeling people, it also leads the field to be extraordinarily sensitive — more than any other area of psychology — to the fact that people really are different.”

“If you want to “look at” personality, what do you look at, exactly? Four different things. First, and perhaps most obviously, you can have the person describe herself. Personality psychologists often do exactly this. Second, you can ask people who know the person to describe her. Third, you can check on how the person is faring in life. And finally, you can observe what the person does and try to measure her behavior as directly and objectively as possible. These four types of clues can be called S [self-judgments], I [informants], L [life], and B [behavior] data […] The point of the four-way classification […] is not to place every kind of data neatly into one and only one category. Rather, the point is to illustrate the types of data that are relevant to personality and to show how they all have both advantages and disadvantages.”

“For cost-effectiveness, S data simply cannot be beat. […] According to one analysis, 70 percent of the articles in an important personality journal were based on self-report (Vazire, 2006).”

“I data are judgments by knowledgeable “informants” about general attributes of the individual’s personality. […] Usually, close acquaintanceship paired with common sense is enough to allow people to make judgments of each other’s attributes with impressive accuracy […]. Indeed, they may be more accurate than self-judgments, especially when the judgments concern traits that are extremely desirable or extremely undesirable […]. Only when the judgments are of a technical nature (e.g., the diagnosis of a mental disorder) does psychological education become relevant. Even then, acquaintances without professional training are typically well aware when someone has psychological problems […] psychologists often base their conclusions on contrived tests of one kind or another, or on observations in carefully constructed and controlled environments. Because I data derive from behaviors informants have seen in daily social interactions, they enjoy an extra chance of being relevant to aspects of personality that affect important life outcomes. […] I data reflect the opinions of people who interact with the person every day; they are the person’s reputation. […] personality judgments can [however] be [both] unfair as well as mistaken […] The most common problem that arises from letting people choose their own informants — the usual practice in research — may be the “letter of recommendation effect” […] research participants may tend to nominate informants who think well of them, leading to I data that provide a more positive picture than might have been obtained from more neutral parties.”

“L data […] are verifiable, concrete, real-life facts that may hold psychological significance. […] An advantage of using archival records is that they are not prone to the potential biases of self-report or the judgments of others. […] [However] L data have many causes, so trying to establish direct connections between specific attributes of personality and life outcomes is chancy. […] a psychologist can predict a particular outcome from psychological data only to the degree that the outcome is psychologically caused. L data often are psychologically caused only to a small degree.”

“The idea of B data is that participants are found, or put, in some sort of a situation, sometimes referred to as a testing situation, and then their behavior is directly observed. […] B data are expensive [and] are not used very often compared to the other types. Relatively few psychologists have the necessary resources.”

“Reliable data […] are measurements that reflect what you are trying to assess and are not affected by anything else. […] When trying to measure a stable attribute of personality—a trait rather than a state — the question of reliability reduces to this: Can you get the same result more than once? […] Validity is the degree to which a measurement actually reflects what one thinks or hopes it does. […] for a measure to be valid, it must be reliable. But a reliable measure is not necessarily valid. […] A measure that is reliable gives the same answer time after time. […] But even if a measure is the same time after time, that does not necessarily mean it is correct.”

“[M]ost personality tests provide S data. […] Other personality tests yield B data. […] IQ tests […] yield B data. Imagine trying to assess intelligence using an S-data test, asking questions such as “Are you an intelligent person?” and “Are you good at math?” Researchers have actually tried this, but simply asking people whether they are smart turns out to be a poor way to measure intelligence”.

“The answer an individual gives to any one question might not be particularly informative […] a single answer will tend to be unreliable. But if a group of similar questions is asked, the average of the answers ought to be much more stable, or reliable, because random fluctuations tend to cancel each other out. For this reason, one way to make a personality test more reliable is simply to make it longer.”
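
The idea that averaging more parallel items cancels random noise is usually formalized with the Spearman-Brown formula. The formula is standard psychometrics rather than something quoted from the book, and the single-item reliability below is an assumed illustrative value.

```python
# Spearman-Brown: reliability of a test k times as long, given the reliability
# of a single item (assuming the added items are parallel to the original).
def spearman_brown(rho_single: float, k: int) -> float:
    return k * rho_single / (1 + (k - 1) * rho_single)

rho_item = 0.20   # assumed reliability of a single item
for k in (1, 5, 10, 20, 40):
    print(f"{k:>2} items: reliability = {spearman_brown(rho_item, k):.2f}")
# 1: .20, 5: .56, 10: .71, 20: .83, 40: .91; longer tests are more reliable.
```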

“The factor analytic method of test construction is based on a statistical technique. Factor analysis identifies groups of things […] that seem to have something in common. […] To use factor analysis to construct a personality test, researchers begin with a long list of […] items […] The next step is to administer these items to a large number of participants. […] The analysis is based on calculating correlation coefficients between each item and every other item. Many items […] will not correlate highly with anything and can be dropped. But the items that do correlate with each other can be assembled into groups. […] The next steps are to consider what the items have in common, and then name the factor. […] Factor analysis has been used not only to construct tests, but also to decide how many fundamental traits exist […] Various analysts have come up with different answers.”

[The Big Five were derived from factor analyses.]
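
For what it’s worth, the factor analytic logic described above is easy to demonstrate on synthetic data. The sketch below is my own toy example (hypothetical trait labels, made-up loadings), not anything taken from the book.

```python
# Items that share a latent trait correlate with each other and end up loading
# on the same factor when the item responses are factor analyzed.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_people = 500

# Two latent traits (say, "extraversion" and "anxiety"; labels are hypothetical).
traits = rng.normal(size=(n_people, 2))

# Six items: the first three load on trait 1, the last three on trait 2.
loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                     [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
items = traits @ loadings.T + rng.normal(scale=0.5, size=(n_people, 6))

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
print(np.round(fa.components_, 2))   # rows = factors, columns = item loadings
```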

“The empirical strategy of test construction is an attempt to allow reality to speak for itself. […] Like the factor analytic approach described earlier, the first step of the empirical approach is to gather lots of items. […] The second step, however, is quite different. For this step, you need to have a sample of participants who have already independently been divided into the groups you are interested in. Occupational groups and diagnostic categories are often used for this purpose. […] Then you are ready for the third step: administering your test to your participants. The fourth step is to compare the answers given by the different groups of participants. […] The basic assumption of the empirical approach […] is that certain kinds of people answer certain questions on personality inventories in distinctive ways. If you answer questions the same way as members of some occupational or diagnostic group did in the original derivation study, then you might belong to that group too. […] responses to empirically derived tests are difficult to fake. With a personality test of the straightforward, S-data variety, you can describe yourself the way you want to be seen, and that is indeed the score you will get. But because the items on empirically derived scales sometimes seem backward or absurd, it is difficult to know how to answer in such a way as to guarantee the score you want. This is often held up as one of the great advantages of the empirical approach […] [However] empirically derived tests are only as good as the criteria by which they are developed or against which they are cross-validated. […] the empirical correlates of item responses by which these tests are assembled are those found in one place, at one time, with one group of participants. If no attention is paid to item content, then there is no way to be confident that the test will work in a similar manner at another time, in another place, with different participants. […] A particular concern is that the empirical correlates of item response might change over time. The MMPI was developed decades ago and has undergone a major revision only once”.

“It is not correct, for example, that the significance level provides the probability that the substantive (non-null) hypothesis is true. […] the significance level gives the probability of getting the result one found if the null hypothesis were true. One statistical writer offered the following analogy (Dienes, 2011): The probability that a person is dead, given that a shark has bitten his head off, is 1.0. However, the probability that a person’s head was bitten off by a shark, given that he is dead, is much lower. The probability of the data given the hypothesis, and of the hypothesis given the data, is not the same thing. And the latter is what we really want to know. […] An effect size is more meaningful than a significance level. […] It is both facile and misleading to use the frequently taught method of squaring correlations if the intention is to evaluate effect size.”
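
The shark point can be made with a couple of lines of arithmetic applied to significance testing itself. All the input numbers below are assumed, illustrative values (the .50 power figure echoes the earlier quote); none of this is computed in the book.

```python
# Bayes' rule: P(H1 true | significant result) is not the same thing as
# 1 - alpha, and it need not be high even after a "significant" finding.
prior_h1 = 0.10   # assumed prior probability that the substantive hypothesis is true
power    = 0.50   # P(significant | H1); the median power cited earlier
alpha    = 0.05   # P(significant | H0)

p_sig = power * prior_h1 + alpha * (1 - prior_h1)
p_h1_given_sig = power * prior_h1 / p_sig

print(f"P(significant result)    = {p_sig:.3f}")
print(f"P(H1 true | significant) = {p_h1_given_sig:.3f}")   # ~0.53, not 0.95
```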

June 30, 2017 Posted by | Books, Psychology, Statistics | Leave a comment