Econstudentlog

A few diabetes papers of interest

i. Islet Long Noncoding RNAs: A Playbook for Discovery and Characterization.

“This review will 1) highlight what is known about lncRNAs in the context of diabetes, 2) summarize the strategies used in lncRNA discovery pipelines, and 3) discuss future directions and the potential impact of studying the role of lncRNAs in diabetes.”

“Decades of mouse research and advances in genome-wide association studies have identified several genetic drivers of monogenic syndromes of β-cell dysfunction, as well as 113 distinct type 2 diabetes (T2D) susceptibility loci (1) and ∼60 loci associated with an increased risk of developing type 1 diabetes (T1D) (2). Interestingly, these studies discovered that most T1D and T2D susceptibility loci fall outside of coding regions, which suggests a role for noncoding elements in the development of disease (3,4). Several studies have demonstrated that many causal variants of diabetes are significantly enriched in regions containing islet enhancers, promoters, and transcription factor binding sites (5,6); however, not all diabetes susceptibility loci can be explained by associations with these regulatory regions. […] Advances in RNA sequencing (RNA-seq) technologies have revealed that mammalian genomes encode tens of thousands of RNA transcripts that have similar features to mRNAs, yet are not translated into proteins (7). […] detailed characterization of many of these transcripts has challenged the idea that the central role for RNA in a cell is to give rise to proteins. Instead, these RNA transcripts make up a class of molecules called noncoding RNAs (ncRNAs) that function either as “housekeeping” ncRNAs, such as transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), that are expressed ubiquitously and are required for protein synthesis or as “regulatory” ncRNAs that control gene expression. While the functional mechanisms of short regulatory ncRNAs, such as microRNAs (miRNAs), small interfering RNAs (siRNAs), and Piwi-interacting RNAs (piRNAs), have been described in detail (8–10), the most abundant and functionally enigmatic regulatory ncRNAs are called long noncoding RNAs (lncRNAs) that are loosely defined as RNAs larger than 200 nucleotides (nt) that do not encode for protein (11–13). Although using a definition based strictly on size is somewhat arbitrary, this definition is useful both bioinformatically […] and technically […]. While the 200-nt size cutoff has simplified identification of lncRNAs, this rather broad classification means several features of lncRNAs, including abundance, cellular localization, stability, conservation, and function, are inherently heterogeneous (15–17). Although this represents one of the major challenges of lncRNA biology, it also highlights the untapped potential of lncRNAs to provide a novel layer of gene regulation that influences islet physiology and pathophysiology.”

“Although the role of miRNAs in diabetes has been well established (9), analyses of lncRNAs in islets have lagged behind their short ncRNA counterparts. However, several recent studies provide evidence that lncRNAs are crucial components of the islet regulome and may have a role in diabetes (27). […] misexpression of several lncRNAs has been correlated with diabetes complications, such as diabetic nephropathy and retinopathy (29–31). There are also preliminary studies suggesting that circulating lncRNAs, such as Gas5, MIAT1, and SENCR, may represent effective molecular biomarkers of diabetes and diabetes-related complications (32,33). Finally, several recent studies have explored the role of lncRNAs in the peripheral metabolic tissues that contribute to energy homeostasis […]. In addition to their potential as genetic drivers and/or biomarkers of diabetes and diabetes complications, lncRNAs can be exploited for the treatment of diabetes. For example, although tremendous efforts have been dedicated to generating replacement β-cells for individuals with diabetes (35,36), human pluripotent stem cell–based β-cell differentiation protocols remain inefficient, and the end product is still functionally and transcriptionally immature compared with primary human β-cells […]. This is largely due to our incomplete knowledge of in vivo differentiation regulatory pathways, which likely include a role for lncRNAs. […] Inherent characteristics of lncRNAs have also made them attractive candidates for drug targeting, which could be exploited for developing new diabetes therapies.”

“With the advancement of high-throughput sequencing techniques, the list of islet-specific lncRNAs is growing exponentially; however, functional characterization is missing for the majority of these lncRNAs. […] Tens of thousands of lncRNAs have been identified in different cell types and model organisms; however, their functions largely remain unknown. Although the tools for determining lncRNA function are technically restrictive, uncovering novel regulatory mechanisms will have the greatest impact on understanding islet function and identifying novel therapeutics for diabetes. To date, no biochemical assay has been used to directly determine the molecular mechanisms by which islet lncRNAs function, which highlights both the infancy of the field and the difficulty in implementing these techniques. […] Due to the infancy of the lncRNA field, most of the biochemical and genetic tools used to interrogate lncRNA function have only recently been developed or are adapted from techniques used to study protein-coding genes and we are only beginning to appreciate the limits and challenges of borrowing strategies from the protein-coding world.”

“The discovery of lncRNAs as a novel class of tissue-specific regulatory molecules has spawned an exciting new field of biology that will significantly impact our understanding of pancreas physiology and pathophysiology. As the field continues to grow, there is growing appreciation that lncRNAs will provide many of the missing components to existing molecular pathways that regulate islet biology and contribute to diabetes when they become dysfunctional. However, to date, most of the experimental emphasis on lncRNAs has focused on large-scale discovery using genome-wide approaches, and there remains a paucity of functional analysis.”

ii. Diabetes and Trajectories of Estimated Glomerular Filtration Rate: A Prospective Cohort Analysis of the Atherosclerosis Risk in Communities Study.

“Diabetes is among the strongest common risk factors for end-stage renal disease, and in industrialized countries, diabetes contributes to ∼50% of cases (3). Less is known about the pattern of kidney function decline associated with diabetes that precedes end-stage renal disease. Identifying patterns of estimated glomerular filtration rate (eGFR) decline could inform monitoring practices for people at high risk of chronic kidney disease (CKD) progression. A better understanding of when and in whom eGFR decline occurs would be useful for the design of clinical trials because eGFR decline >30% is now often used as a surrogate end point for CKD progression (4). Trajectories among persons with diabetes are of particular interest because of the possibility for early intervention and the prevention of CKD development. However, eGFR trajectories among persons with new diabetes may be complex due to the hypothesized period of hyperfiltration by which GFR increases, followed by progressive, rapid decline (5). Using data from the Atherosclerosis Risk in Communities (ARIC) study, an ongoing prospective community-based cohort of >15,000 participants initiated in 1987 with serial measurements of creatinine over 26 years, our aim was to characterize patterns of eGFR decline associated with diabetes, identify demographic, genetic, and modifiable risk factors within the population with diabetes that were associated with steeper eGFR decline, and assess for evidence of early hyperfiltration.”

“We categorized people into groups of no diabetes, undiagnosed diabetes, and diagnosed diabetes at baseline (visit 1) and compared baseline clinical characteristics using ANOVA for continuous variables and Pearson χ2 tests for categorical variables. […] To estimate individual eGFR slopes over time, we used linear mixed-effects models with random intercepts and random slopes. These models were fit on diabetes status at baseline as a nominal variable to adjust the baseline level of eGFR and included an interaction term between diabetes status at baseline and time to estimate annual decline in eGFR by diabetes categories. Linear mixed models were run unadjusted and adjusted, with the latter model including the following diabetes and kidney disease–related risk factors: age, sex, race–center, BMI, systolic blood pressure, hypertension medication use, HDL, prevalent coronary heart disease, annual family income, education status, and smoking status, as well as each variable interacted with time. Continuous covariates were centered at the analytic population mean. We tested model assumptions and considered different covariance structures, comparing nested models using Akaike information criteria. We identified the unstructured covariance model as the most optimal and conservative approach. From the mixed models, we described the overall mean annual decline by diabetes status at baseline and used the random effects to estimate best linear unbiased predictions to describe the distributions of yearly slopes in eGFR by diabetes status at baseline and displayed them using kernel density plots.”
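
The paper does not include code, but to make the modelling approach a bit more concrete, here is a minimal sketch of my own of the sort of random-intercept/random-slope model described above, using simulated data and Python's statsmodels; all column names below (id, egfr, years, dm) are hypothetical placeholders, and the actual analysis of course included the full set of covariates and interaction terms listed in the quote.

```python
# A minimal sketch (not the authors' code) of a linear mixed-effects model with
# random intercepts and random slopes, fit to simulated eGFR-style data.
# All column names below (id, egfr, years, dm) are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Toy long-format data: 4 repeated eGFR measurements per participant.
n, visits = 500, 4
ids = np.repeat(np.arange(n), visits)
years = np.tile([0.0, 3.0, 6.0, 9.0], n)
dm = np.repeat(rng.choice(["none", "undiagnosed", "diagnosed"],
                          size=n, p=[0.88, 0.04, 0.08]), visits)
true_slope = {"none": -1.6, "undiagnosed": -2.1, "diagnosed": -2.9}
rand_int = np.repeat(rng.normal(0, 10, n), visits)    # subject-level intercepts
rand_slp = np.repeat(rng.normal(0, 0.8, n), visits)   # subject-level slopes
egfr = (100 + rand_int
        + (np.vectorize(true_slope.get)(dm) + rand_slp) * years
        + rng.normal(0, 4, n * visits))
df = pd.DataFrame({"id": ids, "years": years, "dm": dm, "egfr": egfr})

# Random intercept + random slope on time; the diabetes-by-time interaction
# gives each baseline diabetes category its own mean annual eGFR slope.
model = smf.mixedlm("egfr ~ C(dm, Treatment('none')) * years",
                    data=df, groups=df["id"], re_formula="~years")
result = model.fit()
print(result.summary())
```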

“Because of substantial variation in annual eGFR slope among people with diagnosed diabetes, we sought to identify risk factors that were associated with faster decline. Among those with diagnosed diabetes, we compared unadjusted and adjusted mean annual decline in eGFR by race–APOL1 risk status (white, black– APOL1 low risk, and black–APOL1 high risk) [here’s a relevant link, US], systolic blood pressure […], smoking status […], prevalent coronary heart disease […], diabetes medication use […], HbA1c […], and 1,5-anhydroglucitol (≥10 and <10 μg/mL) [relevant link, US]. Because some of these variables were only available at visit 2, we required that participants included in this subgroup analysis attend both visits 1 and 2 and not be missing information on APOL1 or the variables assessed at visit 2 to ensure a consistent sample size. In addition to diabetes and kidney disease–related risk factors in the adjusted model, we also included diabetes medication use and HbA1c to account for diabetes severity in these analyses. […] to explore potential hyperfiltration, we used a linear spline model to allow the slope to change for each diabetes category between the first 3 years of follow-up (visit 1 to visit 2) and the subsequent time period (visit 2 to visit 5).”

“There were 15,517 participants included in the analysis: 13,698 (88%) without diabetes, 634 (4%) with undiagnosed diabetes, and 1,185 (8%) with diagnosed diabetes at baseline. […] At baseline, participants with undiagnosed and diagnosed diabetes were older, more likely to be black or have hypertension and coronary heart disease, and had higher mean BMI and lower mean HDL compared with those without diabetes […]. Income and education levels were also lower among those with undiagnosed and diagnosed diabetes compared with those without diabetes. […] Overall, there was a nearly linear association between eGFR and age over time, regardless of diabetes status […]. The crude mean annual decline in eGFR was slowest among those without diabetes at baseline (decline of −1.6 mL/min/1.73 m2/year [95% CI −1.6 to −1.5]), faster among those with undiagnosed diabetes compared with those without diabetes (decline of −2.1 mL/min/1.73 m2/year [95% CI −2.2 to −2.0][…]), and nearly twice as rapid among those with diagnosed diabetes compared with those without diabetes (decline of −2.9 mL/min/1.73 m2/year [95% CI −3.0 to −2.8][…]). Adjustment for diabetes and kidney disease–related risk factors attenuated the results slightly, but those with undiagnosed and diagnosed diabetes still had statistically significantly steeper declines than those without diabetes (decline among no diabetes −1.4 mL/min/1.73 m2/year [95% CI −1.5 to −1.4] and decline among undiagnosed diabetes −1.8 mL/min/1.73 m2/year [95% CI −2.0 to −1.7], difference vs. no diabetes of −0.4 mL/min/1.73 m2/year [95% CI −0.5 to −0.3; P < 0.001]; decline among diagnosed diabetes −2.5 mL/min/1.73 m2/year [95% CI −2.6 to −2.4], difference vs. no diabetes of −1.1 mL/min/1.73 m2/ year [95% CI −1.2 to −1.0; P < 0.001]). […] The decline in eGFR per year varied greatly across individuals, particularly among those with diabetes at baseline […] Among participants with diagnosed diabetes at baseline, those who were black, had systolic blood pressure ≥140 mmHg, used diabetes medications, had an HbA1c ≥7% [≥53 mmol/mol], or had 1,5-anhydroglucitol <10 μg/mL were at risk for steeper annual declines than their counterparts […]. Smoking status and prevalent coronary heart disease were not associated with significantly steeper eGFR decline in unadjusted analyses. Adjustment for risk factors, diabetes medication use, and HbA1c attenuated the differences in decline for all subgroups with the exception of smoking status, leaving black race along with APOL1-susceptible genotype, systolic blood pressure ≥140 mmHg, current smoking, insulin use, and HbA1c ≥9% [≥75 mmol/mol] as the risk factors indicative of steeper decline.”

CONCLUSIONS Diabetes is an important risk factor for kidney function decline. Those with diagnosed diabetes declined almost twice as rapidly as those without diabetes. Among people with diagnosed diabetes, steeper declines were seen in those with modifiable risk factors, including hypertension and glycemic control, suggesting areas for continued targeting in kidney disease prevention. […] Few other community-based studies have evaluated differences in kidney function decline by diabetes status over a long period through mid- and late life. One study of 10,184 Canadians aged ≥66 years with creatinine measured during outpatient visits showed results largely consistent with our findings but with much shorter follow-up (median of 2 years) (19). Other studies of eGFR change in a general population have found smaller declines than our results (20,21). A study conducted in Japanese participants aged 40–79 years found a decline of only −0.4 mL/min/1.73 m2/year over the course of two assessments 10 years apart (compared with our estimate among those without diabetes: −1.6 mL/min/1.73 m2/year). This is particularly interesting, as Japan is known to have a higher prevalence of CKD and end-stage renal disease than the U.S. (20). However, this study evaluated participants over a shorter time frame and required attendance at both assessments, which may have decreased the likelihood of capturing severe cases and resulted in underestimation of decline.”

“The Baltimore Longitudinal Study of Aging also assessed kidney function over time in a general population of 446 men, ranging in age from 22 to 97 years at baseline, each with up to 14 measurements of creatinine clearance assessed between 1958 and 1981 (21). They also found a smaller decline than we did (−0.8 mL/min/year), although this study also had notable differences. Their main analysis excluded participants with hypertension and history of renal disease or urinary tract infection and those treated with diuretics and/or antihypertensive medications. Without those exclusions, their overall estimate was −1.1 mL/min/year, which better reflects a community-based population and our results. […] In our evaluation of risk factors that might explain the variation in decline seen among those with diagnosed diabetes, we observed that black race, systolic blood pressure ≥140 mmHg, insulin use, and HbA1c ≥9% (≥75 mmol/mol) were particularly important. Although the APOL1 high-risk genotype is a known risk factor for eGFR decline, African Americans with low-risk APOL1 status continued to be at higher risk than whites even after adjustment for traditional risk factors, diabetes medication use, and HbA1c.”

“Our results are relevant to the design and conduct of clinical trials. Hard clinical outcomes like end-stage renal disease are relatively rare, and a 30–40% decline in eGFR is now accepted as a surrogate end point for CKD progression (4). We provide data on patient subgroups that may experience accelerated trajectories of kidney function decline, which has implications for estimating sample size and ensuring adequate power in future clinical trials. Our results also suggest that end points of eGFR decline might not be appropriate for patients with new-onset diabetes, in whom declines may actually be slower than among persons without diabetes. Slower eGFR decline among those with undiagnosed diabetes, who are likely early in the course of diabetes, is consistent with the hypothesis of hyperfiltration. Similar to other studies, we found that persons with undiagnosed diabetes had higher GFR at the outset, but this was a transient phenomenon, as they ultimately experienced larger declines in kidney function than those without diabetes over the course of follow-up (23–25). Whether hyperfiltration is a universal aspect of early disease and, if not, whether it portends worse long-term outcomes is uncertain. Existing studies investigating hyperfiltration as a precursor to adverse kidney outcomes are inconsistent (24,26,27) and often confounded by diabetes severity factors like duration (27). We extended this literature by separating undiagnosed and diagnosed diabetes to help address that confounding.”

iii. Saturated Fat Is More Metabolically Harmful for the Human Liver Than Unsaturated Fat or Simple Sugars.

OBJECTIVE Nonalcoholic fatty liver disease (i.e., increased intrahepatic triglyceride [IHTG] content), predisposes to type 2 diabetes and cardiovascular disease. Adipose tissue lipolysis and hepatic de novo lipogenesis (DNL) are the main pathways contributing to IHTG. We hypothesized that dietary macronutrient composition influences the pathways, mediators, and magnitude of weight gain-induced changes in IHTG.

RESEARCH DESIGN AND METHODS We overfed 38 overweight subjects (age 48 ± 2 years, BMI 31 ± 1 kg/m2, liver fat 4.7 ± 0.9%) 1,000 extra kcal/day of saturated (SAT) or unsaturated (UNSAT) fat or simple sugars (CARB) for 3 weeks. We measured IHTG (1H-MRS), pathways contributing to IHTG (lipolysis ([2H5]glycerol) and DNL (2H2O) basally and during euglycemic hyperinsulinemia), insulin resistance, endotoxemia, plasma ceramides, and adipose tissue gene expression at 0 and 3 weeks.

RESULTS Overfeeding SAT increased IHTG more (+55%) than UNSAT (+15%, P < 0.05). CARB increased IHTG (+33%) by stimulating DNL (+98%). SAT significantly increased while UNSAT decreased lipolysis. SAT induced insulin resistance and endotoxemia and significantly increased multiple plasma ceramides. The diets had distinct effects on adipose tissue gene expression.”

CONCLUSIONS NAFLD has been shown to predict type 2 diabetes and cardiovascular disease in multiple studies, even independent of obesity (1), and also to increase the risk of progressive liver disease (17). It is therefore interesting to compare effects of different diets on liver fat content and understand the underlying mechanisms. We examined whether provision of excess calories as saturated (SAT) or unsaturated (UNSAT) fats or simple sugars (CARB) influences the metabolic response to overfeeding in overweight subjects. All overfeeding diets increased IHTGs. The SAT diet induced a greater increase in IHTGs than the UNSAT diet. The composition of the diet altered sources of excess IHTGs. The SAT diet increased lipolysis, whereas the CARB diet stimulated DNL. The SAT but not the other diets increased multiple plasma ceramides, which increase the risk of cardiovascular disease independent of LDL cholesterol (18). […] Consistent with current dietary recommendations (3638), the current study shows that saturated fat is the most harmful dietary constituent regarding IHTG accumulation.”

iv. Primum Non Nocere: Refocusing Our Attention on Severe Hypoglycemia Prevention.

“Severe hypoglycemia, defined as low blood glucose requiring assistance for recovery, is arguably the most dangerous complication of type 1 diabetes as it can result in permanent cognitive impairment, seizure, coma, accidents, and death (1,2). Since the Diabetes Control and Complications Trial (DCCT) demonstrated that intensive intervention to normalize glucose prevents long-term complications but at the price of a threefold increase in the rate of severe hypoglycemia (3), hypoglycemia has been recognized as the major limitation to achieving tight glycemic control. Severe hypoglycemia remains prevalent among adults with type 1 diabetes, ranging from ∼1.4% per year in the DCCT/EDIC (Epidemiology of Diabetes Interventions and Complications) follow-up cohort (4) to ∼8% in the T1D Exchange clinic registry (5).

One of the greatest risk factors for severe hypoglycemia is impaired awareness of hypoglycemia (6), which increases risk up to sixfold (7,8). Hypoglycemia unawareness results from deficient counterregulation (9), where falling glucose fails to activate the autonomic nervous system to produce neuroglycopenic symptoms that normally help patients identify and respond to episodes (i.e., sweating, palpitations, hunger) (2). An estimated 20–25% of adults with type 1 diabetes have impaired hypoglycemia awareness (8), which increases to more than 50% after 25 years of disease duration (10).

Screening for hypoglycemia unawareness to identify patients at increased risk of severe hypoglycemic events should be part of routine diabetes care. Self-identified impairment in awareness tends to agree with clinical evaluation (11). Therefore, hypoglycemia unawareness can be easily and effectively screened […] Interventions for hypoglycemia unawareness include a range of behavioral and medical options. Avoiding hypoglycemia for at least several weeks may partially reverse hypoglycemia unawareness and reduce risk of future episodes (1). Therefore, patients with hypoglycemia and unawareness may be advised to raise their glycemic and HbA1c targets (1,2). Diabetes technology can play a role, including continuous subcutaneous insulin infusion (CSII) to optimize insulin delivery, continuous glucose monitoring (CGM) to give technological awareness in the absence of symptoms (14), or the combination of the two […] Aside from medical management, structured or hypoglycemia-specific education programs that aim to prevent hypoglycemia are recommended for all patients with severe hypoglycemia or hypoglycemia unawareness (14). In randomized trials, psychoeducational programs that incorporate increased education, identification of personal risk factors, and behavior change support have improved hypoglycemia unawareness and reduced the incidence of both nonsevere and severe hypoglycemia over short periods of follow-up (17,18) and extending up to 1 year (19).”

“Given that the presence of hypoglycemia unawareness increases the risk of severe hypoglycemia, which is the strongest predictor of a future episode (2,4), the implication that intervention can break the life-threatening and traumatizing cycle of hypoglycemia unawareness and severe hypoglycemia cannot be overstated. […] new evidence of durability of effect across treatment regimen without increasing the risk for long-term complications creates an imperative for action. In combination with existing screening tools and a body of literature investigating novel interventions for hypoglycemia unawareness, these results make the approach of screening, recognition, and intervention very compelling as not only a best practice but something that should be incorporated in universal guidelines on diabetes care, particularly for individuals with type 1 diabetes […] Hyperglycemia is […] only part of the puzzle in diabetes management. Long-term complications are decreasing across the population with improved interventions and their implementation (24). […] it is essential to shift our historical obsession with hyperglycemia and its long-term complications to equally emphasize the disabling, distressing, and potentially fatal near-term complication of our treatments, namely severe hypoglycemia. […] The health care providers’ first dictum is primum non nocere — above all, do no harm. ADA must refocus our attention on severe hypoglycemia as an iatrogenic and preventable complication of our interventions.”

v. Anti‐vascular endothelial growth factor combined with intravitreal steroids for diabetic macular oedema.

“Background

The combination of steroid and anti‐vascular endothelial growth factor (VEGF) intravitreal therapeutic agents could potentially have synergistic effects for treating diabetic macular oedema (DMO). On the one hand, if combined treatment is more effective than monotherapy, there would be significant implications for improving patient outcomes. Conversely, if there is no added benefit of combination therapy, then people could be potentially exposed to unnecessary local or systemic side effects.

Objectives

To assess the effects of intravitreal agents that block vascular endothelial growth factor activity (anti‐VEGF agents) plus intravitreal steroids versus monotherapy with macular laser, intravitreal steroids or intravitreal anti‐VEGF agents for managing DMO.”

“There were eight RCTs (703 participants, 817 eyes) that met our inclusion criteria with only three studies reporting outcomes at one year. The studies took place in Iran (3), USA (2), Brazil (1), Czech Republic (1) and South Korea (1). […] When comparing anti‐VEGF/steroid with anti‐VEGF monotherapy as primary therapy for DMO, we found no meaningful clinical difference in change in BCVA [best corrected visual acuity] […] or change in CMT [central macular thickness] […] at one year. […] There was very low‐certainty evidence on intraocular inflammation from 8 studies, with one event in the anti‐VEGF/steroid group (313 eyes) and two events in the anti‐VEGF group (322 eyes). There was a greater risk of raised IOP (Peto odds ratio (OR) 8.13, 95% CI 4.67 to 14.16; 635 eyes; 8 RCTs; moderate‐certainty evidence) and development of cataract (Peto OR 7.49, 95% CI 2.87 to 19.60; 635 eyes; 8 RCTs; moderate‐certainty evidence) in eyes receiving anti‐VEGF/steroid compared with anti‐VEGF monotherapy. There was low‐certainty evidence from one study of an increased risk of systemic adverse events in the anti‐VEGF/steroid group compared with the anti‐VEGF alone group (Peto OR 1.32, 95% CI 0.61 to 2.86; 103 eyes).”

“One study compared anti‐VEGF/steroid versus macular laser therapy. At one year investigators did not report a meaningful difference between the groups in change in BCVA […] or change in CMT […]. There was very low‐certainty evidence suggesting an increased risk of cataract in the anti‐VEGF/steroid group compared with the macular laser group (Peto OR 4.58, 95% CI 0.99 to 21.10; 100 eyes) and an increased risk of elevated IOP in the anti‐VEGF/steroid group compared with the macular laser group (Peto OR 9.49, 95% CI 2.86 to 31.51; 100 eyes).”

“Authors’ conclusions

Combination of intravitreal anti‐VEGF plus intravitreal steroids does not appear to offer additional visual benefit compared with monotherapy for DMO; at present the evidence for this is of low‐certainty. There was an increased rate of cataract development and raised intraocular pressure in eyes treated with anti‐VEGF plus steroid versus anti‐VEGF alone. Patients were exposed to potential side effects of both these agents without reported additional benefit.”

vi. Association between diabetic foot ulcer and diabetic retinopathy.

“More than 25 million people in the United States are estimated to have diabetes mellitus (DM), and 15–25% will develop a diabetic foot ulcer (DFU) during their lifetime [1]. DFU is one of the most serious and disabling complications of DM, resulting in significantly elevated morbidity and mortality. Vascular insufficiency and associated neuropathy are important predisposing factors for DFU, and DFU is the most common cause of non-traumatic foot amputation worldwide. Up to 70% of all lower leg amputations are performed on patients with DM, and up to 85% of all amputations are preceded by a DFU [2, 3]. Every year, approximately 2–3% of all diabetic patients develop a foot ulcer, and many require prolonged hospitalization for the treatment of ensuing complications such as infection and gangrene [4, 5].

Meanwhile, a number of studies have noted that diabetic retinopathy (DR) is associated with diabetic neuropathy and microvascular complications [6–10]. Despite the magnitude of the impact of DFUs and their consequences, little research has been performed to investigate the characteristics of patients with a DFU and DR. […] the aim of this study was to investigate the prevalence of DR in patients with a DFU and to elucidate the potential association between DR and DFUs.”

“A retrospective review was conducted on DFU patients who underwent ophthalmic and vascular examinations within 6 months; 100 type 2 diabetic patients with DFU were included. The medical records of 2496 type 2 diabetic patients without DFU served as control data. DR prevalence and severity were assessed in DFU patients. DFU patients were compared with the control group regarding each clinical variable. Additionally, DFU patients were divided into two groups according to DR severity and compared. […] Out of 100 DFU patients, 90 patients (90%) had DR and 55 (55%) had proliferative DR (PDR). There was no significant association between DR and DFU severities (R = 0.034, p = 0.734). A multivariable analysis comparing type 2 diabetic patients with and without DFUs showed that the presence of DR [OR, 226.12; 95% confidence interval (CI), 58.07–880.49; p < 0.001] and proliferative DR [OR, 306.27; 95% CI, 64.35–1457.80; p < 0.001], higher HbA1c (%, OR, 1.97, 95% CI, 1.46–2.67; p < 0.001), higher serum creatinine (mg/dL, OR, 1.62, 95% CI, 1.06–2.50; p = 0.027), older age (years, OR, 1.12; 95% CI, 1.06–1.17; p < 0.001), higher pulse pressure (mmHg, OR, 1.03; 95% CI, 1.00–1.06; p = 0.025), lower cholesterol (mg/dL, OR, 0.94; 95% CI, 0.92–0.97; p < 0.001), lower BMI (kg/m2, OR, 0.87, 95% CI, 0.75–1.00; p = 0.044) and lower hematocrit (%, OR, 0.80, 95% CI, 0.74–0.87; p < 0.001) were associated with DFUs. In a subgroup analysis of DFU patients, the PDR group had a longer duration of diabetes mellitus, higher serum BUN, and higher serum creatinine than the non-PDR group. In the multivariable analysis, only higher serum creatinine was associated with PDR in DFU patients (OR, 1.37; 95% CI, 1.05–1.78; p = 0.021).

Conclusions

Diabetic retinopathy is prevalent in patients with DFU and about half of DFU patients had PDR. No significant association was found in terms of the severity of these two diabetic complications. To prevent blindness, patients with DFU, and especially those with high serum creatinine, should undergo retinal examinations for timely PDR diagnosis and management.”


August 29, 2018 Posted by | Diabetes, Epidemiology, Genetics, Medicine, Molecular biology, Nephrology, Ophthalmology, Statistics, Studies

Combinatorics (II)

I really liked this book. Below I have added some links and quotes related to the second half of the book’s coverage.

“An n × n magic square, or a magic square of order n, is a square array of numbers — usually (but not necessarily) the numbers from 1 to n2 — arranged in such a way that the sum of the numbers in each of the n rows, each of the n columns, or each of the two main diagonals is the same. A semi-magic square is a square array in which the sum of the numbers in each row or column, but not necessarily the diagonals, is the same. We note that if the entries are 1 to n2, then the sum of the numbers in the whole array is
1 + 2 + 3 + … + n2 = n2 (n2 + 1) / 2
on summing the arithmetic progression. Because the n rows and columns have the same ‘magic sum’, the numbers in each single row or column add up to (1/n)th of this, which is n (n2+1) / 2 […] An n × n latin square, or a latin square of order n, is a square array with n symbols arranged so that each symbol appears just once in each row and column. […] Given a latin square, we can obtain others by rearranging the rows or the columns, or by permuting the symbols. For an n × n latin square with symbols 1, 2, … , n, we can thereby arrange that the numbers in the first row and the first column appear in order as 1, 2, … , n. Such a latin square is called normalized […] A familiar form of latin square is the sudoku puzzle […] How many n × n latin squares are there for a given order of n? The answer is known only for n ≤ 11. […] The number of normalized latin squares of order 11 has an impressive forty-eight digits.”
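
A small toy illustration of the definitions above (my own code, not the book's): checking whether a square array satisfies the magic- and semi-magic conditions, using the row/column magic sum n (n2 + 1) / 2 for entries 1 to n2.

```python
# Toy illustration (my own, not from the book): checking the magic-square
# conditions for an array containing the numbers 1 to n^2.
def is_magic(square, require_diagonals=True):
    n = len(square)
    target = n * (n * n + 1) // 2          # row/column magic sum for 1..n^2
    rows = [sum(row) for row in square]
    cols = [sum(square[i][j] for i in range(n)) for j in range(n)]
    diags = [sum(square[i][i] for i in range(n)),
             sum(square[i][n - 1 - i] for i in range(n))]
    lines = rows + cols + (diags if require_diagonals else [])
    return all(s == target for s in lines)

lo_shu = [[4, 9, 2],
          [3, 5, 7],
          [8, 1, 6]]                                 # the classical 3 x 3 example
print(is_magic(lo_shu))                              # True; magic sum 3*(9+1)/2 = 15
print(is_magic(lo_shu, require_diagonals=False))     # True: magic implies semi-magic
```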

“A particular type of latin square is the cyclic square, where the symbols appear in the same cyclic order, moving one place to the left in each successive row, so that the entry at the beginning of each line appears at the end of the next one […] An extension of this idea is where the symbols move more places to the left in each successive row […] We can construct a latin square row by row from its first row, always taking care that no symbol appears twice in any column. […] An important concept […] is that of a set of orthogonal latin squares […] two n × n latin squares are orthogonal if, when superimposed, each of the n2 possible pairings of a symbol from each square appears exactly once. […] pairs of orthogonal latin squares are […] used in agricultural experiments. […] We can extend the idea of orthogonality beyond pairs […] A set of mutually orthogonal latin squares (sometimes abbreviated to MOLS) is a set of latin squares, any two of which are orthogonal […] Note that there can be at most n-1 MOLS of order n. […] A full set of MOLS is called a complete set […] We can ask the following question: For which values of n does there exist a complete set of n × n mutually orthogonal latin squares? As several authors have shown, a complete set exists whenever n is a prime number (other than 2) or a power of a prime […] In 1922, H. F. MacNeish generalized this result by observing that if n has prime factorization p1^a1 × p2^a2 × … × pk^ak, then the number of MOLS is at least min(p1^a1, p2^a2, … , pk^ak) − 1”.
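
To make the cyclic-square and orthogonality definitions concrete, here is another toy sketch of my own (not from the book): it builds cyclic latin squares over {0, 1, … , n−1} for a prime n and verifies that they form a complete set of n − 1 mutually orthogonal latin squares.

```python
# Toy sketch (mine, not the book's): cyclic latin squares over {0, ..., n-1}
# and an orthogonality check. For prime n, the squares with entry (k*i + j) mod n,
# k = 1, ..., n-1, form a complete set of n - 1 mutually orthogonal latin squares.
from itertools import combinations

def cyclic_square(n, k=1):
    """Each row is shifted k places to the left relative to the row above it."""
    return [[(k * i + j) % n for j in range(n)] for i in range(n)]

def is_latin(square):
    n = len(square)
    symbols = set(range(n))
    return (all(set(row) == symbols for row in square)
            and all({square[i][j] for i in range(n)} == symbols for j in range(n)))

def orthogonal(a, b):
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n             # every ordered pair of symbols occurs once

n = 5                                       # a prime order
mols = [cyclic_square(n, k) for k in range(1, n)]
print(all(is_latin(sq) for sq in mols))                          # True
print(all(orthogonal(a, b) for a, b in combinations(mols, 2)))   # True: 4 MOLS
```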

“Consider the following [problem] involving comparisons between a number of varieties of a commodity: A consumer organization wishes to compare seven brands of detergent and arranges a number of tests. But since it may be uneconomic or inconvenient for each tester to compare all seven brands it is decided that each tester should compare just three brands. How should the trials be organized if each brand is to be tested the same number of times and each pair of brands is to be compared directly? […] A block design consists of a set of v varieties arranged into b blocks. […] [if we] further assume that each block contains the same number k of varieties, and each variety appears in the same number r of blocks […] [the design is] called [an] equireplicate design […] for every block design we have v × r = b × k. […] It would clearly be preferable if all pairs of varieties in a design were compared the same number of times […]. Such a design is called balanced, or a balanced incomplete-block design (often abbreviated to BIBD). The number of times that any two varieties are compared is usually denoted by λ […] In a balanced block design the parameters v, b, k, r, and λ are not independent […] [Rather it is the case that:] r × (k − 1) = λ × (v − 1). […] The conditions v × r = b × k and r × (k − 1) = λ × (v − 1) are both necessary for a design to be balanced, but they’re not sufficient since there are designs satisfying both conditions which are not balanced. Another necessary condition for a design to be balanced is v ≤ b, a result known as Fisher’s inequality […] A balanced design for which v = b, and therefore k = r, is called a symmetric design”.
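
Again a toy sketch of my own rather than anything from the book: the code below computes the parameters of a block design and checks the two necessary conditions and Fisher's inequality for the classical (7, 7, 3, 3, 1) design, which is also symmetric since v = b.

```python
# Toy sketch (mine, not the book's): compute the parameters of a block design
# and check the necessary conditions quoted above. The example is the classical
# (v, b, k, r, lambda) = (7, 7, 3, 3, 1) design coming from the Fano plane.
from itertools import combinations

def design_parameters(blocks):
    varieties = sorted({x for block in blocks for x in block})
    v, b = len(varieties), len(blocks)
    k = {len(block) for block in blocks}                            # block sizes
    r = {sum(x in block for block in blocks) for x in varieties}    # replication numbers
    lam = {sum(set(pair) <= block for block in blocks)
           for pair in combinations(varieties, 2)}                  # pair frequencies
    return v, b, k, r, lam              # k, r, lam are sets; singletons if constant

fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
        {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
v, b, ks, rs, lams = design_parameters(fano)
print(v, b, ks, rs, lams)               # 7 7 {3} {3} {1}: equireplicate and balanced
k, r, lam = ks.pop(), rs.pop(), lams.pop()
print(v * r == b * k)                   # True: v x r = b x k
print(r * (k - 1) == lam * (v - 1))     # True: r(k - 1) = lambda(v - 1)
print(v <= b)                           # True: Fisher's inequality (here v = b: symmetric)
```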

“A block design with v varieties is resolvable if its blocks can be rearranged into subdesigns, called replicates, each of which contains every variety just once. […] we define a finite projective plane to be an arrangement of a finite number of points and a finite number of lines with the properties that: [i] Any two points lie on exactly one line. [ii] Any two lines pass through exactly one point.
Note that this differs from our usual Euclidean geometry, where any two lines pass through exactly one point unless they’re parallel. Omitting these italicized words produces a completely different type of geometry from the one we’re used to, since there’s now a ‘duality’ or symmetry between points and lines, according to which any statement about points lying on lines gives rise to a statement about lines passing through points, and vice versa. […] We say that the finite projective plane has order n if each line contains n + 1 points. […] removing a single line from a projective plane of order n, and the n + 1 points on this line, gives a square pattern with n2 points and n2 + n lines where each line contains n points and each point lies on n + 1 lines. Such a diagram is called an affine plane of order n. […] This process is reversible. If we start with an affine plane of order n and add another line joined up appropriately, we get a projective plane of order n. […] Every finite projective plane gives rise to a symmetric balanced design. […] In general, a finite projective plane of order n, with n2 + n + 1 points and lines and with n + 1 points on each line and n + 1 lines through each point, gives rise to a balanced symmetric design with parameters v = b = n2 + n + 1, k = r = n + 1, and λ = 1. […] Every finite affine plane gives rise to a resolvable design. […] In general, an affine plane of order n, obtained by removing a line and n + 1 points from a projective plane of order n, gives rise to a resolvable design with parameters v = n2 , b = n2 + n , k = n , and r = n + 1. […] Every finite affine plane corresponds to a complete set of orthogonal latin squares.”
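
A final combinatorics sketch of my own (not the book's): constructing the affine plane of prime order n as the lines of Z_n × Z_n and checking the resolvable-design parameters quoted above (v = n2, b = n2 + n, k = n, r = n + 1, λ = 1).

```python
# Toy sketch (mine): the affine plane of prime order n, built from the point set
# Z_n x Z_n with the lines y = m*x + c (one parallel class per slope m) plus the
# vertical lines x = c. It is a resolvable design with v = n^2, b = n^2 + n,
# k = n, r = n + 1 and lambda = 1.
from itertools import combinations

def affine_plane(n):
    points = [(x, y) for x in range(n) for y in range(n)]
    lines = []
    for m in range(n):                        # one parallel class per slope m
        for c in range(n):
            lines.append({(x, (m * x + c) % n) for x in range(n)})
    for c in range(n):                        # the vertical parallel class
        lines.append({(c, y) for y in range(n)})
    return points, lines

n = 3
points, lines = affine_plane(n)
print(len(points) == n * n, len(lines) == n * n + n)                    # v and b
print(all(len(line) == n for line in lines))                            # k = n
print(all(sum(p in line for line in lines) == n + 1 for p in points))   # r = n + 1
print(all(sum(p in line and q in line for line in lines) == 1
          for p, q in combinations(points, 2)))                         # lambda = 1
```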

Links:

Regular polygon.
Polyhedron.
Internal and external angles.
Triangular tiling. Square tiling. Hexagonal tiling.
Semiregular tessellations.
Penrose tiling.
Platonic solid.
Euler’s polyhedron formula.
Prism (geometry). Antiprism.
Fullerene.
Geodesic dome.
Graph theory.
Complete graph. Complete bipartite graph. Cycle graph.
Degree (graph theory).
Handshaking lemma.
Ramsey theory.
Tree (graph theory).
Eulerian and Hamiltonian Graphs. Hamiltonian path.
Icosian game.
Knight’s tour problem.
Planar graph. Euler’s formula for plane graphs.
Kuratowski’s theorem.
Dual graph.
Lo Shu Square.
Melencolia I.
Euler’s Thirty-six officers problem.
Steiner triple system.
Partition (number theory).
Pentagonal number. Pentagonal number theorem.
Ramanujan’s congruences.

August 23, 2018 Posted by | Books, Mathematics, Statistics

Some observations on a cryptographic problem

It’s been a long time since I last posted one of these sorts of ‘rootless’ posts which are not based on a specific book or a specific lecture or something along those lines, but a question on r/science made me think about these topics and start writing a bit about it, and I decided I might as well add my thoughts and ideas here.

The reddit question which motivated me to write this post was this one: “Is it difficult to determine the password for an encryption if you are given both the encrypted and unencrypted message?

By “difficult” I mean requiring an inordinate amount of computation. If given both an encrypted and unencrypted file/message, is it reasonable to be able to recover the password that was used to encrypt the file/message?”

Judging from the way the question is worded, the inquirer obviously knows very little about these topics, but that was part of what motivated me when I started out writing; s/he quite clearly has a faulty model of how this kind of stuff actually works, and the very way the question is phrased illustrates some of the ways in which s/he gets things wrong.

When I decided to transfer my efforts towards discussing these topics to the blog I also implicitly decided against using language that would be expected to be easily comprehensible for the original inquirer, as s/he was no longer in the target group and there’s a cost to using that kind of language when discussing technical matters. I’ve sort of tried to make this post both useful and readable to people not all that familiar with the related fields, but I tend to find it difficult to evaluate the extent to which I’ve succeeded when I try to do things like that.

I decided against adding stuff already commented on when I started out writing this, so I’ll not e.g. repeat noiwontfixyourpc’s reply below. However I have added some other observations that seem to me to be relevant and worth mentioning to people who might consider asking a similar question to the one the original inquirer asked in that thread:

i. Finding a way to make plaintext turn into cipher text (…or cipher text into plaintext; and no, these two things are not actually always equivalent, see below…) is a very different (and in many contexts a much easier problem) than finding out the actual encryption scheme that is at work producing the text strings you observe. There can be many, many different ways to go from a specific sample of plaintext to a specific sample of ciphertext, and most of the solutions won’t work if you’re faced with a new piece of ciphertext; especially not if the original samples are small, so only a small amount of (potential) information would be expected to be included in the text strings.

If you only get a small amount of plaintext and corresponding cipher text you may decide that algorithm A is the one that was applied to the message, even if the algorithm actually applied was a more complex algorithm, B. To illustrate in a very simple way how this might happen, A might be a particular case of B, because B is a superset of A and a large number of other potential encryption algorithms applied in the encryption scheme B (…or the encryption scheme C, because B also happens to be a subset of C, or… etc.). In such a context A might be an encryption scheme/approach that perhaps only applies in very specific contexts; for example (part of) the coding algorithm might have been to decide that ‘on next Tuesday, we’ll use this specific algorithm to translate plaintext into cipher text, and we’ll never use that specific translation-/mapping algorithm (which may be but one component of the encryption algorithm) again’. If such a situation applies then you’re faced with the problem that even if your rule ‘worked’ in that particular instance, in terms of translating your plaintext into cipher text and vice versa, it only ‘worked’ because you blindly fitted the two data-sets in a way that looked right, even if you actually had no idea how the coding scheme really worked (you only guessed A, not B, and in this particular instance A’s never actually going to happen again).
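
To make the point about non-uniqueness a bit more concrete, here's a deliberately simplistic toy example of my own (the ciphertext below is made up for illustration): if the scheme were a simple letter substitution, a short plaintext/ciphertext pair only pins down the mapping for the letters that actually occur in the sample, and every completion of that partial mapping 'explains' the observed pair while disagreeing about new messages.

```python
# Deliberately simplistic toy example (mine); the ciphertext is made up.
# With a simple substitution cipher, one short plaintext/ciphertext pair only
# constrains the mapping for the plaintext letters that actually occur.
import math

plain  = "SEEYOUNEXTWEDNESDAY"
cipher = "QTTJFVMTASBTOMTQOXJ"      # invented, consistent with one substitution key

partial_key = {}
for p, c in zip(plain, cipher):
    if partial_key.get(p, c) != c:
        raise ValueError("sample not consistent with a simple substitution")
    partial_key[p] = c

constrained = len(partial_key)       # distinct plaintext letters observed
free = 26 - constrained              # letters the sample says nothing about
print(f"letters pinned down: {constrained}, unconstrained: {free}")
print(f"substitution keys consistent with the sample: {math.factorial(free):,}")
# Every one of those keys reproduces the observed pair exactly, yet they will
# decrypt a new ciphertext containing unseen letters in completely different
# ways; and that is before even allowing for schemes that are not simple
# substitutions at all.
```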

On a more general level some of the above comments incidentally in my view quite obviously links to results from classical statistics; there are many ways to link random variables through data fitting methods, but reliably identifying proper causal linkages through the application of such approaches is, well, difficult (and, according to some, often ill-advised)…

ii. In my view, it does not seem possible in general to prove that any specific proposed encryption/decryption algorithm is ‘the correct one’. This is because the proposed algorithm will never be a unique solution to the problem you’re evaluating. How are you going to convince me that The True Algorithm is not a more general/complex one (or perhaps a completely different one – see iii. below) than the one you propose, and that your solution is not missing relevant variables? The only way to truly test if the proposed algorithm is a valid algorithm is to test it on new data and compare its performance on this new data set with the performances of competing variables/solution proposals which also managed to correctly link cipher text and plaintext. If the algorithm doesn’t work on the new data, you got it wrong. If it does work on new data, well, you might still just have been lucky. You might get more confident with more correctly-assessed (…guessed?) data, but you never get certain. In other similar contexts a not uncommon approach for trying to get around these sorts of problems is to limit the analysis to a subset of the data available in order to obtain the algorithm, and then using the rest of the data for validation purposes (here’s a relevant link), but here even with highly efficient estimation approaches you almost certainly will run out of information (/degrees of freedom) long before you get anywhere if the encryption algorithm is at all non-trivial. In these settings information is likely to be a limiting resource.

iii. There are many different types of encryption schemes, and people who ask questions like the one above tend, I believe, to have a quite limited view of which methods and approaches are truly available to one who desires secrecy when exchanging information with others. Imagine a situation where the plaintext is ‘See you next Wednesday’ and the encrypted text is an English translation of Tolstoy’s book War and Peace (or, to make it even more fun, all pages published on the English version of Wikipedia, say on November the 5th, 2017 at midnight GMT). That’s an available encryption approach that might be applied. It might be a part (‘A’) of a more general (‘B’) encryption approach of linking specific messages from a preconceived list of messages, which had been considered worth sending in the future when the algorithm was chosen, to specific book titles decided on in advance. So if you want to say ‘good Sunday!’, Eve gets to read the Bible and see where that gets her. You could also decide that in half of all cases the book cipher text links to specific messages from a list but in the other half of the cases what you actually mean to communicate is on page 21 of the book; this might throw a hacker who saw a combined cipher text and plaintext combination resulting from that part of the algorithm off in terms of the other half, and vice versa – and it illustrates well one of the key problems you’re faced with as an attacker when working on cryptographic schemes about which you have limited knowledge; the opponent can always add new layers on top of the ones that already exist/apply to make the problem harder to solve. And so you could also link the specific list message with some really complicated cipher-encrypted version of the Bible. There’s a lot more to encryption schemes than just exchanging a few letters here and there. On related topics, see this link. On a different if related topic, people who desire secrecy when exchanging information may also attempt to try to hide the fact that any secrets are exchanged in the first place. See also this.

iv. The specific usage of the word ‘password’ in the original query calls for comment for multiple reasons, some of which have been touched upon above, perhaps mainly because it implicitly betrays a lack of knowledge about how modern cryptographic systems actually work. The thing is, even if you might consider an encryption scheme to just be an advanced sort of ‘password’, finding the password (singular) is not always the task you’re faced with today. In symmetric-key algorithm settings you might sort-of-kind-of argue that it sort-of is – in such settings you might say that you have one single (collection of) key(s) which you use to encrypt messages and also use to decrypt the messages. So you can both encrypt and decrypt the message using the same key(s), and so you only have one ‘password’. That’s however not how asymmetric-key encryption works. As wiki puts it: “In an asymmetric key encryption scheme, anyone can encrypt messages using the public key, but only the holder of the paired private key can decrypt.”

This of course relates to what you actually want to do/achieve when you get your samples of cipher text and plaintext. In some cryptographic contexts by design the route you need to go to get from cipher text to plaintext is conceptually different from the route you need to go to get from plaintext to cipher text. And some of the ‘passwords’ that relate to how the schemes work are public knowledge by design.
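
A toy illustration of the symmetric/asymmetric distinction (my own deliberately insecure sketch, nothing resembling a real-world scheme): in the symmetric case one key both encrypts and decrypts, whereas in the textbook-RSA-style asymmetric case the public encryption exponent is not the secret you need in order to decrypt.

```python
# Deliberately insecure toy examples (mine) of the symmetric/asymmetric split.

# Symmetric: the same key both encrypts and decrypts (a simple XOR stream).
def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

msg = b"see you next wednesday"
key = b"correct horse"
assert xor_cipher(xor_cipher(msg, key), key) == msg   # one key, both directions

# Asymmetric (textbook RSA with absurdly small primes): anyone can encrypt with
# the public pair (e, n); decrypting requires the private exponent d.
p, q = 61, 53
n = p * q                        # 3233
phi = (p - 1) * (q - 1)          # 3120
e = 17                           # public exponent, coprime to phi
d = pow(e, -1, phi)              # private exponent (modular inverse, Python 3.8+)

m = 65                           # a 'message' encoded as an integer smaller than n
c = pow(m, e, n)                 # encryption uses only public information
assert pow(c, d, n) == m         # decryption needs the private key d
print("symmetric and asymmetric toy checks passed")
```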

v. I have already touched a bit upon the problem of the existence of an information constraint, but I realized I probably need to spell this out in a bit more detail. The original inquirer to me seems implicitly to be under the misapprehension that computational complexity is the only limiting constraint here (“By “difficult” I mean requiring an inordinate amount of computation.”). Given the setting he or she proposes, I don’t think that’s true, and why that is is sort of interesting.

If you think about what kind of problem you’re facing, what you have here in this setting is really a very limited amount of data which relates in an unknown manner to an unknown data-generating process (‘algorithm’). There are, as has been touched upon, in general many ways to obtain linkage between two data sets (the cipher text and the plaintext) using an algorithm – too many ways for comfort, actually. The search space is large, there are too many algorithms to consider; or equivalently, the amount of information supplied by the data will often be too small for us to properly evaluate the algorithms under consideration. An important observation is that more complex algorithms will both take longer to calculate (‘identify’ …at least as candidates) and be expected to require more data to evaluate, at least to the extent that algorithmic complexity constrains the data (/relates to changes in data structure/composition that needs to be modeled in order to evaluate/identify the goal algorithm). If the algorithm says a different encryption rule is at work on Wednesdays, you’re going to have trouble figuring that out if you only got hold of a cipher text/plaintext combination derived from an exchange which took place on a Saturday. There are methods from statistics that might conceivably help you deal with problems like these, but they have their own issues and trade-offs. You might limit yourself to considering only settings where you have access to all known plaintext and cipher text combinations, so you got both Wednesday and Saturday, but even here you can’t be safe – next (metaphorical, I probably at this point need to add) Friday might be different from last (metaphorical) Friday, and this could even be baked into the algorithm in very non-obvious ways.

The above remarks might give you the idea that I’m just coming up with these kinds of suggestions to try to foil your approaches to figuring out the algorithm ‘by cheating’ (…it shouldn’t matter whether or not it was ‘sent on a Saturday’), but the main point is that a complex encryption algorithm is complex, and even if you see it applied multiple times you might not get enough information about how it works from the data suggested to be able to evaluate if you guessed right. In fact, given a combination of a sparse data set (one message, or just a few messages, in plaintext and cipher text) and a complex algorithm involving a very non-obvious mapping function, the odds are strongly against you.

vi. I had the thought that one reason why the inquirer might be confused about some of these things is that s/he might well be aware of the existence of modern cryptographic techniques which do rely to a significant extent on computational complexity aspects. I.e., here you do have settings where you’re asked to provide ‘the right answer’ (‘the password’), but it’s hard to calculate the right answer in a reasonable amount of time unless you have the relevant (private) information at hand – see e.g. these links for more. One way to think about how such a problem relates to the other problem at hand (you have been presented with samples of cipher text and plaintext and you want to guess all the details about how the encryption and decryption schemes which were applied work) is that this kind of algorithm/approach may be applied in combination with other algorithmic approaches to encrypt/decrypt the text you’re analyzing. A really tough prime factorization problem might for all we know be an embedded component of the cryptographic process that is applied to our text. We could call it A.

In such a situation we would definitely be in trouble because stuff like prime factorization is really hard and computationally complex, and to make matters worse just looking at the plaintext and the cipher text would not make it obvious to us that a prime factorization scheme had even been applied to the data. But a really important point is that even if such a tough problem was not present and even if only relatively less computationally demanding problems were involved, we almost certainly still just wouldn’t have enough information to break any semi-decent encryption algorithm based on a small sample of plaintext and cipher text. It might help a little bit, but in the setting contemplated by the inquirer a ‘faster computer’ (/…’more efficient decision algorithm’, etc.) can only help so much.

vii. Shannon and Kerckhoffs may have a point in a general setting, but in specific settings like this particular one I think it is well worth taking into account the implications of not having a (publicly) known algorithm to attack. As wiki notes (see the previous link), ‘Many ciphers are actually based on publicly known algorithms or are open source and so it is only the difficulty of obtaining the key that determines security of the system’. The above remarks were of course all based on an assumption that Eve does not here have the sort of knowledge about the encryption scheme applied that she in many cases today actually might have. There are obvious and well-known weaknesses associated with having security-associated components of a specific cryptographic scheme be independent of the key, but I do not see how it does not in this particular setting cause search space blow-up making the decision problem (did we actually guess right?) intractable in many cases. A key feature of the problem considered by the inquirer is that you here – unlike in many ‘guess the password-settings’ where for example a correct password will allow you access to an application or a document or whatever – do not get any feedback neither in the case where you guess right nor in the case where you guess wrong; it’s a decision problem, not a calculation problem. (However it is perhaps worth noting on the other hand that in a ‘standard guess-the-password-problem’ you may also sometimes implicitly face a similar decision problem due to e.g. the potential for a combination of cryptographic security and steganographic complementary strategies like e.g. these having been applied).

August 14, 2018 Posted by | Computer science, Cryptography, Data, rambling nonsense, Statistics

Lyapunov Arguments in Optimization

I’d say that if you’re interested in the intersection of mathematical optimization methods/algorithms and dynamical systems analysis, it’s probably a talk well worth watching. The lecture is reasonably high-level and covers a fairly satisfactory amount of ground in a relatively short amount of time, and it is not particularly hard to follow if you have at least some passing familiarity with the fields involved (dynamical systems analysis, statistics, mathematical optimization, computer science/machine learning).

Some links:

Dynamical system.
Euler–Lagrange equation.
Continuous optimization problem.
Gradient descent algorithm.
Lyapunov stability.
Condition number.
Fast (/accelerated-) gradient descent methods.
The Mirror Descent Algorithm.
Cubic regularization of Newton method and its global performance (Nesterov & Polyak).
A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights (Su, Boyd & Candès).
A Variational Perspective on Accelerated Methods in Optimization (Wibisono, Wilson & Jordan).
Breaking Locality Accelerates Block Gauss-Seidel (Tu, Venkataraman, Wilson, Gittens, Jordan & Recht).
A Lyapunov Analysis of Momentum Methods in Optimization (Wilson, Recht & Jordan).
Bregman divergence.
Estimate sequence methods.
Variance reduction techniques.
Stochastic gradient descent.
Langevin dynamics.

 

July 22, 2018 Posted by | Computer science, Lectures, Mathematics, Physics, Statistics | Leave a comment

Big Data (II)

Below I have added a few observations from the last half of the book, as well as some coverage-related links to topics of interest.

“With big data, using correlation creates […] problems. If we consider a massive dataset, algorithms can be written that, when applied, return a large number of spurious correlations that are totally independent of the views, opinions, or hypotheses of any human being. Problems arise with false correlations — for example, divorce rate and margarine consumption […]. [W]hen the number of variables becomes large, the number of spurious correlations also increases. This is one of the main problems associated with trying to extract useful information from big data, because in doing so, as with mining big data, we are usually looking for patterns and correlations. […] one of the reasons Google Flu Trends failed in its predictions was because of these problems. […] The Google Flu Trends project hinged on the known result that there is a high correlation between the number of flu-related online searches and visits to the doctor’s surgery. If a lot of people in a particular area are searching for flu-related information online, it might then be possible to predict the spread of flu cases to adjoining areas. Since the interest is in finding trends, the data can be anonymized and hence no consent from individuals is required. Using their five-year accumulation of data, which they limited to the same time-frame as the CDC data, and so collected only during the flu season, Google counted the weekly occurrence of each of the fifty million most common search queries covering all subjects. These search query counts were then compared with the CDC flu data, and those with the highest correlation were used in the flu trends model. […] The historical data provided a baseline from which to assess current flu activity on the chosen search terms and by comparing the new real-time data against this, a classification on a scale from 1 to 5, where 5 signified the most severe, was established. Used in the 2011–12 and 2012–13 US flu seasons, Google’s big data algorithm famously failed to deliver. After the flu season ended, its predictions were checked against the CDC’s actual data. […] the Google Flu Trends algorithm over-predicted the number of flu cases by at least 50 per cent during the years it was used.” [For more details on why blind/mindless hypothesis testing/p-value hunting on big data sets is usually a terrible idea, see e.g. Burnham & Anderson, US]
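
To illustrate the point about the number of spurious correlations growing with the number of variables, here is a small Python sketch of my own (not from the book; the sample size, variable count, and the |r| cutoff are arbitrary):

```python
import numpy as np

# 100 observations of 200 mutually independent variables: any 'strong'
# pairwise correlation we find is spurious by construction.
rng = np.random.default_rng(42)
n_obs, n_vars = 100, 200
X = rng.standard_normal((n_obs, n_vars))

corr = np.corrcoef(X, rowvar=False)      # 200 x 200 matrix of sample correlations
iu = np.triu_indices(n_vars, k=1)        # each of the 19,900 variable pairs once
pairwise = np.abs(corr[iu])

threshold = 0.3                          # an arbitrary 'interesting' cutoff
print(f"{(pairwise > threshold).sum()} of {pairwise.size} variable pairs exceed "
      f"|r| > {threshold}, even though all variables are independent.")
```

Doubling the number of variables roughly quadruples the number of pairs to screen, which is the mechanism behind margarine-and-divorce-type findings.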

“The data Google used [in the Google Flu Trends algorithm], collected selectively from search engine queries, produced results [with] obvious bias […] for example by eliminating everyone who does not use a computer and everyone using other search engines. Another issue that may have led to poor results was that customers searching Google on ‘flu symptoms’ would probably have explored a number of flu-related websites, resulting in their being counted several times and thus inflating the numbers. In addition, search behaviour changes over time, especially during an epidemic, and this should be taken into account by updating the model regularly. Once errors in prediction start to occur, they tend to cascade, which is what happened with the Google Flu Trends predictions: one week’s errors were passed along to the next week. […] [Similarly,] the Ebola prediction figures published by WHO [during the West African Ebola virus epidemic] were over 50 per cent higher than the cases actually recorded. The problems with both the Google Flu Trends and Ebola analyses were similar in that the prediction algorithms used were based only on initial data and did not take into account changing conditions. Essentially, each of these models assumed that the number of cases would continue to grow at the same rate in the future as they had before the medical intervention began. Clearly, medical and public health measures could be expected to have positive effects and these had not been integrated into the model.”

“Every time a patient visits a doctor’s office or hospital, electronic data is routinely collected. Electronic health records constitute legal documentation of a patient’s healthcare contacts: details such as patient history, medications prescribed, and test results are recorded. Electronic health records may also include sensor data such as Magnetic Resonance Imaging (MRI) scans. The data may be anonymized and pooled for research purposes. It is estimated that in 2015, an average hospital in the USA will store over 600 Tb of data, most of which is unstructured. […] Typically, the human genome contains about 20,000 genes and mapping such a genome requires about 100 Gb of data. […] The interdisciplinary field of bioinformatics has flourished as a consequence of the need to manage and analyze the big data generated by genomics. […] Cloud-based systems give authorized users access to data anywhere in the world. To take just one example, the NHS plans to make patient records available via smartphone by 2018. These developments will inevitably generate more attacks on the data they employ, and considerable effort will need to be expended in the development of effective security methods to ensure the safety of that data. […] There is no absolute certainty on the Web. Since e-documents can be modified and updated without the author’s knowledge, they can easily be manipulated. This situation could be extremely damaging in many different situations, such as the possibility of someone tampering with electronic medical records. […] [S]ome of the problems facing big data systems [include] ensuring they actually work as intended, [that they] can be fixed when they break down, and [that they] are tamper-proof and accessible only to those with the correct authorization.”

“With transactions being made through sales and auction bids, eBay generates approximately 50 Tb of data a day, collected from every search, sale, and bid made on their website by a claimed 160 million active users in 190 countries. […] Amazon collects vast amounts of data including addresses, payment information, and details of everything an individual has ever looked at or bought from them. Amazon uses its data in order to encourage the customer to spend more money with them by trying to do as much of the customer’s market research as possible. In the case of books, for example, Amazon needs to provide not only a huge selection but to focus recommendations on the individual customer. […] Many customers use smartphones with GPS capability, allowing Amazon to collect data showing time and location. This substantial amount of data is used to construct customer profiles allowing similar individuals and their recommendations to be matched. Since 2013, Amazon has been selling customer metadata to advertisers in order to promote their Web services operation […] Netflix collects and uses huge amounts of data to improve customer service, such as offering recommendations to individual customers while endeavouring to provide reliable streaming of its movies. Recommendation is at the heart of the Netflix business model and most of its business is driven by the data-based recommendations it is able to offer customers. Netflix now tracks what you watch, what you browse, what you search for, and the day and time you do all these things. It also records whether you are using an iPad, TV, or something else. […] As well as collecting search data and star ratings, Netflix can now keep records on how often users pause or fast forward, and whether or not they finish watching each programme they start. They also monitor how, when, and where they watched the programme, and a host of other variables too numerous to mention.”

“Data science is becoming a popular study option in universities but graduates so far have been unable to meet the demands of commerce and industry, where positions in data science offer high salaries to experienced applicants. Big data for commercial enterprises is concerned with profit, and disillusionment will set in quickly if an over-burdened data analyst with insufficient experience fails to deliver the expected positive results. All too often, firms are asking for a one-size-fits-all model of data scientist who is expected to be competent in everything from statistical analysis to data storage and data security.”

“In December 2016, Yahoo! announced that a data breach involving over one billion user accounts had occurred in August 2013. Dubbed the biggest ever cyber theft of personal data, or at least the biggest ever divulged by any company, thieves apparently used forged cookies, which allowed them access to accounts without the need for passwords. This followed the disclosure of an attack on Yahoo! in 2014, when 500 million accounts were compromised. […] The list of big data security breaches increases almost daily. Data theft, data ransom, and data sabotage are major concerns in a data-centric world. There have been many scares regarding the security and ownership of personal digital data. Before the digital age we used to keep photos in albums and negatives were our backup. After that, we stored our photos electronically on a hard-drive in our computer. This could possibly fail and we were wise to have back-ups but at least the files were not publicly accessible. Many of us now store data in the Cloud. […] If you store all your photos in the Cloud, it’s highly unlikely with today’s sophisticated systems that you would lose them. On the other hand, if you want to delete something, maybe a photo or video, it becomes difficult to ensure all copies have been deleted. Essentially you have to rely on your provider to do this. Another important issue is controlling who has access to the photos and other data you have uploaded to the Cloud. […] although the Internet and Cloud-based computing are generally thought of as wireless, they are anything but; data is transmitted through fibre-optic cables laid under the oceans. Nearly all digital communication between continents is transmitted in this way. My email will be sent via transatlantic fibre-optic cables, even if I am using a Cloud computing service. The Cloud, an attractive buzz word, conjures up images of satellites sending data across the world, but in reality Cloud services are firmly rooted in a distributed network of data centres providing Internet access, largely through cables. Fibre-optic cables provide the fastest means of data transmission and so are generally preferable to satellites.”

Links:

Health care informatics.
Electronic health records.
European influenza surveillance network.
Overfitting.
Public Health Emergency of International Concern.
Virtual Physiological Human project.
Watson (computer).
Natural language processing.
Anthem medical data breach.
Electronic delay storage automatic calculator (EDSAC). LEO (computer). ICL (International Computers Limited).
E-commerce. Online shopping.
Pay-per-click advertising model. Google AdWords. Click fraud. Targeted advertising.
Recommender system. Collaborative filtering.
Anticipatory shipping.
BlackPOS Malware.
Data Encryption Standard algorithm. EFF DES cracker.
Advanced Encryption Standard.
Tempora. PRISM (surveillance program). Edward Snowden. WikiLeaks. Tor (anonymity network). Silk Road (marketplace). Deep web. Internet of Things.
Songdo International Business District. Smart City.
United Nations Global Pulse.

July 19, 2018 Posted by | Books, Computer science, Cryptography, Data, Engineering, Epidemiology, Statistics | Leave a comment

Big Data (I?)

Below are a few observations from the first half of the book, as well as some links related to the topic coverage.

“The data we derive from the Web can be classified as structured, unstructured, or semi-structured. […] Carefully structured and tabulated data is relatively easy to manage and is amenable to statistical analysis, indeed until recently statistical analysis methods could be applied only to structured data. In contrast, unstructured data is not so easily categorized, and includes photos, videos, tweets, and word-processing documents. Once the use of the World Wide Web became widespread, it transpired that many such potential sources of information remained inaccessible because they lacked the structure needed for existing analytical techniques to be applied. However, by identifying key features, data that appears at first sight to be unstructured may not be completely without structure. Emails, for example, contain structured metadata in the heading as well as the actual unstructured message […] and so may be classified as semi-structured data. Metadata tags, which are essentially descriptive references, can be used to add some structure to unstructured data. […] Dealing with unstructured data is challenging: since it cannot be stored in traditional databases or spreadsheets, special tools have had to be developed to extract useful information. […] Approximately 80 per cent of the world’s data is unstructured in the form of text, photos, and images, and so is not amenable to the traditional methods of structured data analysis. ‘Big data’ is now used to refer not just to the total amount of data generated and stored electronically, but also to specific datasets that are large in both size and complexity, with which new algorithmic techniques are required in order to extract useful information from them.”
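
As a small illustration of the semi-structured point about emails, Python’s standard library is enough to pull the structured metadata out of the header while leaving the body as unstructured free text (the message below is made up):

```python
from email import message_from_string

raw = """From: alice@example.com
To: bob@example.com
Subject: Quarterly report
Date: Mon, 16 Jul 2018 09:15:00 +0000

Hi Bob, the numbers look fine this quarter. More details to follow. -- Alice
"""

msg = message_from_string(raw)

# Structured metadata from the header fields...
metadata = {key: msg[key] for key in ("From", "To", "Subject", "Date")}
print(metadata)

# ...while the body remains unstructured text.
print(msg.get_payload())
```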

“In the digital age we are no longer entirely dependent on samples, since we can often collect all the data we need on entire populations. But the size of these increasingly large sets of data cannot alone provide a definition for the term ‘big data’ — we must include complexity in any definition. Instead of carefully constructed samples of ‘small data’ we are now dealing with huge amounts of data that has not been collected with any specific questions in mind and is often unstructured. In order to characterize the key features that make data big and move towards a definition of the term, Doug Laney, writing in 2001, proposed using the three ‘v’s: volume, variety, and velocity. […] ‘Volume’ refers to the amount of electronic data that is now collected and stored, which is growing at an ever-increasing rate. Big data is big, but how big? […] Generally, we can say the volume criterion is met if the dataset is such that we cannot collect, store, and analyse it using traditional computing and statistical methods. […] Although a great variety of data [exists], ultimately it can all be classified as structured, unstructured, or semi-structured. […] Velocity is necessarily connected with volume: the faster the data is generated, the more there is. […] Velocity also refers to the speed at which data is electronically processed. For example, sensor data, such as that generated by an autonomous car, is necessarily generated in real time. If the car is to work reliably, the data […] must be analysed very quickly […] Variability may be considered as an additional dimension of the velocity concept, referring to the changing rates in flow of data […] computer systems are more prone to failure [during peak flow periods]. […] As well as the original three ‘v’s suggested by Laney, we may add ‘veracity’ as a fourth. Veracity refers to the quality of the data being collected. […] Taken together, the four main characteristics of big data – volume, variety, velocity, and veracity – present a considerable challenge in data management.” [As regular readers of this blog might be aware, not everybody would agree with the author here about the inclusion of veracity as a defining feature of big data – “Many have suggested that there are more V’s that are important to the big data problem [than volume, variety & velocity] such as veracity and value (IEEE BigData 2013). Veracity refers to the trustworthiness of the data, and value refers to the value that the data adds to creating knowledge about a topic or situation. While we agree that these are important data characteristics, we do not see these as key features that distinguish big data from regular data. It is important to evaluate the veracity and value of all data, both big and small. (Knoth & Schmid)]

“Anyone who uses a personal computer, laptop, or smartphone accesses data stored in a database. Structured data, such as bank statements and electronic address books, are stored in a relational database. In order to manage all this structured data, a relational database management system (RDBMS) is used to create, maintain, access, and manipulate the data. […] Once […] the database [has been] constructed we can populate it with data and interrogate it using structured query language (SQL). […] An important aspect of relational database design involves a process called normalization which includes reducing data duplication to a minimum and hence reduces storage requirements. This allows speedier queries, but even so as the volume of data increases the performance of these traditional databases decreases. The problem is one of scalability. Since relational databases are essentially designed to run on just one server, as more and more data is added they become slow and unreliable. The only way to achieve scalability is to add more computing power, which has its limits. This is known as vertical scalability. So although structured data is usually stored and managed in an RDBMS, when the data is big, say in terabytes or petabytes and beyond, the RDBMS no longer works efficiently, even for structured data. An important feature of relational databases and a good reason for continuing to use them is that they conform to the following group of properties: atomicity, consistency, isolation, and durability, usually known as ACID. Atomicity ensures that incomplete transactions cannot update the database; consistency excludes invalid data; isolation ensures one transaction does not interfere with another transaction; and durability means that the database must update before the next transaction is carried out. All these are desirable properties but storing and accessing big data, which is mostly unstructured, requires a different approach. […] given the current data explosion there has been intensive research into new storage and management techniques. In order to store these massive datasets, data is distributed across servers. As the number of servers involved increases, the chance of failure at some point also increases, so it is important to have multiple, reliably identical copies of the same data, each stored on a different server. Indeed, with the massive amounts of data now being processed, systems failure is taken as inevitable and so ways of coping with this are built into the methods of storage.”
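
Here is a minimal sketch of the relational workflow described above, using Python’s built-in sqlite3 module (the table, columns, and numbers are invented); the deliberately failed transfer at the end illustrates the atomicity part of ACID:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a throwaway relational database
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")
conn.executemany("INSERT INTO accounts (owner, balance) VALUES (?, ?)",
                 [("Alice", 100.0), ("Bob", 50.0)])
conn.commit()

# A structured query language (SQL) query against structured data.
for row in conn.execute("SELECT owner, balance FROM accounts WHERE balance > ?", (60,)):
    print(row)

# Atomicity: an incomplete transfer is rolled back, leaving the data unchanged.
try:
    with conn:  # the connection context manager commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 80 WHERE owner = 'Bob'")
        raise RuntimeError("simulated failure before the matching credit")
except RuntimeError:
    pass

print(conn.execute("SELECT owner, balance FROM accounts").fetchall())  # Bob still has 50.0
```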

“A distributed file system (DFS) provides effective and reliable storage for big data across many computers. […] Hadoop DFS [is] one of the most popular DFS […] When we use Hadoop DFS, the data is distributed across many nodes, often tens of thousands of them, physically situated in data centres around the world. […] The NameNode deals with all requests coming in from a client computer; it distributes storage space, and keeps track of storage availability and data location. It also manages all the basic file operations (e.g. opening and closing files) and controls data access by client computers. The DataNodes are responsible for actually storing the data and in order to do so, create, delete, and replicate blocks as necessary. Data replication is an essential feature of the Hadoop DFS. […] It is important that several copies of each block are stored so that if a DataNode fails, other nodes are able to take over and continue with processing tasks without loss of data. […] Data is written to a DataNode only once but will be read by an application many times. […] One of the functions of the NameNode is to determine the best DataNode to use given the current usage, ensuring fast data access and processing. The client computer then accesses the data block from the chosen node. DataNodes are added as and when required by the increased storage requirements, a feature known as horizontal scalability. One of the main advantages of Hadoop DFS over a relational database is that you can collect vast amounts of data, keep adding to it, and, at that time, not yet have any clear idea of what you want to use it for. […] structured data with identifiable rows and columns can be easily stored in a RDBMS while unstructured data can be stored cheaply and readily using a DFS.”
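
The NameNode/DataNode division of labour can be mimicked with a toy sketch like the one below (purely illustrative and nothing like the real Hadoop code; the block size, node count, and replication factor are made-up parameters):

```python
import itertools

BLOCK_SIZE = 16      # bytes per block (absurdly small, for illustration)
REPLICATION = 3      # copies of each block, each on a distinct node

datanodes = {f"node{i}": {} for i in range(5)}  # node name -> {block_id: bytes}
namenode = {}                                    # file name -> [(block_id, [nodes])]

def put(filename: str, data: bytes) -> None:
    """Split a file into blocks and store each block on REPLICATION distinct nodes."""
    node_cycle = itertools.cycle(datanodes)
    for i in range(0, len(data), BLOCK_SIZE):
        block_id = f"{filename}#{i // BLOCK_SIZE}"
        homes = [next(node_cycle) for _ in range(REPLICATION)]
        for node in homes:
            datanodes[node][block_id] = data[i:i + BLOCK_SIZE]
        namenode.setdefault(filename, []).append((block_id, homes))

def get(filename: str) -> bytes:
    """Reassemble a file, falling back to a replica if a node has 'failed'."""
    out = b""
    for block_id, homes in namenode[filename]:
        for node in homes:
            if block_id in datanodes[node]:
                out += datanodes[node][block_id]
                break
    return out

put("example.txt", b"some data distributed across several nodes, with replication")
datanodes["node0"].clear()   # simulate a DataNode failure
print(get("example.txt"))    # the file is still fully recoverable from the replicas
```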

NoSQL is the generic name used to refer to non-relational databases and stands for Not only SQL. […] The non-relational model has some features that are necessary in the management of big data, namely scalability, availability, and performance. With a relational database you cannot keep scaling vertically without loss of function, whereas with NoSQL you scale horizontally and this enables performance to be maintained. […] Within the context of a distributed database system, consistency refers to the requirement that all copies of data should be the same across nodes. […] Availability requires that if a node fails, other nodes still function […] Data, and hence DataNodes, are distributed across physically separate servers and communication between these machines will sometimes fail. When this occurs it is called a network partition. Partition tolerance requires that the system continues to operate even if this happens. In essence, what the CAP [Consistency, Availability, Partition Tolerance] Theorem states is that for any distributed computer system, where the data is shared, only two of these three criteria can be met. There are therefore three possibilities; the system must be: consistent and available, consistent and partition tolerant, or partition tolerant and available. Notice that since in a RDMS the network is not partitioned, only consistency and availability would be of concern and the RDMS model meets both of these criteria. In NoSQL, since we necessarily have partitioning, we have to choose between consistency and availability. By sacrificing availability, we are able to wait until consistency is achieved. If we choose instead to sacrifice consistency it follows that sometimes the data will differ from server to server. The somewhat contrived acronym BASE (Basically Available, Soft, and Eventually consistent) is used as a convenient way of describing this situation. BASE appears to have been chosen in contrast to the ACID properties of relational databases. ‘Soft’ in this context refers to the flexibility in the consistency requirement. The aim is not to abandon any one of these criteria but to find a way of optimizing all three, essentially a compromise. […] The name NoSQL derives from the fact that SQL cannot be used to query these databases. […] There are four main types of non-relational or NoSQL database: key-value, column-based, document, and graph – all useful for storing large amounts of structured and semi-structured data. […] Currently, an approach called NewSQL is finding a niche. […] the aim of this latent technology is to solve the scalability problems associated with the relational model, making it more useable for big data.”

“A popular way of dealing with big data is to divide it up into small chunks and then process each of these individually, which is basically what MapReduce does by spreading the required calculations or queries over many, many computers. […] Bloom filters are particularly suited to applications where storage is an issue and where the data can be thought of as a list. The basic idea behind Bloom filters is that we want to build a system, based on a list of data elements, to answer the question ‘Is X in the list?’ With big datasets, searching through the entire set may be too slow to be useful, so we use a Bloom filter which, being a probabilistic method, is not 100 per cent accurate—the algorithm may decide that an element belongs to the list when actually it does not; but it is a fast, reliable, and storage efficient method of extracting useful knowledge from data. Bloom filters have many applications. For example, they can be used to check whether a particular Web address leads to a malicious website. In this case, the Bloom filter would act as a blacklist of known malicious URLs against which it is possible to check, quickly and accurately, whether it is likely that the one you have just clicked on is safe or not. Web addresses newly found to be malicious can be added to the blacklist. […] A related example is that of malicious email messages, which may be spam or may contain phishing attempts. A Bloom filter provides us with a quick way of checking each email address and hence we would be able to issue a timely warning if appropriate. […] they can [also] provide a very useful way of detecting fraudulent credit card transactions.”
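
The Bloom filter idea is simple enough to sketch in a few lines of Python (a bare-bones illustration of the technique; a real implementation would choose the size of the bit array and the number of hash functions from the desired false-positive rate rather than the arbitrary values used here):

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: it may return false positives, never false negatives."""

    def __init__(self, n_bits: int = 10_000, n_hashes: int = 5):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)    # one byte per 'bit', for simplicity

    def _positions(self, item: str):
        # Derive n_hashes pseudo-independent positions from a single hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

blacklist = BloomFilter()
blacklist.add("http://malicious.example.com/login")

print(blacklist.might_contain("http://malicious.example.com/login"))  # True
print(blacklist.might_contain("https://en.wikipedia.org/"))           # almost certainly False
```

Storing only the bit array, rather than the list of URLs itself, is what makes the approach so storage efficient.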

Links:

Data.
Punched card.
Clickstream log.
HTTP cookie.
Australian Square Kilometre Array Pathfinder.
The Millionaire Calculator.
Data mining.
Supervised machine learning.
Unsupervised machine learning.
Statistical classification.
Cluster analysis.
Moore’s Law.
Cloud storage. Cloud computing.
Data compression. Lossless data compression. Lossy data compression.
ASCII. Huffman algorithm. Variable-length encoding.
Data compression ratio.
Grayscale.
Discrete cosine transform.
JPEG.
Bit array. Hash function.
PageRank algorithm.
Common crawl.

July 14, 2018 Posted by | Books, Computer science, Data, Statistics | Leave a comment

Frontiers in Statistical Quality Control (I)

“The XIth International Workshop on Intelligent Statistical Quality Control took place in Sydney, Australia from August 20 to August 23, 2013. […] The 23 papers in this volume were carefully selected by the scientific program committee, reviewed by its members, revised by the authors and, finally, adapted by the editors for this volume. The focus of the book lies on three major areas of statistical quality control: statistical process control (SPC), acceptance sampling and design of experiments. The majority of the papers deal with statistical process control while acceptance sampling, and design of experiments are treated to a lesser extent.”

I’m currently reading this book. It’s quite technical and a bit longer than many of the other non-fiction books I’ve read this year (…though shorter than others; it is still ~400 pages of content exclusively devoted to statistical papers), so it may take me a while to finish it. I figured that the fact that I may not finish the book for a while was not a good argument against blogging relevant sections of it now, especially as it’s already been some time since I read the first few chapters.

When reading a book like this one I care a lot more about understanding the concepts than about understanding the proofs, so as usual the amount of math included in the post is limited; please don’t assume it’s because there are no equations in the book.

Below I have added some ideas and observations from the first 100 pages or so of the book’s coverage.

“A growing number of [statistical quality control] applications involve monitoring with rare event data. […] The most common approaches for monitoring such processes involve using an exponential distribution to model the time between the events or using a Bernoulli distribution to model whether or not each opportunity for the event results in its occurrence. The use of a sequence of independent Bernoulli random variables leads to a geometric distribution for the number of non-occurrences between the occurrences of the rare events. One surveillance method is to use a power transformation on the exponential or geometric observations to achieve approximate normality of the in control distribution and then use a standard individuals control chart. We add to the argument that use of this approach is very counterproductive and cover some alternative approaches. We discuss the choice of appropriate performance metrics. […] Most often the focus is on detecting process deterioration, i.e., an increase in the probability of the adverse event or a decrease in the average time between events. Szarka and Woodall (2011) reviewed the extensive number of methods that have been proposed for monitoring processes using Bernoulli data. Generally, it is difficult to better the performance of the Bernoulli cumulative sum (CUSUM) chart of Reynolds and Stoumbos (1999). The Bernoulli and geometric CUSUM charts can be designed to be equivalent […] Levinson (2011) argued that control charts should not be used with healthcare rare event data because in many situations there is an assignable cause for each error, e.g., each hospital-acquired infection or serious prescription error, and each incident should be investigated. We agree that serious adverse events should be investigated whether or not they result in a control chart signal. The investigation of rare adverse events, however, and the implementation of process improvements to prevent future such errors, does not preclude using a control chart to determine if the rate of such events has increased or decreased over time. In fact, a control chart can be used to evaluate the success of any process improvement initiative.”

“The choice of appropriate performance metrics for comparing surveillance schemes for monitoring Bernoulli and exponential data is quite important. The usual Average Run Length (ARL) metric refers to the average number of points plotted on the chart until a signal is given. This metric is most clearly appropriate when the time between the plotted points is constant. […] In some cases, such as in monitoring the number of near-miss accidents, it may be informative to use a metric that reflects the actual time required to obtain an out-of-control signal. Thus one can consider the number of Bernoulli trials until an out-of-control signal is given for Bernoulli data, leading to its average, the ANOS. The ANOS will be proportional to the average time before a signal if the rate at which the Bernoulli trials are observed is constant over time. For exponentially distributed data one could consider the average time to signal, the ATS. If the process is stable, then ANOS = ARL / p and ATS = ARL * θ, where p and θ are the Bernoulli probability and the exponential mean, respectively. […] To assess out-of-control performance we believe it is most realistic to consider steady-state performance where the shift in the parameter occurs at some time after monitoring has begun. […] Under this scenario one cannot easily convert the ARL metric to the ANOS and ATS metrics. Consideration of steady state performance of competing methods is important because some methods have an implicit headstart feature that results in good zero-state performance, but poor steady-state performance.”
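
The ANOS metric can be made a bit more concrete with a small Monte Carlo sketch of a log-likelihood-ratio Bernoulli CUSUM (my own illustration, not code from the paper; the in-control and shifted probabilities, the decision interval h, and the number of simulated runs are arbitrary, and the shift is present from the first observation, i.e. this measures zero-state performance):

```python
import math
import random

def bernoulli_cusum_anos(p_true, p0=0.01, p1=0.03, h=3.0, n_runs=500, seed=1):
    """Monte Carlo estimate of the average number of observations to signal (ANOS)
    for a CUSUM of Bernoulli log-likelihood-ratio increments, designed to detect
    an increase in the event probability from p0 to p1."""
    up = math.log(p1 / p0)                 # increment when the event occurs (X = 1)
    down = math.log((1 - p1) / (1 - p0))   # increment when it does not (X = 0)
    rng = random.Random(seed)
    total_trials = 0
    for _ in range(n_runs):
        s, n = 0.0, 0
        while s < h:                       # signal when the statistic crosses h
            n += 1
            x = rng.random() < p_true
            s = max(0.0, s + (up if x else down))
        total_trials += n
    return total_trials / n_runs

print("in-control ANOS (p = 0.01):     ", round(bernoulli_cusum_anos(p_true=0.01)))
print("out-of-control ANOS (p = 0.03): ", round(bernoulli_cusum_anos(p_true=0.03)))
```

A steady-state comparison would instead let each chart run under p0 for a while before introducing the shift, which is where schemes with an implicit headstart can lose their apparent advantage.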

“Data aggregation is frequently done when monitoring rare events and for count data generally. For example, one might monitor the number of accidents per month in a plant or the number of patient falls per week in a hospital. […] Schuh et al. (2013) showed […] that there can be significantly long expected delays in detecting process deterioration when data are aggregated over time even when there are few samples with zero events. One can always aggregate data over long enough time periods to avoid zero counts, but the consequence is slower detection of increases in the rate of the adverse event. […] aggregating event data over fixed time intervals, as frequently done in practice, can result in significant delays in detecting increases in the rate of adverse events. […] Another type of aggregation is to wait until one has observed a given number of events before updating a control chart based on a proportion or waiting time. […] This type of aggregation […] does not appear to delay the detection of process changes nearly as much as aggregating data over fixed time periods. […] We believe that the adverse effect of aggregating data over time has not been fully appreciated in practice and more research work is needed on this topic. Only a couple of the most basic scenarios for count data have been studied. […] Virtually all of the work on monitoring the rate of rare events is based on the assumption that there is a sustained shift in the rate. In some applications the rate change may be transient. In this scenario other performance metrics would be needed, such as the probability of detecting the process shift during the transient period. The effect of data aggregation over time might be larger if shifts in the parameter are not sustained.”

Big data is a popular term that is used to describe the large, diverse, complex and/or longitudinal datasets generated from a variety of instruments, sensors and/or computer-based transactions. […] The acquisition of data does not automatically transfer to new knowledge about the system under study. […] To be able to gain knowledge from big data, it is imperative to understand both the scale and scope of big data. The challenges with processing and analyzing big data are not only limited to the size of the data. These challenges include the size, or volume, as well as the variety and velocity of the data (Zikopoulos et al. 2012). Known as the 3V’s, the volume, variety, and/or velocity of the data are the three main characteristics that distinguish big data from the data we have had in the past. […] Many have suggested that there are more V’s that are important to the big data problem such as veracity and value (IEEE BigData 2013). Veracity refers to the trustworthiness of the data, and value refers to the value that the data adds to creating knowledge about a topic or situation. While we agree that these are important data characteristics, we do not see these as key features that distinguish big data from regular data. It is important to evaluate the veracity and value of all data, both big and small. Both veracity and value are related to the concept of data quality, an important research area in the Information Systems (IS) literature for more than 50 years. The research literature discussing the aspects and measures of data quality is extensive in the IS field, but seems to have reached a general agreement that the multiple aspects of data quality can be grouped into several broad categories […]. Two of the categories relevant here are contextual and intrinsic dimensions of data quality. Contextual aspects of data quality are context specific measures that are subjective in nature, including concepts like value-added, believability, and relevance. […] Intrinsic aspects of data quality are more concrete in nature, and include four main dimensions: accuracy, timeliness, consistency, and completeness […] From our perspective, many of the contextual and intrinsic aspects of data quality are related to the veracity and value of the data. That said, big data presents new challenges in conceptualizing, evaluating, and monitoring data quality.”

The application of SPC methods to big data is similar in many ways to the application of SPC methods to regular data. However, many of the challenges inherent to properly studying and framing a problem can be more difficult in the presence of massive amounts of data. […] it is important to note that building the model is not the end-game. The actual use of the analysis in practice is the goal. Thus, some consideration needs to be given to the actual implementation of the statistical surveillance applications. This brings us to another important challenge, that of the complexity of many big data applications. SPC applications have a tradition of back of the napkin methods. The custom within SPC practice is the use of simple methods that are easy to explain like the Shewhart control chart. These are often the best methods to use to gain credibility because they are easy to understand and easy to explain to a non-statistical audience. However, big data often does not lend itself to easy-to-compute or easy-to-explain methods. While a control chart based on a neural net may work well, it may be so difficult to understand and explain that it may be abandoned for inferior, yet simpler methods. Thus, it is important to consider the dissemination and deployment of advanced analytical methods in order for them to be effectively used in practice. […] Another challenge in monitoring high dimensional data sets is the fact that not all of the monitored variables are likely to shift at the same time; thus, some method is necessary to identify the process variables that have changed. In high dimensional data sets, the decomposition methods used with multivariate control charts can become very computationally expensive. Several authors have considered variable selection methods combined with control charts to quickly detect process changes in a variety of practical scenarios including fault detection, multistage processes, and profile monitoring. […] All of these methods based on variable selection techniques are based on the idea of monitoring subsets of potentially faulty variables. […] Some variable reduction methods are needed to better identify shifts. We believe that further work in the areas combining variable selection methods and surveillance are important for quickly and efficiently diagnosing changes in high-dimensional data.

“A multiple stream process (MSP) is a process that generates several streams of output. From the statistical process control standpoint, the quality variable and its specifications are the same in all streams. A classical example is a filling process such as the ones found in beverage, cosmetics, pharmaceutical and chemical industries, where a filler machine may have many heads. […] Although multiple-stream processes are found very frequently in industry, the literature on schemes for the statistical control of such kind of processes is far from abundant. This paper presents a survey of the research on this topic. […] The first specific techniques for the statistical control of MSPs are the group control charts (GCCs) […] Clearly the chief motivation for these charts was to avoid the proliferation of control charts that would arise if every stream were controlled with a separate pair of charts (one for location and other for spread). Assuming the in-control distribution of the quality variable to be the same in all streams (an assumption which is sometimes too restrictive), the control limits should be the same for every stream. So, the basic idea is to build only one chart (or a pair of charts) with the information from all streams.”

“The GCC will work well if the values of the quality variable in the different streams are independent and identically distributed, that is, if there is no cross-correlation between streams. However, such an assumption is often unrealistic. In many real multiple-stream processes, the value of the observed quality variable is typically better described as the sum of two components: a common component (let’s refer to it as “mean level”), exhibiting variation that affects all streams in the same way, and the individual component of each stream, which corresponds to the difference between the stream observation and the common mean level. […] [T]he presence of the mean level component leads to reduced sensitivity of Boyd’s GCC to shifts in the individual component of a stream if the variance […] of the mean level is large with respect to the variance […] of the individual stream components. Moreover, the GCC is a Shewhart-type chart; if the data exhibit autocorrelation, the traditional form of estimating the process standard deviation (for establishing the control limits) based on the average range or average standard deviation of individual samples (even with the Bonferroni or Dunn-Sidak correction) will result in too frequent false alarms, due to the underestimation of the process total variance. […] [I]in the converse situation […] the GCC will have little sensitivity to causes that affect all streams — at least, less sensitivity than would have a chart on the average of the measurements across all streams, since this one would have tighter limits than the GCC. […] Therefore, to monitor MSPs with the two components described, Mortell and Runger (1995) proposed using two control charts: First, a chart for the grand average between streams, to monitor the mean level. […] For monitoring the individual stream components, they proposed using a special range chart (Rt chart), whose statistic is the range between streams, that is, the difference between the largest stream average and the smallest stream average […] the authors commented that both the chart on the average of all streams and the Rt chart can be used even when at each sampling time only a subset of the streams are sampled (provided that the number of streams sampled remains constant). The subset can be varied periodically or even chosen at random. […] it is common in practice to measure only a subset of streams at each sampling time, especially when the number of streams is large. […] Although almost the totality of Mortell and Runger’s paper is about the monitoring of the individual streams, the importance of the chart on the average of all streams for monitoring the mean level of the process cannot be overemphasized.”
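
To make the two-chart idea concrete, here is a hedged sketch of the arithmetic behind the two monitoring statistics for a single sampling time of a small multiple-stream process (the measurements are invented and no control limits are computed; this is not code from any of the cited papers):

```python
import numpy as np

# One sampling time for a 6-stream filling process: a few observations per stream.
# Rows are streams, columns are observations within the sample (made-up numbers).
sample = np.array([
    [100.2, 100.4, 100.1],
    [ 99.8, 100.0,  99.9],
    [100.3, 100.5, 100.2],
    [ 99.7,  99.9,  99.8],
    [100.1, 100.0, 100.2],
    [100.0,  99.9, 100.1],
])

stream_means = sample.mean(axis=1)

# Statistic 1 (mean-level chart): the grand average across all streams,
# sensitive to causes that move every stream together.
grand_average = stream_means.mean()

# Statistic 2 (Rt chart): the range between the largest and smallest stream
# averages, sensitive to a single stream drifting away from the others.
rt = stream_means.max() - stream_means.min()

print(f"grand average = {grand_average:.3f}, between-stream range Rt = {rt:.3f}")
# In practice each statistic would be plotted on its own chart, with control
# limits estimated from in-control data.
```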

“Epprecht and Barros (2013) studied a filling process application where the stream variances were similar, but the stream means differed, wandered, changed from day to day, were very difficult to adjust, and the production runs were too short to enable good estimation of the parameters of the individual streams. The solution adopted to control the process was to adjust the target above the nominal level to compensate for the variation between streams, as a function of the lower specification limit, of the desired false-alarm rate and of a point (shift, power) arbitrarily selected. This would be a MSP version of “acceptance control charts” (Montgomery 2012, Sect. 10.2) if taking samples with more than one observation per stream [is] feasible.”

Most research works consider a small to moderate number of streams. Some processes may have hundreds of streams, and in this case the issue of how to control the false-alarm rate while keeping enough detection power […] becomes a real problem. […] Real multiple-stream processes can be very ill-behaved. The author of this paper has seen a plant with six 20-stream filling processes in which the stream levels had different means and variances and could not be adjusted separately (one single pump and 20 hoses). For many real cases with particular twists like this one, it happens that no previous solution in the literature is applicable. […] The appropriateness and efficiency of [different monitoring methods] depends on the dynamic behaviour of the process over time, on the degree of cross-correlation between streams, on the ratio between the variabilities of the individual streams and of the common component (note that these three factors are interrelated), on the type and size of shifts that are likely and/or relevant to detect, on the ease or difficulty to adjust all streams in the same target, on the process capability, on the number of streams, on the feasibility of taking samples of more than one observation per stream at each sampling time (or even the feasibility of taking one observation of every stream at each sampling time!), on the length of the production runs, and so on. So, the first problem in a practical application is to characterize the process and select the appropriate monitoring scheme (or to adapt one, or to develop a new one). This analysis may not be trivial for the average practitioner in industry. […] Jirasettapong and Rojanarowan (2011) is the only work I have found on the issue of selecting the most suitable monitoring scheme for an MSP. It considers only a limited number of alternative schemes and a few aspects of the problem. More comprehensive analyses are needed.”

June 27, 2018 Posted by | Books, Data, Engineering, Statistics | Leave a comment

Alcohol and Aging (II)

I gave the book 3 stars on goodreads.

As is usual for publications of this nature, the book includes many chapters that cover similar topics, so the coverage can get a bit repetitive if you’re reading it from cover to cover the way I did; most of the chapter authors obviously didn’t read the other contributions included in the book, and as each chapter is meant to stand on its own, you end up with a lot of chapter introductions covering much the same ground. If you can disregard such aspects it’s a decent book, and it covers a wide variety of topics.

Below I have added some observations from some of the chapters of the book which I did not cover in my first post.

It is widely accepted that consuming heavy amounts of alcohol and binge drinking are detrimental to the brain. Animal studies that have examined the anatomical changes that occur to the brain as a consequence of consuming alcohol indicate that heavy alcohol consumption and binge drinking leads to the death of existing neurons [10, 11] and prevents production of new neurons [12, 13]. […] While animal studies indicate that consuming even moderate amounts of alcohol is detrimental to the brain, the evidence from epidemiological studies is less clear. […] Epidemiological studies that have examined the relationship between late life alcohol consumption and cognition have frequently reported that older adults who consume light to moderate amounts of alcohol are less likely to develop dementia and have higher cognitive functioning compared to older adults who do not consume alcohol. […] In a meta-analysis of 15 prospective cohort studies, consuming light to moderate amounts of alcohol was associated with significantly lower relative risk (RR) for Alzheimer’s disease (RR=0.72, 95% CI=0.61–0.86), vascular dementia (RR=0.75, 95% CI=0.57–0.98), and any type of dementia (RR=0.74, 95% CI=0.61–0.91), but not cognitive decline (RR=0.28, 95 % CI=0.03–2.83) [31]. These findings are consistent with a previous meta-analysis by Peters et al. [33] in which light to moderate alcohol consumption was associated with a decreased risk for dementia (RR=0.63, 95 % CI=0.53–0.75) and Alzheimer’s disease (RR=0.57, 95 % CI=0.44–0.74), but not vascular dementia (RR=0.82, 95% CI=0.50–1.35) or cognitive decline RR=0.89, 95% CI=0.67–1.17). […] Mild cognitive impairment (MCI) has been used to describe the prodromal stage of Alzheimer’s disease […]. There is no strong evidence to suggest that consuming alcohol is protective against MCI [39, 40] and several studies have reported non-significant findings [41–43].”

The majority of research on the relationship between alcohol consumption and cognitive outcomes has focused on the amount of alcohol consumed during old age, but there is a growing body of research that has examined the relationship between alcohol consumption during middle age and cognitive outcomes several years or decades later. The evidence from this area of research is mixed with some studies not detecting a significant relationship [17, 58, 59], while others have reported that light to moderate alcohol consumption is associated with preserved cognition [60] and decreased risk for cognitive impairment [31, 61, 62]. […] Several epidemiological studies have reported that light to moderate alcohol consumption is associated with a decreased risk for stroke, diabetes, and heart disease [36, 84, 85]. Similar to the U-shaped relationship between alcohol consumption and dementia, heavy alcohol consumption has been associated with poor health [86, 87]. The decreased risk for several metabolic and vascular health conditions for alcohol consumers has been attributed to antioxidants [54], greater concentrations of high-density lipoprotein cholesterol in the bloodstream [88], and reduced blood clot formation [89]. Stroke, diabetes, heart disease, and related conditions have all been associated with lower cognitive functioning during old age [90, 91]. The reduced prevalence of metabolic and vascular health conditions among light to moderate alcohol consumers may contribute to the decreased risk for dementia and cognitive decline for older adults who consume alcohol. A limitation of the hypothesis that the reduced risk for dementia among light and moderate alcohol consumers is conferred through the reduced prevalence of adverse health conditions associated with dementia is the possibility that this relationship is confounded by reverse causality. Alcohol consumption decreases with advancing age and adults may reduce their alcohol consumption in response to the onset of adverse health conditions […] the higher prevalence of dementia and lower cognitive functioning among abstainers may be due in part to their worse health rather than their alcohol consumption.”

A limitation of large cohort studies is that subjects who choose not to participate or are unable to participate are often less healthy than those who do participate. Non-response bias becomes more pronounced with age because only subjects who have survived to old age and are healthy enough to participate are observed. Studies on alcohol consumption and cognition are sensitive to non-response bias because light and moderate drinkers who are not healthy enough to participate in the study will not be observed. Adults who survive to old age despite consuming very high amounts of alcohol represent an even more select segment of the general population because they may have genetic, behavioral, health, social, or other factors that protect them against the negative effects of heavy alcohol consumption. As a result, the analytic sample of epidemiological studies is more likely to be comprised of “healthy” drinkers, which biases results in favor of finding a positive effect of light to moderate alcohol consumption for cognition and health in general. […] The incidence of Alzheimer’s disease doubles every 5 years after 65 years of age [94] and nearly 40% of older adults aged 85 and over are diagnosed with Alzheimer’s disease [7]. The relatively old age of onset for most dementia cases means the observed protective effect of light to moderate alcohol consumption for dementia may be due to alcohol consumers being more likely to die or drop out of a study as a result of their alcohol consumption before they develop dementia. This bias may be especially strong for heavy alcohol consumers. Not properly accounting for death as a competing outcome has been observed to artificially increase the risk of dementia among older adults with diabetes [95] and the effect that death and other competing outcomes may have on the relationship between alcohol consumption and dementia risk is unclear. […] The majority of epidemiological studies that have studied the relationship between alcohol consumption and cognition treat abstainers as the reference category. This can be problematic because often times the abstainer or non-drinking category includes older adults who stopped consuming alcohol because of poor health […] Not differentiating former alcohol consumers from lifelong abstainers has been found to explain some but not all of the benefit of alcohol consumption for preventing mortality from cardiovascular causes [96].”

“It is common for people to engage in other behaviors while consuming alcohol. This complicates the relationship between alcohol consumption and cognition because many of the behaviors associated with alcohol consumption are positively and negatively associated with cognitive functioning. For example, alcohol consumers are more likely to smoke than non-drinkers [104] and smoking has been associated with an increased risk for dementia and cognitive decline [105]. […] The relationship between alcohol consumption and cognition may also differ between people with or without a history of mental illness. Depression reduces the volume of the hippocampus [106] and there is growing evidence that depression plays an important role in dementia. Depression during middle age is recognized as a risk factor for dementia [107], and high depressive symptoms during old age may be an early symptom of dementia [108]. Middle aged adults with depression or other mental illness who self-medicate with alcohol may be at especially high risk for dementia later in life because of synergistic effects that alcohol and depression has on the brain. […] While current evidence from epidemiological studies indicates that consuming light to moderate amounts of alcohol, in particular wine, does not negatively affect cognition and in many cases is associated with cognitive health, adults who do not consume alcohol should not be encouraged to increase their alcohol consumption until further research clarifies these relationships. Inconsistencies between studies on how alcohol consumption categories are defined make it difficult to determine the “optimal” amount of alcohol consumption to prevent dementia. It is likely that the optimal amount of alcohol varies according to a person’s gender, as well as genetic, physiological, behavioral, and health characteristics, making the issue extremely complex.”

Falls are the leading cause of both fatal and nonfatal injuries among older adults, with one in three older adults falling each year, and 20–30% of people who fall suffer moderate to severe injuries such as lacerations, hip fractures, and head traumas. In fact, falls are the foremost cause of both fractures and traumatic brain injury (TBI) among older adults […] In 2013, 2.5 million nonfatal falls among older adults were treated in ED and more than 734,000 of these patients were hospitalized. […] Our analysis of the 2012 Nationwide Emergency Department Sample (NEDS) data set show that fall-related injury was a presenting problem among 12% of all ED visits by those aged 65+, with significant differences among age groups: 9% among the 65–74 age group, 12 % among the 75–84 age group, and 18 % among the 85+ age group [4]. […] heavy alcohol use predicts fractures. For example, among those 55+ years old in a health survey in England, men who consumed more than 8 units of alcohol and women who consumed more than 6 units on their heaviest drinking day in the past week had significantly increased odds of fractures (OR =1.65, 95% CI =1.37–1.98 for men and OR=2.07, 95% CI =1.28–3.35 for women) [63]. […] The 2008–2009 Canadian Community Health Survey-Healthy Aging also showed that consumption of at least one alcoholic drink per week increased the odds of falling by 40 % among those 65+ years [57].”

I was at first not much impressed by the effect sizes mentioned above, because there are surely a hundred relevant variables the studies didn’t or couldn’t account for, but then I thought a bit more about it. An important observation here – they don’t mention it in the coverage, but it sprang to mind – is that sick or frail elderly people consume less alcohol than their healthier counterparts and are more likely to abstain altogether (we know this), and frail or sick(er) elderly people are also more likely to suffer a fall or fracture than relatively healthy people are (again, we know this). Given that, you’d expect alcohol consumption to appear to have a ‘protective effect’ simply due to confounding by (reverse) indication (unless the researchers were really careful about adjusting for such things, and no such adjustments are mentioned in the coverage, which makes sense as these are just raw numbers being reported). The point is that the null here should not be that these groups have the same fall/fracture rate, but rather that people who drink alcohol should be expected to be doing better, all else equal – but they aren’t, quite the reverse. So ‘the true effect size’ here may be larger than you’d think.

I’m reasonably sure things are a lot more complicated than the above makes it appear (because of those 100 relevant variables we were talking about…), but I find it interesting anyway. Two more things to note: 1. Have another look at the numbers above if they didn’t sink in the first time. This is more than 10% of emergency department visits for that age group. Falls are a really big deal. 2. Fractures in the elderly are also a potentially really big deal. Here’s a sample quote: “One-fifth of hip fracture victims will die within 6 months of the injury, and only 50% will return to their previous level of independence.” (link). In some contexts, a fall is worse news than a cancer diagnosis, and they are very common events in the elderly. This also means that even relatively small effect sizes here can translate into quite large public health effects, because baseline incidence is so high.

“The older adult population is a disproportionate consumer of prescription and over-the-counter medications. In a nationally representative sample of community-dwelling adults aged 57–84 years from the National Social Life, Health, and Aging Project (NSHAP) in 2005–2006, 81% used at least one prescription medication on a regular basis and 29% used at least five prescription medications. Forty-two percent used at least one nonprescription medication and concurrent use with a prescription medication was common, with 46% of prescription medication users also using OTC medications [2]. Prescription drug use by older adults in the U.S. is also growing. The percentage of older adults taking at least one prescription drug in the last 30 days increased from 73.6% in 1988–1994 to 89.7% in 2007–2010 and the percentage taking five or more prescription drugs in the last 30 days increased from 13.8% in 1988–1994 to 39.7% in 2007–2010 [3].”

“The aging process can affect the response to a medication by altering its pharmacokinetics and pharmacodynamics [9, 10]. Reduced gastrointestinal motility and gastric acidity can alter the rate or extent of drug absorption. Changes in body composition, including decreased total body water and increased body fat, can alter drug distribution. For alcohol, changes in body composition result in higher blood alcohol levels in older adults compared to younger adults after the same dose or quantity of alcohol consumed. Decreased size of the liver, hepatic blood flow, and function of Phase I (oxidation, reduction, and hydrolysis) metabolic pathways result in reduced drug metabolism and increased drug exposure for drugs that undergo Phase I metabolism. Phase II hepatic metabolic pathways are generally preserved with aging. Decreased size of the kidney, renal blood flow, and glomerular filtration result in slower elimination of medications and metabolites by the kidney and increased drug exposure for medications that undergo renal elimination. Age-related impairment of homeostatic mechanisms and changes in receptor number and function can result in changes in pharmacodynamics as well. Older adults are, for example, generally more sensitive to the effects of medications and alcohol that act on the central nervous system. The consequences of these physiologic changes with aging are that older adults often experience increased drug exposure for the same dose (higher drug concentrations over time) and increased sensitivity to medications (greater response at a given drug concentration) than their younger counterparts.”

“Aging-related changes in physiology are not the only sources of variability in pharmacokinetics and pharmacodynamics that must be considered for an individual person. Older adults experience more chronic diseases than younger cohorts, and these may decrease drug metabolism and renal elimination. Frailty may result in further decline in drug metabolism, including Phase II metabolic pathways in the liver […] Drug interactions must also be considered […] A drug interaction is defined as a clinically meaningful change in the effect of one drug when coadministered with another drug [12]. Many drugs, including alcohol, have the potential for a drug interaction when administered concurrently, but whether a clinically meaningful change in effect occurs for a specific person depends on patient-specific factors including age. Drug interactions are generally classified as pharmacokinetic interactions, where one drug alters the absorption, distribution, metabolism, or elimination of another drug resulting in increased or decreased drug exposure, or pharmacodynamic interactions, where one drug alters the response to another medication through additive or antagonistic pharmacologic effects [13]. An adverse drug event occurs when a pharmacokinetic or pharmacodynamic interaction or combination of both results in changes in drug exposure or response that lead to negative clinical outcomes. The adverse drug event could be a therapeutic failure if drug exposure is decreased or the pharmacologic response is antagonistic. The adverse drug event could be drug toxicity if the drug exposure is increased or the pharmacologic response is additive or synergistic. The threshold for experiencing an adverse event is often lower in older adults due to physiologic changes with aging and medical comorbidities, increasing their risk of experiencing an adverse drug event when medications are taken concurrently.”

“A large number of potential medication–alcohol interactions have been reported in the literature. Mechanisms of these interactions range from pharmacokinetic interactions affecting either alcohol or medication exposure to pharmacodynamic interactions resulting in exaggerated response. […] Epidemiologic evidence suggests that concurrent use of alcohol and medications among older adults is common. […] In a nationally representative U.S. sample of community-dwelling older adults in the National Social Life, Health and Aging Project (NSHAP) 2005–2006, 41% of participants reported consuming alcohol at least once per week and 20% were at risk for an alcohol–medication interaction because they were using both alcohol and alcohol-interacting medications on a regular basis [17]. […] Among participants in the Pennsylvania Assistance Contract for the Elderly program (aged 65–106 years) taking at least one prescription medication, 77% were taking an alcohol-interacting medication and 19% of the alcohol-interacting medication users reported concurrent use of alcohol [18]. […] Although these studies do not document adverse outcomes associated with alcohol–medication interactions, they do document that the potential exists for many older adults. […] A high prevalence of concurrent use of alcohol and alcohol-interacting medications has also been reported in Australian men (43% of sedative or anxiolytic users were daily drinkers) [19], in older adults in Finland (42% of at-risk alcohol users were also taking alcohol-interacting medications) [20], and in older Irish adults (72% of participants were exposed to alcohol-interacting medications and 60% of these reported concurrent alcohol use) [21]. Drinking and medication use patterns in older adults may differ across countries, but alcohol–medication interactions appear to be a worldwide concern. […] Polypharmacy in general, and psychotropic burden specifically, has been associated with an increased risk of experiencing a geriatric syndrome, such as falls or delirium, in older adults [26, 27]. Based on its pharmacology, alcohol can be considered as a psychotropic drug, and alcohol use should be assessed as part of the medication regimen evaluation to support efforts to prevent or manage geriatric syndromes. […] Combining alcohol and CNS-active medications can be particularly problematic […] Older adults suffering from sleep problems or pain may be at particular risk of alcohol–medication interaction-related adverse events.”

“In general, alcohol use in younger couples has been found to be highly concordant, that is, individuals in a relationship tend to engage in similar drinking behaviors [67,68]. Less is known, however, about alcohol use concordance between older couples. Graham and Braun [69] examined similarities in drinking behavior between spouses in a study of 826 community-dwelling older adults in Ontario, Canada. Results showed high concordance of drinking between spouses — whether they drank at all, how much they drank, and how frequently. […] Social learning theory suggests that alcohol use trajectories are strongly influenced by attitudes and behaviors of an individual’s social networks, particularly family and friends. When individuals engage in social activities with family and friends who approve of and engage in drinking, alcohol use and misuse are reinforced [58, 59]. Evidence shows that among older adults, participation in social activities is correlated with higher levels of alcohol consumption [34, 60]. […] Brennan and Moos [29] […] found that older adults who reported less empathy and support from friends drank more alcohol, were more depressed, and were less self-confident. More stressors involving friends were associated with more drinking problems. Similar to the findings on marital conflict […], conflict in close friendships can prompt alcohol-use problems; conversely, these relationships can suffer as a result of alcohol-related problems. […] As opposed to social network theory […], social selection theory proposes that alcohol consumption changes an individual’s social context [33]. Studies among younger adults have shown that heavier drinkers choose partners and friends who approve of heavier drinking [70] and that excessive drinking can alienate social networks. The Moos study supports the idea that social selection also has a strong influence on drinking behavior among older adults.”

Traditionally, treatment studies in addiction have excluded patients over the age of 65. This bias has left a tremendous gap in knowledge regarding treatment outcomes and an understanding of the neurobiology of addiction in older adults.

“Alcohol use causes well-established changes in sleep patterns, such as decreased sleep latency, decreased stage IV sleep, and precipitation or aggravation of sleep apnea [101]. There are also age-associated changes in sleep patterns, including increased REM episodes, a decrease in REM length, a decrease in stage III and IV sleep, and increased awakenings. Age-associated changes in sleep can all be worsened by alcohol use and depression. Moeller and colleagues [102] demonstrated in younger subjects that alcohol and depression had additive effects upon sleep disturbances when they occurred together [102]. Wagman and colleagues [101] have also demonstrated that abstinent alcoholics did not sleep well because of insomnia, frequent awakenings, and REM fragmentation [101]; however, when these subjects ingested alcohol, sleep periodicity normalized and REM sleep was temporarily suppressed, suggesting that alcohol could be used to self-medicate for sleep disturbances. A common anecdote from patients is that alcohol is used to help with sleep problems. […] The use of alcohol to self-medicate is considered maladaptive [34] and is associated with a host of negative outcomes. […] The use of alcohol to aid with sleep has been found to disrupt sleep architecture and cause sleep-related problems and daytime sleepiness [35, 36, 46]. Though alcohol is commonly used to aid with sleep initiation, it can worsen sleep-related breathing disorders and cause snoring and obstructive sleep apnea [36].”

“Epidemiologic studies have clearly demonstrated that comorbidity between alcohol use and other psychiatric symptoms is common in younger age groups. Less is known about comorbidity between alcohol use and psychiatric illness in late life [88]. […] Blow et al. [90] reviewed the diagnoses of 3,986 VA patients between ages 60 and 69 presenting for alcohol treatment [90]. The most common comorbid psychiatric disorder was an affective disorder, found in 21% of the patients. […] Blazer et al. [91] studied 997 community-dwelling elderly, of whom only 4.5% had a history of alcohol use problems [91]; […] of these subjects, almost half had a comorbid diagnosis of depression or dysthymia. Comorbid depressive symptoms are not only common in late life but are also an important factor in the course and prognosis of psychiatric disorders. Depressed alcoholics have been shown to have a more complicated clinical course of depression with an increased risk of suicide and more social dysfunction than non-depressed alcoholics [92–96]. […] Alcohol use prior to late life has also been shown to influence treatment of late life depression. Cook and colleagues [94] found that a prior history of alcohol use problems predicted a more severe and chronic course for depression [94]. […] The effect of past heavy alcohol use is [also] highlighted in the findings from the Liverpool Longitudinal Study demonstrating a fivefold increase in psychiatric illness among elderly men who had a lifetime history of 5 or more years of heavy drinking [24]. The association between heavy alcohol consumption in earlier years and psychiatric morbidity in later life was not explained by current drinking habits. […] While Wernicke-Korsakoff’s syndrome is well described and often caused by alcohol use disorders, alcohol-related dementia may be difficult to differentiate from Alzheimer’s disease. Clinical diagnostic criteria for alcohol-related dementia (ARD) have been proposed and now validated in at least one trial, suggesting a method for distinguishing ARD, including Wernicke-Korsakoff’s syndrome, from other types of dementia [97, 98]. […] Finlayson et al. [100] found that 49 of 216 (23%) elderly patients presenting for alcohol treatment had dementia associated with alcohol use disorders [100].”

 

May 24, 2018 Posted by | Books, Demographics, Epidemiology, Medicine, Neurology, Pharmacology, Psychiatry, Statistics

Trade-offs when doing medical testing

I was considering whether or not to blog today about the molecular biology text I recently read, but I decided against it. However, as I did feel like blogging today, I decided instead to add here a few comments I left on SCC. I rarely leave comments on other blogs, but it does happen, and the question I was ‘answering’ (partially – other guys had already added some pretty good comments by the time I joined the debate) is probably a question that I imagine a lot of e.g. undergrads are asking themselves, namely: “What’s the standard procedure, when designing a medical test, to determine the right tradeoff between sensitivity and specificity (where I’m picturing a tradeoff involved in choosing the threshold for a positive test or something similar)?”

The ‘short version’, if you want an answer to this question, is probably to read Newman and Kohn’s wonderful book on these – and related – topics (which I blogged here), but that’s not actually a ‘short answer’ in terms of how people usually think about these things. I’ll just reproduce my own comment here, and mention that other guys had already covered some key topics by the time I joined ‘the fray’:

“Some good comments already. I don’t know to which extent the following points have been included in the links provided, but I decided to add them here anyway.

One point worth emphasizing is that you’ll always want a mixture of sensitivity and specificity (or, more broadly, test properties) that’ll mean that your test has clinical relevance. This relates both to the type of test you consider and when/whether to test at all (rather than treat/not treat without testing first). If you’re worried someone has disease X and there’s a high risk of said individual having disease X due to the clinical presentation, some tests will for example be inappropriate even if they are very good at making the distinction between individuals requiring treatment X and individuals not requiring treatment X, because they take time to perform which the patient might not have – not an uncommon situation in emergency medicine. If you’re so worried you’d treat him regardless of the test result, you shouldn’t test. And the same goes for e.g. low-sensitivity screens; if a positive test result of a screen does not imply that you’ll actually act on the result of the screen, you shouldn’t perform it (in screening contexts cost effectiveness is usually critically dependent on how you follow up on the test result, and in many contexts inadequate follow-up means that the value of the test goes down a lot […on a related note I have been thinking that I was perhaps not as kind as I could have been when I reviewed Juth & Munthe’s book and I have actually considered whether or not to change my rating of the book; it does give a decent introduction to some key trade-offs with which you’re confronted when you’re dealing with topics related to screening]).

Cost effectiveness is another variable that would/should probably (in an ideal world?) enter the analysis when you’re judging what is or is not a good mixture of sensitivity and specificity – you should be willing to pay more for more precise tests, but only to the extent that those more precise tests lead to better outcomes (you’re usually optimizing over patient outcomes, not test accuracy).

Skef also mentions this, but the relative values of specificity and sensitivity may well vary during the diagnostic process; i.e. the (ideal) trade-off will depend on what you plan to use the test for. Is the idea behind testing this guy to make (reasonably?) sure he doesn’t have colon cancer, or to figure out if he needs a more accurate, but also more expensive, test? Screening setups will usually involve a multi-level testing structure, and tests at different levels will not treat these trade-offs the same way, nor should they. This also means that the properties of individual tests cannot really be viewed in isolation, which makes the problem of finding ‘the ideal mix’ of test properties (whatever these might be) even harder; if you have three potential tests, for example, it’s not enough to compare the tests individually against each other; you’d ideally also want to implicitly take into account that different combinations of tests have different properties, and that the timing of the test may also be an important parameter in the decision problem.”
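To make the thresholding part of this a little more concrete than I did in the comment, here's a minimal sketch (in Python, with made-up distributions, a made-up prevalence, and made-up misclassification costs) of what it looks like to choose a cutoff by minimizing expected cost rather than by maximizing accuracy or hitting some fixed sensitivity target:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: a continuous test value that tends to be higher in the diseased.
prevalence = 0.05
n = 200_000
diseased = rng.random(n) < prevalence
values = np.where(diseased, rng.normal(2.0, 1.0, n), rng.normal(0.0, 1.0, n))

# Made-up costs: a missed case is taken to be 20 times worse than a false alarm.
cost_fn, cost_fp = 20.0, 1.0

best = None
for cutoff in np.linspace(-2, 4, 61):
    positive = values >= cutoff
    sens = positive[diseased].mean()
    spec = (~positive[~diseased]).mean()
    expected_cost = (cost_fn * (1 - sens) * prevalence
                     + cost_fp * (1 - spec) * (1 - prevalence))
    if best is None or expected_cost < best[0]:
        best = (expected_cost, cutoff, sens, spec)

print("cutoff %.2f: sensitivity %.2f, specificity %.2f, expected cost %.4f"
      % (best[1], best[2], best[3], best[0]))
# Change the cost ratio or the prevalence and the 'optimal' cutoff moves, which is
# the point: there is no context-free answer to the sensitivity/specificity question.
```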

On a related note I think that in general the idea of looking for some kind of ‘approved method’ that you can use to save yourself from thinking is a very dangerous approach when you’re doing applied statistics. If you’re not thinking about relevant trade-offs and how to deal with them, odds are you’re missing a big part of the picture. If somebody claims to have somehow discovered some simple approach to dealing with all of the relevant trade-offs, well, you should be very skeptical. Statistics usually don’t work like that.

May 4, 2018 Posted by | Medicine, Statistics

Medical Statistics (III)

In this post I’ll include some links and quotes related to topics covered in chapters 4, 6, and 7 of the book. Before diving in, however, I’ll draw attention to some of Gerd Gigerenzer’s work, as it is particularly relevant to the coverage in chapter 4 (‘Presenting research findings’), even if the authors seem unaware of this. One of Gigerenzer’s key insights, which I consider important and which I have thus tried to keep in mind, unfortunately goes unmentioned in the book; namely the idea that how you communicate risk might be very important in terms of whether or not people actually understand what you are trying to tell them. A related observation is that people have studied these things and they’ve figured out that some types of risk communication are demonstrably better than others at enabling people to understand the issues at hand and the trade-offs involved in a given situation. I covered some of these ideas in a comment on SCC some time ago (if those comments spark your interest, you should definitely go read the book).
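Gigerenzer's point about the format of risk communication is easy to illustrate with a stylized screening example (all numbers below are made up for illustration); the same information is presented first as conditional probabilities and then as natural frequencies, and most people find the latter much easier to reason about:

```python
# Stylized screening example (made-up numbers): prevalence 1%, sensitivity 90%, specificity 91%.
prevalence, sensitivity, specificity = 0.01, 0.90, 0.91

# Conditional-probability version (Bayes' theorem): the positive predictive value.
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
print(f"P(disease | positive test) = {ppv:.2f}")

# Natural-frequency version of the same numbers:
n = 1000
sick = round(n * prevalence)                        # 10 of 1,000 people have the disease
true_pos = round(sick * sensitivity)                # 9 of them test positive
false_pos = round((n - sick) * (1 - specificity))   # ~89 healthy people also test positive
print(f"Of {true_pos + false_pos} positive tests, only {true_pos} are true positives.")
```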

IMRAD format.
CONSORT Statement (randomized trials).
Equator Network.

“Abstracts may appear easy to write since they are very short […] and often required to be written in a structured format. It is therefore perhaps surprising that they are sometimes poorly written, too bland, contain inaccuracies, and/or are simply misleading.1 The reasons for poor-quality abstracts are complex; abstracts are often written at the end of a long process of data collection, analysis, and writing up, when time is short and researchers are weary. […] statistical issues […] can lead to an abstract that is not a fair representation of the research conducted. […] it is important that the abstract is consistent with the body of text and that it gives a balanced summary of the work. […] To maximize its usefulness, a summary or abstract should include estimates and confidence intervals for the main findings and not simply present P values.”

“The methods section should describe how the study was conducted. […] it is important to include the following: *The setting or area […] The date(s) […] subjects included […] study design […] measurements used […] source of any non-original data […] sample size, including a justification […] statistical methods, including any computer software used […] The discussion section is where the findings of the study are discussed and interpreted […] this section tends to include less statistics than the results section […] Some medical journals have a specific structure for the discussion for researchers to follow, and so it is important to check the journal’s guidelines before submitting. […] [When] reporting statistical analyses from statistical programs: *Don’t put unedited computer output into a research document. *Extract the relevant data only and reformat as needed […] Beware of presenting percentages for very small samples as they may be misleading. Simply give the numbers alone. […] In general the following is recommended for P values: *Give the actual P value whenever possible. *Rounding: Two significant figures are usually enough […] [Confidence intervals] should be given whenever possible to indicate the precision of estimates. […] Avoid graphs with missing zeros or stretched scales […] a table or graph should stand alone so that a reader does not need to read the […] article to be able to understand it.”
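As a tiny illustration of the 'estimates and confidence intervals, not just P values' advice, here's what that looks like in practice for a difference in means (made-up data, large-sample Normal approximation for the interval):

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up data: systolic blood pressure (mmHg) in two groups of 60 patients.
treated = rng.normal(128, 15, 60)
control = rng.normal(134, 15, 60)

diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se   # approximate 95% CI

print(f"mean difference: {diff:.1f} mmHg (95% CI {ci_low:.1f} to {ci_high:.1f})")
# An estimate with its confidence interval conveys both the size and the precision
# of the effect; a bare 'P < 0.05' conveys neither.
```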

Statistical data type.
Level of measurement.
Descriptive statistics.
Summary statistics.
Geometric mean.
Harmonic mean.
Mode.
Interquartile range.
Histogram.
Stem and leaf plot.
Box and whisker plot.
Dot plot.

“Quantitative data are data that can be measured numerically and may be continuous or discrete. *Continuous data lie on a continuum and so can take any value between two limits. […] *Discrete data do not lie on a continuum and can only take certain values, usually counts (integers) […] On an interval scale, differences between values at different points of the scale have the same meaning […] Data can be regarded as on a ratio scale if the ratio of the two measurements has a meaning. For example we can say that twice as many people in one group had a particular characteristic compared with another group and this has a sensible meaning. […] Quantitative data are always ordinal – the data values can be arranged in a numerical order from the smallest to the largest. […] *Interval scale data are always ordinal. Ratio scale data are always interval scale data and therefore must also be ordinal. *In practice, continuous data may look discrete because of the way they are measured and/or reported. […] All continuous measurements are limited by the accuracy of the instrument used to measure them, and many quantities such as age and height are reported in whole numbers for convenience”.

“Categorical data are data where individuals fall into a number of separate categories or classes. […] Different categories of categorical data may be assigned a number for coding purposes […] and if there are several categories, there may be an implied ordering, such as with stage of cancer where stage I is the least advanced and stage IV is the most advanced. This means that such data are ordinal but not interval because the ‘distance’ between adjacent categories has no real measurement attached to it. The ‘gap’ between stages I and II disease is not necessarily the same as the ‘gap’ between stages III and IV. […] Where categorical data are coded with numerical codes, it might appear that there is an ordering but this may not necessarily be so. It is important to distinguish between ordered and non-ordered data because it affects the analysis.”

“It is usually useful to present more than one summary measure for a set of data […] If the data are going to be analyzed using methods based on means then it makes sense to present means rather than medians. If the data are skewed they may need to be transformed before analysis and so it is best to present summaries based on the transformed data, such as geometric means. […] For very skewed data, rather than reporting the median, it may be helpful to present a different percentile (i.e. not the 50th), which better reflects the shape of the distribution. […] Some researchers are reluctant to present the standard deviation when the data are skewed and so present the median and range and/or quartiles. If analyses are planned which are based on means then it makes sense to be consistent and give standard deviations. Further, the useful relationship that approximately 95% of the data lie between mean +/- 2 standard deviations holds even for skewed data […] If data are transformed, the standard deviation cannot be back-transformed correctly and so for transformed data a standard deviation cannot be given. In this case the untransformed standard deviation can be given or another measure of spread. […] For discrete data with a narrow range, such as stage of cancer, it may be better to present the actual frequency distribution to give a fair summary of the data, rather than calculate a mean or dichotomize it. […] It is often useful to tabulate one categorical variable against another to show the proportions or percentages of the categories of one variable by the other”.
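A small simulated illustration of the geometric-mean/transformation point (lognormal data, so positively skewed; the numbers are of course made up):

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated positively skewed data, e.g. a biomarker concentration.
x = rng.lognormal(mean=1.0, sigma=0.8, size=5000)

print(f"arithmetic mean: {x.mean():.2f}")
print(f"median:          {np.median(x):.2f}")
print(f"geometric mean:  {np.exp(np.log(x).mean()):.2f}")   # back-transformed mean of the logs

# The SD of the logged data has no meaningful back-transform to the original scale;
# if a measure of spread is wanted, report e.g. the original-scale SD or the quartiles.
print(f"SD:  {x.std(ddof=1):.2f}")
print(f"IQR: {np.percentile(x, 25):.2f} to {np.percentile(x, 75):.2f}")
```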

Random variable.
Independence (probability theory).
Probability.
Probability distribution.
Binomial distribution.
Poisson distribution.
Continuous probability distribution.
Normal distribution.
Uniform distribution.

“The central limit theorem is a very important mathematical theorem that links the Normal distribution with other distributions in a unique and surprising way and is therefore very useful in statistics. *The sum of a large number of independent random variables will follow an approximately Normal distribution irrespective of their underlying distributions. *This means that any random variable which can be regarded as the sum of a large number of small, independent contributions is likely to follow the Normal distribution. [I didn’t really like this description as it’s insufficiently detailed for my taste (and this was pretty much all they wrote about the CLT in that chapter); and one problem with the CLT is that people often think it applies when it might not actually do so, because the data restrictions implied by the theorem(s) are not really fully appreciated. On a related note, people often seem to misunderstand what these theorems actually say and where they apply – see e.g. paragraph 10 in this post. See also the wiki link above for a more comprehensive treatment of these topics – US] *The Normal distribution can be used as an approximation to the Binomial distribution when n is large […] The Normal distribution can be used as an approximation to the Poisson distribution as the mean of the Poisson distribution increases […] The main advantage in using the Normal rather than the Binomial or the Poisson distribution is that it makes it easier to calculate probabilities and confidence intervals”
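For what it's worth, the basic statement is easy to 'see' in a quick simulation: below, sums of k draws from a very skewed (exponential) distribution become less and less skewed, i.e. closer to Normal, as k grows. This obviously doesn't address the data restrictions I mention above; it's just meant to make the theorem a bit less abstract.

```python
import numpy as np

rng = np.random.default_rng(4)

# Sums of k independent draws from a very skewed (exponential) distribution:
# as k grows, the distribution of the sum looks increasingly Normal.
for k in (1, 5, 50):
    sums = rng.exponential(scale=1.0, size=(100_000, k)).sum(axis=1)
    z = (sums - sums.mean()) / sums.std()
    skewness = (z ** 3).mean()   # roughly 0 for a Normal distribution
    print(f"sum of {k:3d} exponentials: skewness = {skewness:.2f}")
```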

“The t distribution plays an important role in statistics as the sampling distribution of the sample mean divided by its standard error and is used in significance testing […] The shape is symmetrical about the mean value, and is similar to the Normal distribution but with a higher peak and longer tails to take account of the reduced precision in smaller samples. The exact shape is determined by the mean and variance plus the degrees of freedom. As the degrees of freedom increase, the shape comes closer to the Normal distribution […] The chi-squared distribution also plays an important role in statistics. If we take several variables, say n, which each follow a standard Normal distribution, and square each and add them, the sum of these will follow a chi-squared distribution with n degrees of freedom. This theoretical result is very useful and widely used in statistical testing […] The chi-squared distribution is always positive and its shape is uniquely determined by the degrees of freedom. The distribution becomes more symmetrical as the degrees of freedom increases. […] [The (noncentral) F distribution] is the distribution of the ratio of two chi-squared distributions and is used in hypothesis testing when we want to compare variances, such as in doing analysis of variance […] Sometimes data may follow a positively skewed distribution which becomes a Normal distribution when each data point is log-transformed [..] In this case the original data can be said to follow a lognormal distribution. The transformation of such data from log-normal to Normal is very useful in allowing skewed data to be analysed using methods based on the Normal distribution since these are usually more powerful than alternative methods”.
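The chi-squared construction mentioned above (a sum of n squared standard Normal variables) is likewise easy to check by simulation; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

# Sum of squares of n independent standard Normal variables.
n_df = 4
samples = (rng.normal(size=(100_000, n_df)) ** 2).sum(axis=1)

# A chi-squared distribution with n degrees of freedom has mean n and variance 2n.
print(f"simulated mean:     {samples.mean():.2f}  (theory: {n_df})")
print(f"simulated variance: {samples.var():.2f}  (theory: {2 * n_df})")
```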

Half-Normal distribution.
Bivariate Normal distribution.
Negative binomial distribution.
Beta distribution.
Gamma distribution.
Conditional probability.
Bayes theorem.

April 26, 2018 Posted by | Books, Data, Mathematics, Medicine, Statistics

Medical Statistics (II)

In this post I’ll include some links and quotes related to topics covered in chapters 2 and 3 of the book. Chapter 2 is about ‘Collecting data’ and chapter 3 is about ‘Handling data: what steps are important?’

“Data collection is a key part of the research process, and the collection method will impact on later statistical analysis of the data. […] Think about the anticipated data analysis [in advance] so that data are collected in the appropriate format, e.g. if a mean will be needed for the analysis, then don’t record the data in categories, record the actual value. […] *It is useful to pilot the data collection process in a range of circumstances to make sure it will work in practice. *This usually involves trialling the data collection form on a smaller sample than intended for the study and enables problems with the data collection form to be identified and resolved prior to main data collection […] In general don’t expect the person filling out the form to do calculations as this may lead to errors, e.g. calculating a length of time between two dates. Instead, record each piece of information to allow computation of the particular value later […] The coding scheme should be designed at the same time as the form so that it can be built into the form. […] It may be important to distinguish between data that are simply missing from the original source and data that the data extractor failed to record. This can be achieved using different codes […] The use of numerical codes for non-numerical data may give the false impression that these data can be treated as if they were numerical data in the statistical analysis. This is not so.”

“It is critical that data quality is monitored and that this happens as the study progresses. It may be too late if problems are only discovered at the analysis stage. If checks are made during the data collection then problems can be corrected. More frequent checks may be worthwhile at the beginning of data collection when processes may be new and staff may be less experienced. […] The layout […] affects questionnaire completion rates and therefore impacts on the overall quality of the data collected.”

“Sometimes researchers need to develop a new measurement or questionnaire scale […] To do this rigorously requires a thorough process. We will outline the main steps here and note the most common statistical measures used in the process. […] Face validity *Is the scale measuring what it sets out to measure? […] Content validity *Does the scale cover all the relevant areas? […] *Between-observers consistency: is there agreement between different observers assessing the same individuals? *Within-observers consistency: is there agreement between assessments on the same individuals by the same observer on two different occasions? *Test-retest consistency: are assessments made on two separate occasions on the same individual similar? […] If a scale has several questions or items which all address the same issue then we usually expect each individual to get similar scores for those questions, i.e. we expect their responses to be internally consistent. […] Cronbach’s alpha […] is often used to assess the degree of internal consistency. [It] is calculated as an average of all correlations among the different questions on the scale. […] *Values are usually expected to be above 0.7 and below 0.9 *Alpha below 0.7 broadly indicates poor internal consistency *Alpha above 0.9 suggests that the items are very similar and perhaps fewer items could be used to obtain the same overall information”.
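For the curious, here's a minimal sketch of how Cronbach's alpha is usually computed from a subjects-by-items score matrix, using the standard variance-based formula alpha = k/(k − 1) × (1 − (sum of the item variances)/(variance of the total score)); the data are made up:

```python
import numpy as np

# Made-up data: 6 subjects answering a 4-item scale (higher = more of the trait).
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 5, 5],
], dtype=float)

k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)       # variance of each item across subjects
total_variance = scores.sum(axis=1).var(ddof=1)   # variance of the total score

alpha = k / (k - 1) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```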

Bland–Altman plot.
Coefficient of variation.
Intraclass correlation.
Cohen’s kappa.
Likert scale. (“The key characteristic of Likert scales is that the scale is symmetrical. […] Care is needed when analyzing Likert scale data even though a numerical code is assigned to the responses, since the data are ordinal and discrete. Hence an average may be misleading […] It is quite common to collapse Likert scales into two or three categories such as agree versus disagree, but this has the disadvantage that data are discarded.”)
Visual analogue scale. (“VAS scores can be treated like continuous data […] Where it is feasible to use a VAS, it is preferable as it provides greater statistical power than a categorical scale”)

“Correct handling of data is essential to produce valid and reliable statistics. […] Data from research studies need to be coded […] It is important to document the coding scheme for categorical variables such as sex where it will not be obviously [sic, US] what the values mean […] It is strongly recommended that a unique numerical identifier is given to each subject, even if the research is conducted anonymously. […] Computerized datasets are often stored in a spreadsheet format with rows and columns of data. For most statistical analyses it is best to enter the data so that each row represents a different subject and each column a different variable. […] Prefixes or suffixes can be used to denote […] repeated measurements. If there are several repeated variables, use the same ‘scheme’ for all to avoid confusion. […] Try to avoid mixing suffixes and prefixes as it can cause confusion.”

“When data are entered onto a computer at different times it may be necessary to join datasets together. […] It is important to avoid over-writing a current dataset with a new updated version without keeping the old version as a separate file […] the two datasets must use exactly the same variable names for the same variables and the same coding. Any spelling mistakes will prevent a successful joining. […] It is worth checking that the joining has worked as expected by checking that the total number of observations in the updated file is the sum of the two previous files, and that the total number of variables is unchanged. […] When new data are collected on the same individuals at a later stage […], it may [again] be necessary to merge datasets. In order to do this the unique subject identifier must be used to identify the records that must be matched. For the merge to work, all variable names in the two datasets must be different except for the unique identifier. […] Spreadsheets are useful for entering and storing data. However, care should be taken when cutting and pasting different datasets to avoid misalignment of data. […] it is best not to join or sort datasets using a spreadsheet […in some research contexts, I’d add, this is also just plain impossible to even try, due to the amount of data involved – US…] […] It is important to ensure that a unique copy of the current file, the ‘master copy’, is stored at all times. Where the study involves more than one investigator, everyone needs to know who has responsibility for this. It is also important to avoid having two people revising the same file at the same time. […] It is important to keep a record of any changes that are made to the dataset and keep dated copies of datasets as changes are made […] Don’t overwrite datasets with edited versions as older versions may be needed later on.”
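A short sketch of the kind of merge-and-check workflow described above, using pandas (the variable names and values are made up):

```python
import pandas as pd

# Baseline and follow-up data on the same subjects, linked by a unique identifier.
baseline = pd.DataFrame({"study_id": [101, 102, 103, 104],
                         "sex": [1, 2, 2, 1],
                         "sbp_0": [128, 141, 119, 135]})
followup = pd.DataFrame({"study_id": [101, 102, 104],
                         "sbp_12": [122, 138, 130]})

merged = baseline.merge(followup, on="study_id", how="left", validate="one_to_one")

# Checks along the lines suggested above: expected row and column counts,
# plus a look at who is missing follow-up data.
assert len(merged) == len(baseline)
assert merged.shape[1] == baseline.shape[1] + followup.shape[1] - 1
print(merged[merged["sbp_12"].isna()])   # subject 103 has no follow-up record
```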

“Where possible, it is important to do some [data entry] checks early on to leave time for addressing problems while the study is in progress. […] *Check a random sample of forms for data entry accuracy. If this reveals problems then further checking may be needed. […] If feasible, consider checking data entry forms for key variables, e.g. the primary outcome. […] Range checks: […] tabulate all data to ensure there are no invalid values […] make sure responses are consistent with each other within subjects, e.g. check for any impossible or unlikely combination of responses such as a male with a pregnancy […] Check where feasible that any gaps are true gaps and not missed data entry […] Sometimes finding one error may lead to others being uncovered. For example, if a spreadsheet was used for data entry and one entry was missed, all following entries may be in the wrong columns. Hence, always consider if the discovery of one error may imply that there are others. […] Plots can be useful for checking larger datasets.”
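…and a correspondingly minimal sketch of the range and consistency checks mentioned above (again with made-up variables and values):

```python
import pandas as pd

df = pd.DataFrame({"study_id": [101, 102, 103, 104],
                   "sex": [1, 2, 9, 1],        # 1 = male, 2 = female; 9 is an invalid code
                   "age": [67, 72, 154, 69],   # 154 is presumably a data-entry error
                   "pregnant": [0, 1, 0, 1]})

# Range checks: tabulate coded variables and flag out-of-range values.
print(df["sex"].value_counts())
print(df[(df["age"] < 18) | (df["age"] > 110)])

# Consistency check: an 'impossible combination', e.g. pregnancy recorded for a male.
print(df[(df["sex"] == 1) & (df["pregnant"] == 1)])
```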

Data monitoring committee.
Damocles guidelines.
Overview of stopping rules for clinical trials.
Pocock boundary.
Haybittle–Peto boundary.

“Trials are only stopped early when it is considered that the evidence for either benefit or harm is overwhelmingly strong. In such cases, the effect size will inevitably be larger than anticipated at the outset of the trial in order to trigger the early stop. Hence effect estimates from trials stopped early tend to be more extreme than would be the case if these trials had continued to the end, and so estimates of the efficacy or harm of a particular treatment may be exaggerated. This phenomenon has been demonstrated in recent reviews.1,2 […] Sometimes it becomes apparent part way through a trial that the assumptions made in the original sample size calculations are not correct. For example, where the primary outcome is a continuous variable, an estimate of the standard deviation (SD) is needed to calculate the required sample size. When the data are summarized during the trial, it may become apparent that the observed SD is different from that expected. This has implications for the statistical power. If the observed SD is smaller than expected then it may be reasonable to reduce the sample size but if it is bigger then it may be necessary to increase it.”
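The sensitivity of the required sample size to the assumed SD is easy to see from the standard formula for comparing two means, n per group ≈ 2 × (z_alpha + z_beta)² × SD² / difference²; a small sketch with purely illustrative numbers:

```python
import math

def n_per_group(sd, diff, z_alpha=1.96, z_beta=1.2816):
    """Approximate sample size per group for comparing two means.

    z_alpha = 1.96 corresponds to a two-sided 5% significance level,
    z_beta = 1.2816 to 90% power.
    """
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / diff ** 2)

# Design stage: detect a difference of 5 units, assuming an SD of 10.
print(n_per_group(sd=10, diff=5))   # 85 per group
# If the observed SD turns out to be 13, the requirement scales by (13/10)^2:
print(n_per_group(sd=13, diff=5))   # 143 per group
```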

April 16, 2018 Posted by | Books, Medicine, Statistics

Medical Statistics (I)

I was more than a little critical of the book in my review on goodreads, and the review is sufficiently detailed that I thought it would be worth including it in this post. Here’s what I wrote on goodreads (slightly edited to take full advantage of the better editing options on wordpress):

“The coverage is excessively focused on significance testing. The book also provides very poor coverage of model selection topics, where the authors not once but repeatedly recommend employing statistically invalid approaches to model selection (the authors recommend using hypothesis testing mechanisms to guide model selection, as well as using adjusted R-squared for model selection decisions – both of which are frankly awful ideas, for reasons which are obvious to people familiar with the field of model selection. “Generally, hypothesis testing is a very poor basis for model selection […] There is no statistical theory that supports the notion that hypothesis testing with a fixed α level is a basis for model selection.” “While adjusted R2 is useful as a descriptive statistic, it is not useful in model selection” – quotes taken directly from Burnham & Anderson’s book Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach).

The authors do not at any point in the coverage even mention the option of using statistical information criteria to guide model selection decisions, and frankly repeatedly recommend doing things which are known to be deeply problematic. The authors also cover material from Borenstein and Hedges’ meta-analysis text in the book, yet still somehow manage to give poor advice in the context of meta-analysis along similar lines (implicitly advising people to base model decisions within the context of whether to use fixed effects or random effects on the results of heterogeneity tests, despite this approach being criticized as problematic in the formerly mentioned text).

Basic and not terrible, but there are quite a few problems with this text.”

I’ll add a few more details about the above-mentioned problems before moving on to the main coverage. As for the model selection topic I refer specifically to my coverage of Burnham and Anderson’s book here and here – these guys spent a lot of pages talking about why you shouldn’t do what the authors of this book recommend, and I’m sort of flabbergasted medical statisticians don’t know this kind of stuff by now. To people who’ve read both these books, it’s not really in question who’s in the right here.
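Since the book doesn't show it, here's a minimal sketch of what an information-criterion-based comparison can look like in practice – simulated data, ordinary least squares fits, and AIC computed from the residual sum of squares under the usual Gaussian-likelihood assumption (this is just an illustration of the idea, not a full treatment of model selection):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))               # x3 is pure noise, unrelated to y
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def aic_linear(y, predictors):
    """AIC for a linear model with Gaussian errors, fitted by least squares."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))   # add an intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()
    k = X.shape[1] + 1                                          # coefficients + error variance
    return len(y) * np.log(rss / len(y)) + 2 * k

for name, predictors in {"x1": [x1], "x1+x2": [x1, x2], "x1+x2+x3": [x1, x2, x3]}.items():
    print(f"{name:10s} AIC = {aic_linear(y, predictors):.1f}")
# Lower AIC is better; adding the pure-noise variable x3 typically (slightly) worsens AIC.
```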

I believe part of the reason why I was very annoyed at the authors at times was that they seem to promote exactly the sort of blind, unthinking hypothesis-testing approach to things that is unfortunately very common – the entire book is saturated with hypothesis testing stuff, which means that many other topics are woefully insufficiently covered. The meta-analysis example is probably quite illustrative; the authors spend multiple pages on study heterogeneity and how to deal with it, but the entire coverage there is centered around the discussion of a most-likely underpowered test, the result of which should perhaps in the best-case scenario direct the researcher’s attention to topics he should have been thinking carefully about from the very start of his data analysis. You don’t need to quote many words from Borenstein and Hedges (here’s a relevant link) to get to the heart of the matter here:

“It makes sense to use the fixed-effect model if two conditions are met. First, we believe that all the studies included in the analysis are functionally identical. Second, our goal is to compute the common effect size for the identified population, and not to generalize to other populations. […] this situation is relatively rare. […] By contrast, when the researcher is accumulating data from a series of studies that had been performed by researchers operating independently, it would be unlikely that all the studies were functionally equivalent. Typically, the subjects or interventions in these studies would have differed in ways that would have impacted on the results, and therefore we should not assume a common effect size. Therefore, in these cases the random-effects model is more easily justified than the fixed-effect model.

A report should state the computational model used in the analysis and explain why this model was selected. A common mistake is to use the fixed-effect model on the basis that there is no evidence of heterogeneity. As [already] explained […], the decision to use one model or the other should depend on the nature of the studies, and not on the significance of this test [because the test will often have low power anyway].”

Yet these guys spend their efforts here talking about a test that is unlikely to yield useful information and which if anything probably distracts the reader from the main issues at hand; are the studies functionally equivalent? Do we assume there’s one (‘true’) effect size, or many? What do those coefficients we’re calculating actually mean? The authors do in fact include a lot of cautionary notes about how to interpret the test, but in my view all this means is that they’re devoting critical pages to peripheral issues – and perhaps even reinforcing the view that the test is important, or why else would they spend so much effort on it? – rather than promote good thinking about the key topics at hand.
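Just to make the fixed-versus-random-effects distinction a bit more concrete before moving on, here's a bare-bones sketch of both estimators (inverse-variance fixed effect and DerSimonian–Laird random effects) applied to some made-up study results:

```python
import numpy as np

# Made-up study results: effect estimates (say, log odds ratios) and their variances.
effects = np.array([0.10, 0.35, 0.22, 0.55, 0.18])
variances = np.array([0.04, 0.02, 0.06, 0.03, 0.05])

# Fixed effect: inverse-variance weighted average.
w = 1 / variances
fixed = (w * effects).sum() / w.sum()

# DerSimonian-Laird estimate of the between-study variance tau^2.
q = (w * (effects - fixed) ** 2).sum()
c = w.sum() - (w ** 2).sum() / w.sum()
tau2 = max(0.0, (q - (len(effects) - 1)) / c)

# Random effects: the weights incorporate tau^2.
w_re = 1 / (variances + tau2)
random_effects = (w_re * effects).sum() / w_re.sum()

print(f"fixed effect:   {fixed:.3f}")
print(f"random effects: {random_effects:.3f}  (tau^2 = {tau2:.3f})")
```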

Anyway, enough of the critical comments. Below a few links related to the first chapter of the book, as well as some quotes.

Declaration of Helsinki.
Randomized controlled trial.
Minimization (clinical trials).
Blocking (statistics).
Informed consent.
Blinding (RCTs). (…related xkcd link).
Parallel study. Crossover trial.
Zelen’s design.
Superiority, equivalence, and non-inferiority trials.
Intention-to-treat concept: A review.
Case-control study. Cohort study. Nested case-control study. Cross-sectional study.
Bradford Hill criteria.
Research protocol.
Sampling.
Type 1 and type 2 errors.
Clinical audit. A few quotes on this topic:

“‘Clinical audit’ is a quality improvement process that seeks to improve the patient care and outcomes through systematic review of care against explicit criteria and the implementation of change. Aspects of the structures, processes and outcomes of care are selected and systematically evaluated against explicit criteria. […] The aim of audit is to monitor clinical practice against agreed best practice standards and to remedy problems. […] the choice of topic is guided by indications of areas where improvement is needed […] Possible topics [include] *Areas where a problem has been identified […] *High volume practice […] *High risk practice […] *High cost […] *Areas of clinical practice where guidelines or firm evidence exists […] The organization carrying out the audit should have the ability to make changes based on their findings. […] In general, the same methods of statistical analysis are used for audit as for research […] The main difference between audit and research is in the aim of the study. A clinical research study aims to determine what practice is best, whereas an audit checks to see that best practice is being followed.”

A few more quotes from the end of the chapter:

“In clinical medicine and in medical research it is fairly common to categorize a biological measure into two groups, either to aid diagnosis or to classify an outcome. […] It is often useful to categorize a measurement in this way to guide decision-making, and/or to summarize the data but doing this leads to a loss of information which in turn has statistical consequences. […] If a continuous variable is used for analysis in a research study, a substantially smaller sample size will be needed than if the same variable is categorized into two groups […] *Categorization of a continuous variable into two groups loses much data and should be avoided whenever possible *Categorization of a continuous variable into several groups is less problematic”
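The sample-size claim is easy to check with a crude simulation: analyse the same (simulated) continuous outcome once as it is and once dichotomized at the median, and compare how often each analysis detects a true difference of half a standard deviation (all numbers are of course made up):

```python
import numpy as np

rng = np.random.default_rng(7)

def rejection_rate(n_per_group, dichotomize, n_sim=2000):
    """Crude power estimate for comparing two groups (true difference: 0.5 SD)."""
    rejections = 0
    for _ in range(n_sim):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(0.5, 1.0, n_per_group)
        if dichotomize:
            # Split at the pooled median and compare proportions (z test).
            cut = np.median(np.concatenate([a, b]))
            p1, p2 = (a > cut).mean(), (b > cut).mean()
            p = (p1 + p2) / 2
            se = np.sqrt(p * (1 - p) * 2 / n_per_group)
            z = 0.0 if se == 0 else (p2 - p1) / se
        else:
            # Compare the means directly (large-sample z test).
            se = np.sqrt(a.var(ddof=1) / n_per_group + b.var(ddof=1) / n_per_group)
            z = (b.mean() - a.mean()) / se
        rejections += abs(z) > 1.96
    return rejections / n_sim

print("power, continuous analysis:  ", rejection_rate(60, dichotomize=False))
print("power, dichotomized analysis:", rejection_rate(60, dichotomize=True))
```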

“Research studies require certain specific data which must be collected to fulfil the aims of the study, such as the primary and secondary outcomes and main factors related to them. Beyond these data there are often other data that could be collected and it is important to weigh the costs and consequences of not collecting data that will be needed later against the disadvantages of collecting too much data. […] collecting too much data is likely to add to the time and cost to data collection and processing, and may threaten the completeness and/or quality of all of the data so that key data items are threatened. For example if a questionnaire is overly long, respondents may leave some questions out or may refuse to fill it out at all.”

“Stratified samples are used when fixed numbers are needed from particular sections or strata of the population in order to achieve balance across certain important factors. For example, a study designed to estimate the prevalence of diabetes in different ethnic groups may choose a random sample with equal numbers of subjects in each ethnic group to provide a set of estimates with equal precision for each group. If a simple random sample is used rather than a stratified sample, then estimates for minority ethnic groups may be based on small numbers and have poor precision. […] Cluster samples may be chosen where individuals fall naturally into groups or clusters. For example, patients on a hospital ward or patients in a GP practice. If a sample is needed of these patients, it may be easier to list the clusters and then to choose a random sample of clusters, rather than to choose a random sample of the whole population. […] Cluster sampling is less efficient statistically than simple random sampling […] the ICC summarizes the extent of the ‘clustering effect’. When individuals in the same cluster are much more alike than individuals in different clusters with respect to an outcome, then the clustering effect is greater and the impact on the required sample size is correspondingly greater. In practice there can be a substantial effect on the sample size even when the ICC is quite small. […] As well as considering how representative a sample is, it is important […] to consider the size of the sample. A sample may be unbiased and therefore representative, but too small to give reliable estimates. […] Prevalence estimates from small samples will be imprecise and therefore may be misleading. […] The greater the variability of a measure, the greater the number of subjects needed in the sample to estimate it precisely. […] the power of a study is the ability of the study to detect a difference if one exists.”
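The 'clustering effect' mentioned at the end is usually quantified via the design effect, 1 + (m − 1) × ICC, where m is the (average) cluster size; a tiny illustration of why even a small ICC matters:

```python
def design_effect(cluster_size, icc):
    """Inflation factor for the required sample size under cluster sampling."""
    return 1 + (cluster_size - 1) * icc

n_srs = 300   # sample size that would be needed under simple random sampling
for icc in (0.01, 0.05):
    deff = design_effect(cluster_size=30, icc=icc)
    print(f"ICC = {icc}: design effect = {deff:.2f}, required n = {round(n_srs * deff)}")
# With clusters of 30, an ICC of just 0.05 more than doubles the required sample size.
```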

April 9, 2018 Posted by | Books, Epidemiology, Medicine, Statistics

Networks

I actually think this was a really nice book, considering the format – I gave it four stars on goodreads. One of the things I noticed people didn’t like about it in the reviews is that it ‘jumps’ a bit in terms of topic coverage; it covers a wide variety of applications and analytical settings. I mostly don’t consider this a weakness of the book – even if occasionally it does get a bit excessive – and I can definitely understand the authors’ choice of approach; it’s sort of hard to illustrate the potential the analytical techniques described within this book have if you’re not allowed to talk about all the areas in which they have been – or could be gainfully – applied. A related point is that many people who read the book might be familiar with the application of these tools in specific contexts but have perhaps not thought about the fact that similar methods are applied in many other areas (and they might all of them be a bit annoyed the authors don’t talk more about computer science applications, or foodweb analyses, or infectious disease applications, or perhaps sociometry…). Most of the book is about graph-theory-related stuff, but a very decent amount of the coverage deals with applications, in a broad sense of the word at least, not theory. The discussion of theoretical constructs in the book always felt to me driven to a large degree by their usefulness in specific contexts.

I have covered related topics before here on the blog, also quite recently – e.g. there’s at least some overlap between this book and Holland’s book about complexity theory in the same series (I incidentally think these books probably go well together) – and as I found the book slightly difficult to blog as it was I decided against covering it in as much detail as I sometimes do when covering these texts – this means that I decided to leave out the links I usually include in posts like these.

Below some quotes from the book.

“The network approach focuses all the attention on the global structure of the interactions within a system. The detailed properties of each element on its own are simply ignored. Consequently, systems as different as a computer network, an ecosystem, or a social group are all described by the same tool: a graph, that is, a bare architecture of nodes bounded by connections. […] Representing widely different systems with the same tool can only be done by a high level of abstraction. What is lost in the specific description of the details is gained in the form of universality – that is, thinking about very different systems as if they were different realizations of the same theoretical structure. […] This line of reasoning provides many insights. […] The network approach also sheds light on another important feature: the fact that certain systems that grow without external control are still capable of spontaneously developing an internal order. […] Network models are able to describe in a clear and natural way how self-organization arises in many systems. […] In the study of complex, emergent, and self-organized systems (the modern science of complexity), networks are becoming increasingly important as a universal mathematical framework, especially when massive amounts of data are involved. […] networks are crucial instruments to sort out and organize these data, connecting individuals, products, news, etc. to each other. […] While the network approach eliminates many of the individual features of the phenomenon considered, it still maintains some of its specific features. Namely, it does not alter the size of the system — i.e. the number of its elements — or the pattern of interaction — i.e. the specific set of connections between elements. Such a simplified model is nevertheless enough to capture the properties of the system. […] The network approach [lies] somewhere between the description by individual elements and the description by big groups, bridging the two of them. In a certain sense, networks try to explain how a set of isolated elements are transformed, through a pattern of interactions, into groups and communities.”

“[T]he random graph model is very important because it quantifies the properties of a totally random network. Random graphs can be used as a benchmark, or null case, for any real network. This means that a random graph can be used in comparison to a real-world network, to understand how much chance has shaped the latter, and to what extent other criteria have played a role. The simplest recipe for building a random graph is the following. We take all the possible pair of vertices. For each pair, we toss a coin: if the result is heads, we draw a link; otherwise we pass to the next pair, until all the pairs are finished (this means drawing the link with a probability p = ½, but we may use whatever value of p). […] Nowadays [the random graph model] is a benchmark of comparison for all networks, since any deviations from this model suggests the presence of some kind of structure, order, regularity, and non-randomness in many real-world networks.”
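The 'recipe' above translates almost literally into code; a minimal sketch (plain Python, no graph library needed):

```python
import itertools
import random

def random_graph(n, p):
    """Erdos-Renyi-style random graph: each possible pair of nodes gets an edge with probability p."""
    edges = set()
    for u, v in itertools.combinations(range(n), 2):   # all possible pairs of vertices
        if random.random() < p:                        # the 'coin toss'
            edges.add((u, v))
    return edges

random.seed(0)
g = random_graph(n=100, p=0.05)
print(f"{len(g)} edges, average degree {2 * len(g) / 100:.1f}")
```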

“…in networks, topology is more important than metrics. […] In the network representation, the connections between the elements of a system are much more important than their specific positions in space and their relative distances. The focus on topology is one of the biggest strengths of the network approach, useful whenever topology is more relevant than metrics. […] In social networks, the relevance of topology means that social structure matters. […] Sociology has classified a broad range of possible links between individuals […]. The tendency to have several kinds of relationships in social networks is called multiplexity. But this phenomenon appears in many other networks: for example, two species can be connected by different strategies of predation, two computers by different cables or wireless connections, etc. We can modify a basic graph to take into account this multiplexity, e.g. by attaching specific tags to edges. […] Graph theory [also] allows us to encode in edges more complicated relationships, as when connections are not reciprocal. […] If a direction is attached to the edges, the resulting structure is a directed graph […] In these networks we have both in-degree and out-degree, measuring the number of inbound and outbound links of a node, respectively. […] in most cases, relations display a broad variation or intensity [i.e. they are not binary/dichotomous]. […] Weighted networks may arise, for example, as a result of different frequencies of interactions between individuals or entities.”
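Directed, weighted, and 'multiplex' links are easy to accommodate in this kind of representation, e.g. by storing a direction, a tag, and a weight with each edge; a toy sketch (names and numbers invented):

```python
# Toy multiplex, directed, weighted edge list: (source, target, relation type, weight).
edges = [
    ("Alice", "Bob",   "friendship", 0.9),
    ("Alice", "Bob",   "colleague",  0.4),   # multiplexity: two kinds of tie between the same pair
    ("Bob",   "Carol", "friendship", 0.7),   # directed: there is no Carol -> Bob edge here
]

# In-degree and out-degree per node, ignoring edge type and weight.
out_degree, in_degree = {}, {}
for source, target, _, _ in edges:
    out_degree[source] = out_degree.get(source, 0) + 1
    in_degree[target] = in_degree.get(target, 0) + 1
print("out-degree:", out_degree)
print("in-degree: ", in_degree)
```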

“An organism is […] the outcome of several layered networks and not only the deterministic result of the simple sequence of genes. Genomics has been joined by epigenomics, transcriptomics, proteomics, metabolomics, etc., the disciplines that study these layers, in what is commonly called the omics revolution. Networks are at the heart of this revolution. […] The brain is full of networks where various web-like structures provide the integration between specialized areas. In the cerebellum, neurons form modules that are repeated again and again: the interaction between modules is restricted to neighbours, similarly to what happens in a lattice. In other areas of the brain, we find random connections, with a more or less equal probability of connecting local, intermediate, or distant neurons. Finally, the neocortex — the region involved in many of the higher functions of mammals — combines local structures with more random, long-range connections. […] typically, food chains are not isolated, but interwoven in intricate patterns, where a species belongs to several chains at the same time. For example, a specialized species may predate on only one prey […]. If the prey becomes extinct, the population of the specialized species collapses, giving rise to a set of co-extinctions. An even more complicated case is where an omnivore species predates a certain herbivore, and both eat a certain plant. A decrease in the omnivore’s population does not imply that the plant thrives, because the herbivore would benefit from the decrease and consume even more plants. As more species are taken into account, the population dynamics can become more and more complicated. This is why a more appropriate description than ‘foodchains’ for ecosystems is the term foodwebs […]. These are networks in which nodes are species and links represent relations of predation. Links are usually directed (big fishes eat smaller ones, not the other way round). These networks provide the interchange of food, energy, and matter between species, and thus constitute the circulatory system of the biosphere.”

“In the cell, some groups of chemicals interact only with each other and with nothing else. In ecosystems, certain groups of species establish small foodwebs, without any connection to external species. In social systems, certain human groups may be totally separated from others. However, such disconnected groups, or components, are a strikingly small minority. In all networks, almost all the elements of the systems take part in one large connected structure, called a giant connected component. […] In general, the giant connected component includes not less than 90 to 95 per cent of the system in almost all networks. […] In a directed network, the existence of a path from one node to another does not guarantee that the journey can be made in the opposite direction. Wolves eat sheep, and sheep eat grass, but grass does not eat sheep, nor do sheep eat wolves. This restriction creates a complicated architecture within the giant connected component […] according to an estimate made in 1999, more than 90 per cent of the WWW is composed of pages connected to each other, if the direction of edges is ignored. However, if we take direction into account, the proportion of nodes mutually reachable is only 24 per cent, the giant strongly connected component. […] most networks are sparse, i.e. they tend to be quite frugal in connections. Take, for example, the airport network: the personal experience of every frequent traveller shows that direct flights are not that common, and intermediate stops are necessary to reach several destinations; thousands of airports are active, but each city is connected to less than 20 other cities, on average. The same happens in most networks. A measure of this is given by the mean number of connections of their nodes, that is, their average degree.”
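
The notions of a giant (weakly) connected component, a smaller giant strongly connected component in directed networks, and sparsity measured by the average degree can all be illustrated in a few lines. The sketch below is my own, using networkx and a sparse random directed graph in place of any real data set such as the WWW.

```python
import networkx as nx

# A sparse random directed graph standing in for something like the web graph example.
D = nx.fast_gnp_random_graph(10_000, 3 / 10_000, seed=1, directed=True)

weak = max(nx.weakly_connected_components(D), key=len)
strong = max(nx.strongly_connected_components(D), key=len)
print(f"giant (weakly) connected component: {len(weak) / D.number_of_nodes():.0%}")
print(f"giant strongly connected component: {len(strong) / D.number_of_nodes():.0%}")

# Sparsity: the mean number of outgoing connections per node stays small.
print("average out-degree:", D.number_of_edges() / D.number_of_nodes())
```

As in the WWW estimate quoted above, the strongly connected share comes out smaller than the weakly connected one, and the average degree is a small number even though the graph holds together.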

“[A] puzzling contradiction — a sparse network can still be very well connected — […] attracted the attention of the Hungarian mathematicians […] Paul Erdős and Alfréd Rényi. They tackled it by producing different realizations of their random graph. In each of them, they changed the density of edges. They started with a very low density: less than one edge per node. It is natural to expect that, as the density increases, more and more nodes will be connected to each other. But what Erdős and Rényi found instead was a quite abrupt transition: several disconnected components coalesced suddenly into a large one, encompassing almost all the nodes. The sudden change happened at one specific critical density: when the average number of links per node (i.e. the average degree) was greater than one, then the giant connected component suddenly appeared. This result implies that networks display a very special kind of economy, intrinsic to their disordered structure: a small number of edges, even randomly distributed between nodes, is enough to generate a large structure that absorbs almost all the elements. […] Social systems seem to be very tightly connected: in a large enough group of strangers, it is not unlikely to find pairs of people with quite short chains of relations connecting them. […] The small-world property consists of the fact that the average distance between any two nodes (measured as the shortest path that connects them) is very small. Given a node in a network […], few nodes are very close to it […] and few are far from it […]: the majority are at the average — and very short — distance. This holds for all networks: starting from one specific node, almost all the nodes are at very few steps from it; the number of nodes within a certain distance increases exponentially fast with the distance. Another way of explaining the same phenomenon […] is the following: even if we add many nodes to a network, the average distance will not increase much; one has to increase the size of a network by several orders of magnitude to notice that the paths to new nodes are (just a little) longer. The small-world property is crucial to many network phenomena. […] The small-world property is something intrinsic to networks. Even the completely random Erdős-Renyi graphs show this feature. By contrast, regular grids do not display it. If the Internet was a chessboard-like lattice, the average distance between two routers would be of the order of 1,000 jumps, and the Net would be much slower [the authors note elsewhere that “The Internet is composed of hundreds of thousands of routers, but just about ten ‘jumps’ are enough to bring an information packet from one of them to any other.”] […] The key ingredient that transforms a structure of connections into a small world is the presence of a little disorder. No real network is an ordered array of elements. On the contrary, there are always connections ‘out of place’. It is precisely thanks to these connections that networks are small worlds. […] Shortcuts are responsible for the small-world property in many […] situations.”
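
The abrupt appearance of the giant component once the average degree exceeds one, and the smallness of typical distances above that threshold, are easy to reproduce in simulation. Here is a rough sketch of such a simulation (my own, using networkx; the network sizes and seeds are arbitrary choices):

```python
import networkx as nx

n = 20_000
for avg_degree in (0.5, 0.9, 1.1, 2.0, 4.0):
    G = nx.fast_gnp_random_graph(n, avg_degree / n, seed=42)
    giant = max(nx.connected_components(G), key=len)
    print(f"average degree {avg_degree}: giant component holds {len(giant) / n:.1%} of the nodes")

# Small-world check on a supercritical graph: the average shortest path stays tiny,
# growing only logarithmically with the number of nodes.
G = nx.fast_gnp_random_graph(2_000, 4 / 2_000, seed=42)
giant = G.subgraph(max(nx.connected_components(G), key=len))
print("average distance within the giant component:",
      round(nx.average_shortest_path_length(giant), 2))
```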

“Body size, IQ, road speed, and other magnitudes have a characteristic scale: that is, an average value that in the large majority of cases is a rough predictor of the actual value that one will find. […] While height is a homogeneous magnitude, the number of social connection[s] is a heterogeneous one. […] A system with this feature is said to be scale-free or scale-invariant, in the sense that it does not have a characteristic scale. This can be rephrased by saying that the individual fluctuations with respect to the average are too large for us to make a correct prediction. […] In general, a network with heterogeneous connectivity has a set of clear hubs. When a graph is small, it is easy to find whether its connectivity is homogeneous or heterogeneous […]. In the first case, all the nodes have more or less the same connectivity, while in the latter it is easy to spot a few hubs. But when the network to be studied is very big […] things are not so easy. […] the distribution of the connectivity of the nodes of the […] network […] is the degree distribution of the graph. […] In homogeneous networks, the degree distribution is a bell curve […] while in heterogeneous networks, it is a power law […]. The power law implies that there are many more hubs (and much more connected) in heterogeneous networks than in homogeneous ones. Moreover, hubs are not isolated exceptions: there is a full hierarchy of nodes, each of them being a hub compared with the less connected ones.”

“Looking at the degree distribution is the best way to check if a network is heterogeneous or not: if the distribution is fat tailed, then the network will have hubs and heterogeneity. A mathematically perfect power law is never found, because this would imply the existence of hubs with an infinite number of connections. […] Nonetheless, a strongly skewed, fat-tailed distribution is a clear signal of heterogeneity, even if it is never a perfect power law. […] While the small-world property is something intrinsic to networked structures, hubs are not present in all kinds of networks. For example, power grids usually have very few of them. […] hubs are not present in random networks. A consequence of this is that, while random networks are small worlds, heterogeneous ones are ultra-small worlds. That is, the distance between their vertices is relatively smaller than in their random counterparts. […] Heterogeneity is not equivalent to randomness. On the contrary, it can be the signature of a hidden order, not imposed by a top-down project, but generated by the elements of the system. The presence of this feature in widely different networks suggests that some common underlying mechanism may be at work in many of them. […] the Barabási–Albert model gives an important take-home message. A simple, local behaviour, iterated through many interactions, can give rise to complex structures. This arises without any overall blueprint”.
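
To see what 'heterogeneous with hubs' versus 'homogeneous without hubs' looks like in practice, one can compare the degree distributions of an Erdős–Rényi graph and a Barabási–Albert graph with the same average degree. A small sketch of mine, again assuming networkx:

```python
import networkx as nx

n = 10_000
er = nx.fast_gnp_random_graph(n, 6 / n, seed=0)  # homogeneous: bell-shaped degree distribution
ba = nx.barabasi_albert_graph(n, 3, seed=0)      # heterogeneous: fat-tailed degree distribution

for name, g in (("Erdos-Renyi", er), ("Barabasi-Albert", ba)):
    degrees = [d for _, d in g.degree()]
    print(f"{name}: mean degree = {sum(degrees) / n:.1f}, max degree = {max(degrees)}")

# Typically both means come out around 6, but the Erdos-Renyi maximum is around 20
# while the Barabasi-Albert maximum runs into the hundreds: that gap is the
# hierarchy of hubs behind the fat tail.
```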

“Homogamy, the tendency of like to marry like, is very strong […] Homogamy is a specific instance of homophily: this consists of a general trend of like to link to like, and is a powerful force in shaping social networks […] assortative mixing [is] a special form of homophily, in which nodes tend to connect with others that are similar to them in the number of connections. By contrast [when] high- and low-degree nodes are more connected to each other [it] is called disassortative mixing. Both cases display a form of correlation in the degrees of neighbouring nodes. When the degrees of neighbours are positively correlated, then the mixing is assortative; when negatively, it is disassortative. […] In random graphs, the neighbours of a given node are chosen completely at random: as a result, there is no clear correlation between the degrees of neighbouring nodes […]. On the contrary, correlations are present in most real-world networks. Although there is no general rule, most natural and technological networks tend to be disassortative, while social networks tend to be assortative. […] Degree assortativity and disassortativity are just an example of the broad range of possible correlations that bias how nodes tie to each other.”
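
Degree assortativity can be computed directly as the correlation between the degrees at the two ends of each edge. A minimal sketch (my own, assuming networkx; the graphs are toy models rather than real data):

```python
import networkx as nx

er = nx.fast_gnp_random_graph(5_000, 6 / 5_000, seed=0)  # random graph
ba = nx.barabasi_albert_graph(5_000, 3, seed=0)          # scale-free model

# Pearson correlation between the degrees at either end of each edge.
print("ER assortativity:", round(nx.degree_assortativity_coefficient(er), 3))  # close to zero
print("BA assortativity:", round(nx.degree_assortativity_coefficient(ba), 3))  # mildly negative

# Positive values indicate assortative mixing (well-connected nodes linking to each
# other), negative values disassortative mixing (hubs linking to low-degree nodes).
```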

“[N]etworks (neither ordered lattices nor random graphs), can have both large clustering and small average distance at the same time. […] in almost all networks, the clustering of a node depends on the degree of that node. Often, the larger the degree, the smaller the clustering coefficient. Small-degree nodes tend to belong to well-interconnected local communities. Similarly, hubs connect with many nodes that are not directly interconnected. […] Central nodes usually act as bridges or bottlenecks […]. For this reason, centrality is an estimate of the load handled by a node of a network, assuming that most of the traffic passes through the shortest paths (this is not always the case, but it is a good approximation). For the same reason, damaging central nodes […] can impair radically the flow of a network. Depending on the process one wants to study, other definitions of centrality can be introduced. For example, closeness centrality computes the distance of a node to all others, and reach centrality factors in the portion of all nodes that can be reached in one step, two steps, three steps, and so on.”
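
The clustering coefficient and the centrality measures mentioned here are all one-liners in standard graph libraries. A small illustration of mine, using networkx and the Zachary karate club network it ships with:

```python
import networkx as nx

G = nx.karate_club_graph()  # a small, classic social network bundled with networkx

clustering = nx.clustering(G)               # local clustering coefficient of each node
betweenness = nx.betweenness_centrality(G)  # share of shortest paths passing through a node
closeness = nx.closeness_centrality(G)      # inverse of the average distance to all other nodes

degrees = dict(G.degree())
hub = max(degrees, key=degrees.get)         # the best-connected node

print("average clustering:", round(nx.average_clustering(G), 2))
print(f"hub {hub}: degree={degrees[hub]}, clustering={clustering[hub]:.2f}, "
      f"betweenness={betweenness[hub]:.2f}, closeness={closeness[hub]:.2f}")

# In line with the quote, the hub's clustering sits well below the network average,
# while its betweenness (shortest-path 'load') and closeness are among the highest.
```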

“Domino effects are not uncommon in foodwebs. Networks in general provide the backdrop for large-scale, sudden, and surprising dynamics. […] most of the real-world networks show a double-edged kind of robustness. They are able to function normally even when a large fraction of the network is damaged, but suddenly certain small failures, or targeted attacks, bring them down completely. […] networks are very different from engineered systems. In an airplane, damaging one element is enough to stop the whole machine. In order to make it more resilient, we have to use strategies such as duplicating certain pieces of the plane: this makes it almost 100 per cent safe. In contrast, networks, which are mostly not blueprinted, display a natural resilience to a broad range of errors, but when certain elements fail, they collapse. […] A random graph of the size of most real-world networks is destroyed after the removal of half of the nodes. On the other hand, when the same procedure is performed on a heterogeneous network (either a map of a real network or a scale-free model of a similar size), the giant connected component resists even after removing more than 80 per cent of the nodes, and the distance within it is practically the same as at the beginning. The scene is different when researchers simulate a targeted attack […] In this situation the collapse happens much faster […]. However, now the most vulnerable is the second [i.e. the heterogeneous network]: while in the homogeneous network it is necessary to remove about one-fifth of its more connected nodes to destroy it, in the heterogeneous one this happens after removing the first few hubs. Highly connected nodes seem to play a crucial role, in both errors and attacks. […] hubs are mainly responsible for the overall cohesion of the graph, and removing a few of them is enough to destroy it.”
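
The error-versus-attack contrast described above is easy to reproduce qualitatively: remove nodes either at random or hubs-first and track the size of the giant connected component. A rough simulation sketch of mine (networkx assumed; the network size and removal fractions are arbitrary):

```python
import random
import networkx as nx

def giant_fraction(g, n_original):
    """Largest connected component as a share of the original network."""
    if g.number_of_nodes() == 0:
        return 0.0
    return len(max(nx.connected_components(g), key=len)) / n_original

def damage(g, fraction, targeted):
    """Return a copy of g with a fraction of nodes removed, hubs first if targeted."""
    h = g.copy()
    if targeted:
        order = sorted(h.nodes, key=h.degree, reverse=True)
    else:
        order = random.sample(list(h.nodes), h.number_of_nodes())
    h.remove_nodes_from(order[: int(fraction * h.number_of_nodes())])
    return h

random.seed(0)
ba = nx.barabasi_albert_graph(5_000, 2, seed=1)  # heterogeneous network with hubs
n = ba.number_of_nodes()

print("random failures, 50% removed:", round(giant_fraction(damage(ba, 0.50, False), n), 2))
for f in (0.05, 0.10, 0.20):
    print(f"targeted attack, {f:.0%} removed:", round(giant_fraction(damage(ba, f, True), n), 2))

# The exact numbers depend on the model and its parameters, but the pattern matches
# the quote: the giant component shrugs off heavy random damage, yet shrinks far
# faster when the best-connected nodes are removed first.
```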

“Studies of errors and attacks have shown that hubs keep different parts of a network connected. This implies that they also act as bridges for spreading diseases. Their numerous ties put them in contact with both infected and healthy individuals: so hubs become easily infected, and they infect other nodes easily. […] The vulnerability of heterogeneous networks to epidemics is bad news, but understanding it can provide good ideas for containing diseases. […] if we can immunize just a fraction, it is not a good idea to choose people at random. Most of the times, choosing at random implies selecting individuals with a relatively low number of connections. Even if they block the disease from spreading in their surroundings, hubs will always be there to put it back into circulation. A much better strategy would be to target hubs. Immunizing hubs is like deleting them from the network, and the studies on targeted attacks show that eliminating a small fraction of hubs fragments the network: thus, the disease will be confined to a few isolated components. […] in the epidemic spread of sexually transmitted diseases the timing of the links is crucial. Establishing an unprotected link with a person before they establish an unprotected link with another person who is infected is not the same as doing so afterwards.”

April 3, 2018 Posted by | Biology, Books, Ecology, Engineering, Epidemiology, Genetics, Mathematics, Statistics | Leave a comment

Safety-Critical Systems

Some related links to topics covered in the lecture:

Safety-critical system.
Safety engineering.
Fault tree analysis.
Failure mode and effects analysis.
Fail-safe.
Value of a statistical life.
ALARP principle.
Hazards and Risk (HSA).
Software system safety.
Aleatoric and epistemic uncertainty.
N-version programming.
An experimental evaluation of the assumption of independence in multiversion programming (Knight & Leveson).
Safety integrity level.
Software for Dependable Systems – Sufficient Evidence? (consensus study report).

March 15, 2018 Posted by | Computer science, Economics, Engineering, Lectures, Statistics | Leave a comment

Prevention of Late-Life Depression (I)

Late-life depression is a common and highly disabling condition and is also associated with higher health care utilization and overall costs. The presence of depression may complicate the course and treatment of comorbid major medical conditions that are also highly prevalent among older adults — including diabetes, hypertension, and heart disease. Furthermore, a considerable body of evidence has demonstrated that, for older persons, residual symptoms and functional impairment due to depression are common — even when appropriate depression therapies are being used. Finally, the worldwide phenomenon of a rapidly expanding older adult population means that unprecedented numbers of seniors — and the providers who care for them — will be facing the challenge of late-life depression. For these reasons, effective prevention of late-life depression will be a critical strategy to lower overall burden and cost from this disorder. […] This textbook will illustrate the imperative for preventing late-life depression, introduce a broad range of approaches and key elements involved in achieving effective prevention, and provide detailed examples of applications of late-life depression prevention strategies”.

I gave the book two stars on goodreads. There are 11 chapters in the book, written by 22 different contributors/authors, so of course there’s a lot of variation in the quality of the material included; the two star rating was an overall assessment of the quality of the material, and the last two chapters – but in particular chapter 10 – did a really good job convincing me that the book did not deserve a 3rd star (if you decide to read the book, I advise you to skip chapter 10). In general I think many of the authors are way too focused on statistical significance and much too hesitant to report actual effect sizes, which are much more interesting. Gender is mentioned repeatedly throughout the coverage as an important variable, to the extent that people who do not read the book carefully might think this is one of the most important variables at play; but when you look at actual effect sizes, you get reported ORs of ~1.4 for this variable, compared to e.g. ORs in the ~8-9 range for the bereavement variable (see below). You can quibble about population attributable fraction and so on here, but if the effect size is that small it’s unlikely to be all that useful in terms of directing prevention efforts/resource allocation (especially considering that women make up the majority of the total population in these older age groups anyway, as they have higher life expectancy than their male counterparts).

Anyway, below I’ve added some quotes and observations from the first few chapters of the book.

Meta-analyses of more than 30 randomized trials conducted in the High Income Countries show that the incidence of new depressive and anxiety disorders can be reduced by 25–50 % over 1–2 years, compared to usual care, through the use of learning-based psychotherapies (such as interpersonal psychotherapy, cognitive behavioral therapy, and problem solving therapy) […] The case for depression prevention is compelling and represents the key rationale for this volume: (1) Major depression is both prevalent and disabling, typically running a relapsing or chronic course. […] (2) Major depression is often comorbid with other chronic conditions like diabetes, amplifying the disability associated with these conditions and worsening family caregiver burden. (3) Depression is associated with worse physical health outcomes, partly mediated through poor treatment adherence, and it is associated with excess mortality after myocardial infarction, stroke, and cancer. It is also the major risk factor for suicide across the life span and particularly in old age. (4) Available treatments are only partially effective in reducing symptom burden, sustaining remission, and averting years lived with disability.”

“[M]any people suffering from depression do not receive any care and approximately a third of those receiving care do not respond to current treatments. The risk of recurrence is high, also in older persons: half of those who have experienced a major depression will experience one or even more recurrences [4]. […] Depression increases the risk of death: among people suffering from depression the risk of dying is 1.65 times higher than among people without a depression [7], with a dose-response relation between severity and duration of depression and the resulting excess mortality [8]. In adults, the average length of a depressive episode is 8 months but among 20 % of people the depression lasts longer than 2 years [9]. […] It has been estimated that in Australia […] 60 % of people with an affective disorder receive treatment, and using guidelines and standards only 34 % receives effective treatment [14]. This translates into preventing 15 % of Years Lived with Disability [15], a measure of disease burden [14] and stresses the need for prevention [16]. Primary health care providers frequently do not recognize depression, in particular among elderly. Older people may present their depressive symptoms differently from younger adults, with more emphasis on physical complaints [17, 18]. Adequate diagnosis of late-life depression can also be hampered by comorbid conditions such as Parkinson and dementia that may have similar symptoms, or by the fact that elderly people as well as care workers may assume that “feeling down” is part of becoming older [17, 18]. […] Many people suffering from depression do not seek professional help or are not identified as depressed [21]. Almost 14 % of elderly people living in community-type living suffer from a severe depression requiring clinical attention [22] and more than 50 % of those have a chronic course [4, 23]. Smit et al. reported an incidence of 6.1 % of chronic or recurrent depression among a sample of 2,200 elderly people (ages 55–85) [21].”

“Prevention differs from intervention and treatment as it is aimed at general population groups who vary in risk level for mental health problems such as late-life depression. The Institute of Medicine (IOM) has introduced a prevention framework, which provides a useful model for comprehending the different objectives of the interventions [29]. The overall goal of prevention programs is reducing risk factors and enhancing protective factors.
The IOM framework distinguishes three types of prevention interventions: (1) universal preventive interventions, (2) selective preventive interventions, and (3) indicated preventive interventions. Universal preventive interventions are targeted at the general audience, regardless of their risk status or the presence of symptoms. Selective preventive interventions serve those sub-populations who have a significantly higher than average risk of a disorder, either imminently or over a lifetime. Indicated preventive interventions target identified individuals with minimal but detectable signs or symptoms suggesting a disorder. This type of prevention consists of early recognition and early intervention of the diseases to prevent deterioration [30]. For each of the three types of interventions, the goal is to reduce the number of new cases. The goal of treatment, on the other hand, is to reduce prevalence or the total number of cases. By reducing incidence you also reduce prevalence [5]. […] prevention research differs from treatment research in various ways. One of the most important differences is the fact that participants in treatment studies already meet the criteria for the illness being studied, such as depression. The intervention is targeted at improvement or remission of the specific condition quicker than if no intervention had taken place. In prevention research, the participants do not meet the specific criteria for the illness being studied and the overall goal of the intervention is to prevent the development of a clinical illness at a lower rate than a comparison group [5].”

“A couple of risk factors [for depression] occur more frequently among the elderly than among young adults. The loss of a loved one or the loss of a social role (e.g., employment), decrease of social support and network, and the increasing chance of isolation occur more frequently among the elderly. Many elderly also suffer from physical diseases: 64 % of elderly aged 65–74 have a chronic disease [36] […]. It is important to note that depression often co-occurs with other disorders such as physical illness and other mental health problems (comorbidity). Losing a spouse can have significant mental health effects. Almost half of all widows and widowers during the first year after the loss meet the criteria for depression according to the DSM-IV [37]. Depression after loss of a loved one is normal in times of mourning. However, when depressive symptoms persist during a longer period of time it is possible that a depression is developing. Zisook and Shuchter found that a year after the loss of a spouse 16 % of widows and widowers met the criteria of a depression compared to 4 % of those who did not lose their spouse [38]. […] People with a chronic physical disease are also at a higher risk of developing a depression. An estimated 12–36 % of those with a chronic physical illness also suffer from clinical depression [40]. […] around 25 % of cancer patients suffer from depression [40]. […] Depression is relatively common among elderly residing in hospitals and retirement- and nursing homes. An estimated 6–11 % of residents have a depressive illness and among 30 % have depressive symptoms [41]. […] Loneliness is common among the elderly. Among those of 60 years or older, 43 % reported being lonely in a study conducted by Perissinotto et al. […] Loneliness is often associated with physical and mental complaints; apart from depression it also increases the chance of developing dementia and excess mortality [43].”

“From the public health perspective it is important to know what the potential health benefits would be if the harmful effect of certain risk factors could be removed. What health benefits would arise from this, at which efforts and costs? To measure this the population attributable fraction (PAF) can be used. The PAF is expressed in a percentage and demonstrates the decrease of the percentage of incidences (number of new cases) when the harmful effects of the targeted risk factors are fully taken away. For public health it would be more effective to design an intervention targeted at a risk factor with a high PAF than a low PAF. […] An intervention needs to be efficacious in order to be implemented; this means that it has to show a statistically significant difference with placebo or other treatment. Secondly, it needs to be effective; it needs to prove its benefits also in real life (“everyday care”) circumstances. Thirdly, it needs to be efficient. The measure to address this is the Number Needed to Be Treated (NNT). The NNT expresses how many people need to be treated to prevent the onset of one new case with the disorder; the lower the number, the more efficient the intervention [45]. To summarize, an indicated preventative intervention would ideally be targeted at a relatively small group of people with a high absolute chance of developing the disease, and a risk profile that is responsible for a high PAF. Furthermore, there needs to be an intervention that is both effective and efficient. […] a more detailed and specific description of the target group results in a higher absolute risk, a lower NNT, and also a lower PAF. This is helpful in determining the costs and benefits of interventions aiming at more specific or broader subgroups in the population. […] Unfortunately very large samples are required to demonstrate reductions in universal or selective interventions [46]. […] If the incidence rate is higher in the target population, which is usually the case in selective and even more so in indicated prevention, the number of participants needed to prove an effect is much smaller [5]. This shows that, even though universal interventions may be effective, their effect is harder to prove than that of indicated prevention. […] Indicated and selective preventions appear to be the most successful in preventing depression to date; however, more research needs to be conducted in larger samples to determine which prevention method is really most effective.”
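
The chapter describes the PAF and the NNT verbally; below is a minimal sketch of the standard textbook formulas (Levin's formula for the PAF, and the NNT as the inverse of the absolute risk reduction), written in Python. The numbers are made up for illustration, and the book's own calculations may use other variants of these formulas.

```python
def paf(prevalence_exposed: float, relative_risk: float) -> float:
    """Population attributable fraction: the share of new cases that would disappear
    if the harmful effect of the exposure were fully taken away (Levin's formula)."""
    excess = prevalence_exposed * (relative_risk - 1)
    return excess / (1 + excess)

def nnt(risk_without_intervention: float, risk_with_intervention: float) -> float:
    """Number needed to treat: how many people must receive the intervention
    to prevent one new case."""
    return 1 / (risk_without_intervention - risk_with_intervention)

# Hypothetical numbers: 20% of the population exposed to a risk factor with RR = 2
# gives a PAF of ~17%; cutting incidence from 10% to 7.5% with a preventive
# intervention gives an NNT of 40.
print(round(paf(0.20, 2.0), 3))   # 0.167
print(round(nnt(0.10, 0.075)))    # 40
```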

Groffen et al. [6] recently conducted an investigation among a sample of 4,809 participants from the Reykjavik Study (aged 66–93 years). Similar to the findings presented by Vink and colleagues [3], education level was related to depression risk: participants with lower education levels were more likely to report depressed mood in late-life than those with a college education (odds ratio [OR] = 1.87, 95 % confidence interval [CI] = 1.35–2.58). […] Results from a meta-analysis by Lorant and colleagues [8] showed that lower SES individuals had a greater odds of developing depression than those in the highest SES group (OR = 1.24, p= 0.004); however, the studies involved in this review did not focus on older populations. […] Cole and Dendukuri [10] performed a meta-analysis of studies involving middle-aged and older adult community residents, and determined that female gender was a risk factor for depression in this population (Pooled OR = 1.4, 95 % CI = 1.2–1.8), but not old age. Blazer and colleagues [11] found a significant positive association between older age and depressive symptoms in a sample consisting of community-dwelling older adults; however, when potential confounders such as physical disability, cognitive impairment, and gender were included in the analysis, the relationship between chronological age and depressive symptoms was reversed (p< 0.01). A study by Schoevers and colleagues [14] had similar results […] these findings suggest that higher incidence of depression observed among the oldest-old may be explained by other relevant factors. By contrast, the association of female gender with increased risk of late-life depression has been observed to be a highly consistent finding.”

In an examination of marital bereavement, Turvey et al. [16] analyzed data among 5,449 participants aged70 years […] recently bereaved participants had nearly nine times the odds of developing syndromal depression as married participants (OR = 8.8, 95 % CI = 5.1–14.9, p<0.0001), and they also had significantly higher risk of depressive symptoms 2 years after the spousal loss. […] Caregiving burden is well-recognized as a predisposing factor for depression among older adults [18]. Many older persons are coping with physically and emotionally challenging caregiving roles (e.g., caring for a spouse/partner with a serious illness or with cognitive or physical decline). Additionally, many caregivers experience elements of grief, as they mourn the loss of relationship with or the decline of valued attributes of their care recipients. […] Concepts of social isolation have also been examined with regard to late-life depression risk. For example, among 892 participants aged 65 years […], Gureje et al. [13] found that women with a poor social network and rural residential status were more likely to develop major depressive disorder […] Harlow and colleagues [21] assessed the association between social network and depressive symptoms in a study involving both married and recently widowed women between the ages of 65 and 75 years; they found that number of friends at baseline had an inverse association with CES-D (Centers for Epidemiologic Studies Depression Scale) score after 1 month (p< 0.05) and 12 months (p= 0.06) of follow-up. In a study that explicitly addressed the concept of loneliness, Jaremka et al. [22] conducted a study relating this factor to late-life depression; importantly, loneliness has been validated as a distinct construct, distinguishable among older adults from depression. Among 229 participants (mean age = 70 years) in a cohort of older adults caring for a spouse with dementia, loneliness (as measured by the NYU scale) significantly predicted incident depression (p<0.001). Finally, social support has been identified as important to late-life depression risk. For example, Cui and colleagues [23] found that low perceived social support significantly predicted worsening depression status over a 2-year period among 392 primary care patients aged 65 years and above.”

“Saunders and colleagues [26] reported […] findings with alcohol drinking behavior as the predictor. Among 701 community-dwelling adults aged 65 years and above, the authors found a significant association between prior heavy alcohol consumption and late-life depression among men: compared to those who were not heavy drinkers, men with a history of heavy drinking had a nearly fourfold higher odds of being diagnosed with depression (OR = 3.7, 95 % CI = 1.3–10.4, p< 0.05). […] Almeida et al. found that obese men were more likely than non-obese (body mass index [BMI] < 30) men to develop depression (HR = 1.31, 95 % CI = 1.05–1.64). Consistent with these results, presence of the metabolic syndrome was also found to increase risk of incident depression (HR = 2.37, 95 % CI = 1.60–3.51). Finally, leisure-time activities are also important to study with regard to late-life depression risk, as these too are readily modifiable behaviors. For example, Magnil et al. [30] examined such activities among a sample of 302 primary care patients aged 60 years. The authors observed that those who lacked leisure activities had an increased risk of developing depressive symptoms over the 2-year study period (OR = 12, 95 % CI = 1.1–136, p= 0.041). […] an important future direction in addressing social and behavioral risk factors in late-life depression is to make more progress in trials that aim to alter those risk factors that are actually modifiable.”

February 17, 2018 Posted by | Books, Epidemiology, Health Economics, Medicine, Psychiatry, Psychology, Statistics | Leave a comment

Random stuff

I have almost stopped posting posts like these, which has resulted in the accumulation of a very large number of links and studies which I figured I might like to blog at some point. This post is mainly an attempt to deal with the backlog – I won’t cover the material in too much detail.

i. Do Bullies Have More Sex? The answer seems to be a qualified yes. A few quotes:

“Sexual behavior during adolescence is fairly widespread in Western cultures (Zimmer-Gembeck and Helfland 2008) with nearly two thirds of youth having had sexual intercourse by the age of 19 (Finer and Philbin 2013). […] Bullying behavior may aid in intrasexual competition and intersexual selection as a strategy when competing for mates. In line with this contention, bullying has been linked to having a higher number of dating and sexual partners (Dane et al. 2017; Volk et al. 2015). This may be one reason why adolescence coincides with a peak in antisocial or aggressive behaviors, such as bullying (Volk et al. 2006). However, not all adolescents benefit from bullying. Instead, bullying may only benefit adolescents with certain personality traits who are willing and able to leverage bullying as a strategy for engaging in sexual behavior with opposite-sex peers. Therefore, we used two independent cross-sectional samples of older and younger adolescents to determine which personality traits, if any, are associated with leveraging bullying into opportunities for sexual behavior.”

“…bullying by males signal the ability to provide good genes, material resources, and protect offspring (Buss and Shackelford 1997; Volk et al. 2012) because bullying others is a way of displaying attractive qualities such as strength and dominance (Gallup et al. 2007; Reijntjes et al. 2013). As a result, this makes bullies attractive sexual partners to opposite-sex peers while simultaneously suppressing the sexual success of same-sex rivals (Gallup et al. 2011; Koh and Wong 2015; Zimmer-Gembeck et al. 2001). Females may denigrate other females, targeting their appearance and sexual promiscuity (Leenaars et al. 2008; Vaillancourt 2013), which are two qualities relating to male mate preferences. Consequently, derogating these qualities lowers a rivals’ appeal as a mate and also intimidates or coerces rivals into withdrawing from intrasexual competition (Campbell 2013; Dane et al. 2017; Fisher and Cox 2009; Vaillancourt 2013). Thus, males may use direct forms of bullying (e.g., physical, verbal) to facilitate intersexual selection (i.e., appear attractive to females), while females may use relational bullying to facilitate intrasexual competition, by making rivals appear less attractive to males.”

The study relies on the use of self-report data, which I find very problematic – so I won’t go into the results here. I’m not quite clear on how those studies mentioned in the discussion ‘have found self-report data [to be] valid under conditions of confidentiality’ – and I remain skeptical. You’ll usually want data from independent observers (e.g. teacher or peer observations) when analyzing these kinds of things. Note in the context of the self-report data problem that if there’s a strong stigma associated with being bullied (there often is, or bullying wouldn’t work as well), asking people if they have been bullied is not much better than asking people if they’re bullying others.

ii. Some topical advice that some people might soon regret not having followed, from the wonderful Things I Learn From My Patients thread:

“If you are a teenage boy experimenting with fireworks, do not empty the gunpowder from a dozen fireworks and try to mix it in your mother’s blender. But if you do decide to do that, don’t hold the lid down with your other hand and stand right over it. This will result in the traumatic amputation of several fingers, burned and skinned forearms, glass shrapnel in your face, and a couple of badly scratched corneas as a start. You will spend months in rehab and never be able to use your left hand again.”

iii. I haven’t talked about the AlphaZero-Stockfish match, but I was of course aware of it and did read a bit about that stuff. Here’s a reddit thread where one of the Stockfish programmers answers questions about the match. A few quotes:

“Which of the two is stronger under ideal conditions is, to me, neither particularly interesting (they are so different that it’s kind of like comparing the maximum speeds of a fish and a bird) nor particularly important (since there is only one of them that you and I can download and run anyway). What is super interesting is that we have two such radically different ways to create a computer chess playing entity with superhuman abilities. […] I don’t think there is anything to learn from AlphaZero that is applicable to Stockfish. They are just too different, you can’t transfer ideas from one to the other.”

“Based on the 100 games played, AlphaZero seems to be about 100 Elo points stronger under the conditions they used. The current development version of Stockfish is something like 40 Elo points stronger than the version used in Google’s experiment. There is a version of Stockfish translated to hand-written x86-64 assembly language that’s about 15 Elo points stronger still. This adds up to roughly half the Elo difference between AlphaZero and Stockfish shown in Google’s experiment.”
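
For readers who do not think in Elo terms, the quoted rating differences can be translated into expected scores with the standard Elo formula; this is a general property of the rating system, not something taken from the match report itself.

```python
def expected_score(elo_difference: float) -> float:
    """Expected score per game (win = 1, draw = 0.5) for the stronger player."""
    return 1 / (1 + 10 ** (-elo_difference / 400))

for diff in (15, 40, 100):
    print(f"+{diff} Elo -> expected score {expected_score(diff):.3f}")

# A +100 Elo edge corresponds to scoring roughly 64% of the points over many games.
```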

“It seems that Stockfish was playing with only 1 GB for transposition tables (the area of memory used to store data about the positions previously encountered in the search), which is way too little when running with 64 threads.” [I seem to recall a comp sci guy observing elsewhere that this was less than what was available to his smartphone version of Stockfish, but I didn’t bookmark that comment].

“The time control was a very artificial fixed 1 minute/move. That’s not how chess is traditionally played. Quite a lot of effort has gone into Stockfish’s time management. It’s pretty good at deciding when to move quickly, and when to spend a lot of time on a critical decision. In a fixed time per move game, it will often happen that the engine discovers that there is a problem with the move it wants to play just before the time is out. In a regular time control, it would then spend extra time analysing all alternative moves and trying to find a better one. When you force it to move after exactly one minute, it will play the move it already know is bad. There is no doubt that this will cause it to lose many games it would otherwise have drawn.”

iv. Thrombolytics for Acute Ischemic Stroke – no benefit found.

“Thrombolysis has been rigorously studied in >60,000 patients for acute thrombotic myocardial infarction, and is proven to reduce mortality. It is theorized that thrombolysis may similarly benefit ischemic stroke patients, though a much smaller number (8120) has been studied in relevant, large scale, high quality trials thus far. […] There are 12 such trials 1-12. Despite the temptation to pool these data the studies are clinically heterogeneous. […] Data from multiple trials must be clinically and statistically homogenous to be validly pooled.14 Large thrombolytic studies demonstrate wide variations in anatomic stroke regions, small- versus large-vessel occlusion, clinical severity, age, vital sign parameters, stroke scale scores, and times of administration. […] Examining each study individually is therefore, in our opinion, both more valid and more instructive. […] Two of twelve studies suggest a benefit […] In comparison, twice as many studies showed harm and these were stopped early. This early stoppage means that the number of subjects in studies demonstrating harm would have included over 2400 subjects based on originally intended enrollments. Pooled analyses are therefore missing these phantom data, which would have further eroded any aggregate benefits. In their absence, any pooled analysis is biased toward benefit. Despite this, there remain five times as many trials showing harm or no benefit (n=10) as those concluding benefit (n=2), and 6675 subjects in trials demonstrating no benefit compared to 1445 subjects in trials concluding benefit.”

“Thrombolytics for ischemic stroke may be harmful or beneficial. The answer remains elusive. We struggled therefore, debating between a ‘yellow’ or ‘red’ light for our recommendation. However, over 60,000 subjects in trials of thrombolytics for coronary thrombosis suggest a consistent beneficial effect across groups and subgroups, with no studies suggesting harm. This consistency was found despite a very small mortality benefit (2.5%), and a very narrow therapeutic window (1% major bleeding). In comparison, the variation in trial results of thrombolytics for stroke and the daunting but consistent adverse effect rate caused by ICH suggested to us that thrombolytics are dangerous unless further study exonerates their use.”

“There is a Cochrane review that pooled estimates of effect. 17 We do not endorse this choice because of clinical heterogeneity. However, we present the NNT’s from the pooled analysis for the reader’s benefit. The Cochrane review suggested a 6% reduction in disability […] with thrombolytics. This would mean that 17 were treated for every 1 avoiding an unfavorable outcome. The review also noted a 1% increase in mortality (1 in 100 patients die because of thrombolytics) and a 5% increase in nonfatal intracranial hemorrhage (1 in 20), for a total of 6% harmed (1 in 17 suffers death or brain hemorrhage).”

v. Suicide attempts in Asperger Syndrome. An interesting finding: “Over 35% of individuals with AS reported that they had attempted suicide in the past.”

Related: Suicidal ideation and suicide plans or attempts in adults with Asperger’s syndrome attending a specialist diagnostic clinic: a clinical cohort study.

“374 adults (256 men and 118 women) were diagnosed with Asperger’s syndrome in the study period. 243 (66%) of 367 respondents self-reported suicidal ideation, 127 (35%) of 365 respondents self-reported plans or attempts at suicide, and 116 (31%) of 368 respondents self-reported depression. Adults with Asperger’s syndrome were significantly more likely to report lifetime experience of suicidal ideation than were individuals from a general UK population sample (odds ratio 9·6 [95% CI 7·6–11·9], p<0·0001), people with one, two, or more medical illnesses (p<0·0001), or people with psychotic illness (p=0·019). […] Lifetime experience of depression (p=0·787), suicidal ideation (p=0·164), and suicide plans or attempts (p=0·06) did not differ significantly between men and women […] Individuals who reported suicide plans or attempts had significantly higher Autism Spectrum Quotient scores than those who did not […] Empathy Quotient scores and ages did not differ between individuals who did or did not report suicide plans or attempts (table 4). Patients with self-reported depression or suicidal ideation did not have significantly higher Autism Spectrum Quotient scores, Empathy Quotient scores, or age than did those without depression or suicidal ideation”.

The fact that people with Asperger’s are more likely to be depressed and contemplate suicide is consistent with previous observations that they’re also more likely to die from suicide – for example a paper I blogged a while back found that in that particular (large Swedish population-based cohort) study, people with ASD were more than 7 times as likely to die from suicide as the comparable controls.

Also related: Suicidal tendencies hard to spot in some people with autism.

This link has some great graphs and tables of suicide data from the US.

Also autism-related: Increased perception of loudness in autism. This is one of the ‘important ones’ for me personally – I am much more sound-sensitive than are most people.

vi. Early versus Delayed Invasive Intervention in Acute Coronary Syndromes.

“Earlier trials have shown that a routine invasive strategy improves outcomes in patients with acute coronary syndromes without ST-segment elevation. However, the optimal timing of such intervention remains uncertain. […] We randomly assigned 3031 patients with acute coronary syndromes to undergo either routine early intervention (coronary angiography ≤24 hours after randomization) or delayed intervention (coronary angiography ≥36 hours after randomization). The primary outcome was a composite of death, myocardial infarction, or stroke at 6 months. A prespecified secondary outcome was death, myocardial infarction, or refractory ischemia at 6 months. […] Early intervention did not differ greatly from delayed intervention in preventing the primary outcome, but it did reduce the rate of the composite secondary outcome of death, myocardial infarction, or refractory ischemia and was superior to delayed intervention in high-risk patients.”

vii. Some wikipedia links:

Behrens–Fisher problem.
Sailing ship tactics (I figured I had to read up on this if I were to get anything out of the Aubrey-Maturin books).
Anatomical terms of muscle.
Phatic expression (“a phatic expression […] is communication which serves a social function such as small talk and social pleasantries that don’t seek or offer any information of value.”)
Three-domain system.
Beringian wolf (featured).
Subdural hygroma.
Cayley graph.
Schur polynomial.
Solar neutrino problem.
Hadamard product (matrices).
True polar wander.
Newton’s cradle.

viii. Determinant versus permanent (mathematics – technical).

ix. Some years ago I wrote a few English-language posts about some of the various statistical/demographic properties of immigrants living in Denmark, based on numbers included in a publication by Statistics Denmark. I did it by translating the observations included in that publication, which was only published in Danish. I was briefly considering doing the same thing again when the 2017 data arrived, but I decided not to do it as I recalled that it took a lot of time to write those posts back then, and it didn’t seem to me to be worth the effort – but Danish readers might be interested to have a look at the data, if they haven’t already – here’s a link to the publication Indvandrere i Danmark 2017.

x. A banter blitz session with grandmaster Peter Svidler, who recently became the first Russian ever to win the Russian Chess Championship 8 times. He’s currently shared-second in the World Rapid Championship after 10 rounds and is now in the top 10 on the live rating list in both classical and rapid – seems like he’s had a very decent year.

xi. I recently discovered Dr. Whitecoat’s blog. The patient encounters are often interesting.

December 28, 2017 Posted by | Astronomy, autism, Biology, Cardiology, Chess, Computer science, History, Mathematics, Medicine, Neurology, Physics, Psychiatry, Psychology, Random stuff, Statistics, Studies, Wikipedia, Zoology | Leave a comment

Occupational Epidemiology (III)

This will be my last post about the book.

Some observations from the final chapters:

“Often there is confusion about the difference between systematic reviews and metaanalyses. A meta-analysis is a quantitative synthesis of two or more studies […] A systematic review is a synthesis of evidence on the effects of an intervention or an exposure which may also include a meta-analysis, but this is not a prerequisite. It may be that the results of the studies which have been included in a systematic review are reported in such a way that it is impossible to synthesize them quantitatively. They can then be reported in a narrative manner.10 However, a meta-analysis always requires a systematic review of the literature. […] There is a long history of debate about the value of meta-analysis for occupational cohort studies or other occupational aetiological studies. In 1994, Shapiro argued that ‘meta-analysis of published non-experimental data should be abandoned’. He reasoned that ‘relative risks of low magnitude (say, less than 2) are virtually beyond the resolving power of the epidemiological microscope because we can seldom demonstrably eliminate all sources of bias’.13 Because the pooling of studies in a meta-analysis increases statistical power, the pooled estimate may easily become significant and thus incorrectly taken as an indication of causality, even though the biases in the included studies may not have been taken into account. Others have argued that the method of meta-analysis is important but should be applied appropriately, taking into account the biases in individual studies.14 […] We believe that the synthesis of aetiological studies should be based on the same general principles as for intervention studies, and the existing methods adapted to the particular challenges of cohort and case-control studies. […] Since 2004, there is a special entity, the Cochrane Occupational Safety and Health Review Group, that is responsible for the preparing and updating of reviews of occupational safety and health interventions […]. There were over 100 systematic reviews on these topics in the Cochrane Library in 2012.”
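
As a concrete illustration of what 'quantitative synthesis' means in the simplest case, here is a fixed-effect, inverse-variance pooling of log relative risks in a few lines of Python. The three studies and their numbers are invented, and the book does not prescribe this particular method; the sketch is only meant to show the mechanics of pooling.

```python
import math

# (relative risk, 95% CI lower bound, 95% CI upper bound) for three hypothetical studies
studies = [(1.4, 1.1, 1.8), (1.2, 0.9, 1.6), (1.6, 1.0, 2.6)]

weights, weighted_logs = [], []
for rr, lo, hi in studies:
    log_rr = math.log(rr)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE recovered from the 95% CI
    w = 1 / se ** 2                                  # inverse-variance weight
    weights.append(w)
    weighted_logs.append(w * log_rr)

pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print("pooled RR:", round(math.exp(pooled_log), 2))
print("95% CI:", round(math.exp(pooled_log - 1.96 * pooled_se), 2),
      "to", round(math.exp(pooled_log + 1.96 * pooled_se), 2))
```

Note that this pooling step is exactly what Shapiro's criticism is aimed at: the pooled interval narrows as studies are added, even if every included study carries the same unaddressed bias.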

“The believability of a systematic review’s results depends largely on the quality of the included studies. Therefore, assessing and reporting on the quality of the included studies is important. For intervention studies, randomized trials are regarded as of higher quality than observational studies, and the conduct of the study (e.g. in terms of response rate or completeness of follow-up) also influences quality. A conclusion derived from a few high-quality studies will be more reliable than when the conclusion is based on even a large number of low-quality studies. Some form of quality assessment is nowadays commonplace in intervention reviews but is still often missing in reviews of aetiological studies. […] It is tempting to use quality scores, such as the Jadad scale for RCTs34 and the Downs and Black scale for non-RCT intervention studies35 but these, in their original format, are insensitive to variation in the importance of risk areas for a given research question. The score system may give the same value to two studies (say, 10 out of 12) when one, for example, lacked blinding and the other did not randomize, thus implying that their quality is equal. This would not be a problem if randomization and blinding were equally important for all questions in all reviews, but this is not the case. For RCTs an important development in this regard has been the Cochrane risk of bias tool.36 This is a checklist of six important domains that have been shown to be important areas of bias in RCTs: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, and selective reporting.”

“[R]isks of bias tools developed for intervention studies cannot be used for reviews of aetiological studies without relevant modification. This is because, unlike interventions, exposures are usually more complicated to assess when we want to attribute the outcome to them alone. These scales do not cover all items that may need assessment in an aetiological study, such as confounding and information bias relating to exposures. […] Surprisingly little methodological work has been done to develop validated tools for aetiological epidemiology and most tools in use are not validated,38 […] Two separate checklists, for observational studies of incidence and prevalence and for risk factor assessment, have been developed and validated recently.40 […] Publication and other reporting bias is probably a much bigger issue for aetiological studies than for intervention studies. This is because, for clinical trials, the introduction of protocol registration, coupled with the regulatory system for new medications, has helped in assessing and preventing publication and reporting bias. No such checks exist for observational studies.”

“Most ill health that arises from occupational exposures can also arise from nonoccupational exposures, and the same type of exposure can occur in occupational and non-occupational settings. With the exception of malignant mesothelioma (which is essentially only caused by exposure to asbestos), there is no way to determine which exposure caused a particular disorder, nor where the causative exposure occurred. This means that usually it is not possible to determine the burden just by counting the number of cases. Instead, approaches to estimating this burden have been developed. There are also several ways to define burden and how best to measure it.”

“The population attributable fraction (PAF) is the proportion of cases that would not have occurred in the absence of an occupational exposure. It can be estimated by combining two measures — a risk estimate (usually relative risk (RR) or odds ratio) of the disorder of interest that is associated with exposure to the substance of concern; and an estimate of the proportion of the population exposed to the substance at work (p(E)). This approach has been used in several studies, particularly for estimating cancer burden […] There are several possible equations that can be used to calculate the PAF, depending on the available data […] PAFs cannot in general be combined by summing directly because: (1) summing PAFs for overlapping exposures (i.e. agents to which the same ‘ever exposed’ workers may have been exposed) may give an overall PAF exceeding 100%, and (2) summing disjoint (not concurrently occurring) exposures also introduces upward bias. Strategies to avoid this include partitioning exposed numbers between overlapping exposures […] or estimating only for the ‘dominant’ carcinogen with the highest risk. Where multiple exposures remain, one approach is to assume that the exposures are independent and their joint effects are multiplicative. The PAFs can then be combined to give an overall PAF for that cancer using a product sum. […] Potential sources of bias for PAFs include inappropriate choice of risk estimates, imprecision in the risk estimates and estimates of proportions exposed, inaccurate risk exposure period and latency assumptions, and a lack of separate risk estimates in some cases for women and/or cancer incidence. In addition, a key decision is the choice of which diseases and exposures are to be included.”
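
The 'product sum' combination mentioned in the quote amounts, under the stated assumption of independent exposures with multiplicative joint effects, to taking one minus the product of (1 − PAF) across the exposures, which avoids the over-100% problem that simple summation can produce. A minimal sketch with made-up PAF values:

```python
def combined_paf(pafs):
    """Overall attributable fraction for several independent exposures."""
    remaining = 1.0
    for p in pafs:
        remaining *= (1 - p)  # share of cases still unexplained after each exposure
    return 1 - remaining

lung_pafs = [0.15, 0.05, 0.03, 0.02]  # hypothetical PAFs for four lung carcinogens
print("naive sum:", round(sum(lung_pafs), 3))                             # 0.25
print("combined under independence:", round(combined_paf(lung_pafs), 3))  # ~0.23
```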

“The British Cancer Burden study is perhaps the most detailed study of occupationally related cancers in that it includes all those relevant carcinogens classified at the end of 2008 […] In the British study the attributable fractions ranged from less than 0.01% to 95% overall, the most important cancer sites for occupational attribution being, for men, mesothelioma (97%), sinonasal (46%), lung (21.1%), bladder (7.1%), and non-melanoma skin cancer (7.1%) and, for women, mesothelioma (83%), sinonasal (20.1%), lung (5.3%), breast (4.6%), and nasopharynx (2.5%). Occupation also contributed 2% or more overall to cancers of the larynx, oesophagus, and stomach, and soft tissue sarcoma with, in addition for men, melanoma of the eye (due to welding), and non-Hodgkin lymphoma. […] The overall results from the occupational risk factors component of the Global Burden of Disease 2010 study illustrate several important aspects of burden studies.14 Of the estimated 850 000 occupationally related deaths worldwide, the top three causes were: (1) injuries (just over a half of all deaths); (2) particulate matter, gases, and fumes leading to COPD; and (3) carcinogens. When DALYs were used as the burden measure, injuries still accounted for the highest proportion (just over one-third), but ergonomic factors leading to low back pain resulted in almost as many DALYs, and both were almost an order of magnitude higher than the DALYs from carcinogens. The difference in relative contributions of the various risk factors between deaths and DALYs arises because of the varying ages of those affected, and the differing chronicity of the resulting conditions. Both measures are valid, but they represent a different aspect of the burden arising from the hazardous exposures […]. Both the British and Global Burden of Disease studies draw attention to the important issues of: (1) multiple occupational carcinogens causing specific types of cancer, for example, the British study evaluated 21 lung carcinogens; and (2) specific carcinogens causing several different cancers, for example, IARC now defines asbestos as a group 1 or 2A carcinogen for seven cancer sites. These issues require careful consideration for burden estimation and for prioritizing risk reduction strategies. […] The long latency of many cancers means that estimates of current burden are based on exposures occurring in the past, often much higher than those existing today. […] long latency [also] means that risk reduction measures taken now will take a considerable time to be reflected in reduced disease incidence.”

“Exposures and effects are linked by dynamic processes occurring across time. These processes can often be usefully decomposed into two distinct biological relationships, each with several components: 1. The exposure-dose relationship […] 2. The dose-effect relationship […] These two component relationships are sometimes represented by two different mathematical models: a toxicokinetic model […], and a disease process model […]. Depending on the information available, these models may be relatively simple or highly complex. […] Often the various steps in the disease process do not occur at the same rate, some of these processes are ‘fast’, such as cell killing, while others are ‘slow’, such as damage repair. Frequently a few slow steps in a process become limiting to the overall rate, which sets the temporal pattern for the entire exposure-response relationship. […] It is not necessary to know the full mechanism of effects to guide selection of an exposure-response model or exposure metric. Because of the strong influence of the rate-limiting steps, often it is only necessary to have observations on the approximate time course of effects. This is true whether the effects appear to be reversible or irreversible, and whether damage progresses proportionately with each unit of exposure (actually dose) or instead occurs suddenly, and seemingly without regard to the amount of exposure, such as an asthma attack.”

“In this chapter, we argue that formal disease process models have the potential to improve the sensitivity of epidemiology for detecting new and emerging occupational and environmental risks where there is limited mechanistic information. […] In our approach, these models are often used to create exposure or dose metrics, which are in turn used in epidemiological models to estimate exposure-disease associations. […] Our goal is a methodology to formulate strong tests of our exposure-disease hypotheses in which a hypothesis is developed in as much biological detail as it can be, expressed in a suitable dynamic (temporal) model, and tested by its fit with a rich data set, so that its flaws and misperceptions of reality are fully displayed. Rejecting such a fully developed biological hypothesis is more informative than either rejecting or failing to reject a generic or vaguely defined hypothesis.” For example, the hypothesis ‘truck drivers have more risk of lung cancer than non-drivers’13 is of limited usefulness for prevention […]. Hypothesizing that a particular chemical agent in truck exhaust is associated with lung cancer — whether the hypothesis is refuted or supported by data — is more likely to lead to successful prevention activities. […] we believe that the choice of models against which to compare the data should, so far as possible, be guided by explicit hypotheses about the underlying biological processes. In other words, you can get as much as possible from epidemiology by starting from well-thought-out hypotheses that are formalized as mathematical models into which the data will be placed. The disease process models can serve this purpose.2″

“The basic idea of empirical Bayes (EB) and semiBayes (SB) adjustments for multiple associations is that the observed variation of the estimated relative risks around their geometric mean is larger than the variation of the true (but unknown) relative risks. In SB adjustments, an a priori value for the extra variation is chosen which assigns a reasonable range of variation to the true relative risks and this value is then used to adjust the observed relative risks.7 The adjustment consists in shrinking outlying relative risks towards the overall mean (of the relative risks for all the different exposures being considered). The larger the individual variance of the relative risks, the stronger the shrinkage, so that the shrinkage is stronger for less reliable estimates based on small numbers. Typical applications in which SB adjustments are a useful alternative to traditional methods of adjustment for multiple comparisons are in large occupational surveillance studies, where many relative risks are estimated with few or no a priori beliefs about which associations might be causal.7″

“The advantage of [the SB adjustment] approach over classical Bonferroni corrections is that on the average it produces more valid estimates of the odds ratio for each occupation/exposure. If we do a study which involves assessing hundreds of occupations, the problem is not only that we get many ‘false positive’ results by chance. A second problem is that even the ‘true positives’ tend to have odds ratios that are too high. For example, if we have a group of occupations with true odds ratios around 1.5, then the ones that stand out in the analysis are those with the highest odds ratios (e.g. 2.5) which will be elevated partly because of real effects and partly by chance. The Bonferroni correction addresses the first problem (too many chance findings) but not the second, that the strongest odds ratios are probably too high. In contrast, SB adjustment addresses the second problem by correcting for the anticipated regression to the mean that would have occurred if the study had been repeated, and thereby on the average produces more valid odds ratio estimates for each occupation/exposure. […] most epidemiologists write their Methods and Results sections as frequentists and their Introduction and Discussion sections as Bayesians. In their Methods and Results sections, they ‘test’ their findings as if their data are the only data that exist. In the Introduction and Discussion, they discuss their findings with regard to their consistency with previous studies, as well as other issues such as biological plausibility. This creates tensions when a small study has findings which are not statistically significant but which are consistent with prior knowledge, or when a study finds statistically significant findings which are inconsistent with prior knowledge. […] In some (but not all) instances, things can be made clearer if we include Bayesian methods formally in the Methods and Results sections of our papers”.
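
The shrinkage idea is easy to illustrate numerically. Below is a minimal sketch of semi-Bayes adjustment on the log relative-risk scale, using made-up occupational relative risks and standard errors and a pre-specified prior variance; the actual procedures discussed in the chapter may differ in details, for instance in how the overall mean is estimated.

```python
import numpy as np

# Sketch of semi-Bayes (SB) shrinkage on the log relative-risk scale.
# rr and se are hypothetical estimates and standard errors of ln(RR) for many occupations.
rr = np.array([0.8, 1.1, 1.5, 2.5, 0.6, 1.0, 3.0])
se = np.array([0.30, 0.10, 0.20, 0.60, 0.50, 0.05, 0.70])

log_rr = np.log(rr)
v = se ** 2                    # individual variances of the log relative risks
tau2 = 0.25 ** 2               # pre-specified ("semi-Bayes") variance of the true log RRs
mean_log_rr = np.mean(log_rr)  # overall (geometric) mean on the log scale

# Shrinkage weight: estimates with larger variance are pulled harder towards the mean.
w = tau2 / (tau2 + v)
sb_log_rr = mean_log_rr + w * (log_rr - mean_log_rr)

for orig, adj, s in zip(rr, np.exp(sb_log_rr), se):
    print(f"RR {orig:4.2f} (SE {s:.2f}) -> SB-adjusted RR {adj:4.2f}")
```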

“In epidemiology, risk is most often quantified in terms of relative risk — i.e. the ratio of the probability of an adverse outcome in someone with a specified exposure to that in someone who is unexposed, or exposed at a different specified level. […] Relative risks can be estimated from a wider range of study designs than individual attributable risks. They have the advantage that they are often stable across different groups of people (e.g. of different ages, smokers, and non-smokers) which makes them easier to estimate and quantify. Moreover, high relative risks are generally unlikely to be explained by unrecognized bias or confounding. […] However, individual attributable risks are a more relevant measure by which to quantify the impact of decisions in risk management on individuals. […] Individual attributable risk is the difference in the probability of an adverse outcome between someone with a specified exposure and someone who is unexposed, or exposed at a different specified level. It is the critical measure when considering the impact of decisions in risk management on individuals. […] Population attributable risk is the difference in the frequency of an adverse outcome between a population with a given distribution of exposures to a hazardous agent, and that in a population with no exposure, or some other specified distribution of exposures. It depends on the prevalence of exposure at different levels within the population, and on the individual attributable risk for each level of exposure. It is a measure of the impact of the agent at a population level, and is relevant to decisions in risk management for populations. […] Population attributable risks are highest when a high proportion of a population is exposed at levels which carry high individual attributable risks. On the other hand, an exposure which carries a high individual attributable risk may produce only a small population attributable risk if the prevalence of such exposure is low.”
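
A toy calculation may help show how the three measures relate; all of the numbers below are hypothetical.

```python
# Hypothetical numbers: how the same relative risk translates into individual and
# population attributable risks, depending on baseline risk and exposure prevalence.
baseline_risk = 0.001        # assumed 10-year risk of the outcome in unexposed individuals
relative_risk = 3.0          # assumed risk ratio for exposed vs unexposed
p_exposed = 0.02             # assumed proportion of the population exposed

risk_exposed = baseline_risk * relative_risk
individual_attributable_risk = risk_exposed - baseline_risk   # excess risk for one exposed person

# Population risk with the given exposure distribution vs. with no exposure at all:
pop_risk_actual = p_exposed * risk_exposed + (1 - p_exposed) * baseline_risk
population_attributable_risk = pop_risk_actual - baseline_risk

print(f"individual attributable risk: {individual_attributable_risk:.4%}")
print(f"population attributable risk: {population_attributable_risk:.4%}")
```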

“Hazard characterization entails quantification of risks in relation to routes, levels, and durations of exposure. […] The findings from individual studies are often used to determine a no observed adverse effect level (NOAEL), lowest observed effect level (LOEL), or benchmark dose lower 95% confidence limit (BMDL) for relevant effects […] [NOAEL] is the highest dose or exposure concentration at which there is no discernible adverse effect. […] [LOEL] is the lowest dose or exposure concentration at which a discernible effect is observed. If comparison with unexposed controls indicates adverse effects at all of the dose levels in an experiment, a NOAEL cannot be derived, but the lowest dose constitutes a LOEL, which might be used as a comparator for estimated exposures or to derive a toxicological reference value […] A BMDL is defined in relation to a specified adverse outcome that is observed in a study. Usually, this is the outcome which occurs at the lowest levels of exposure and which is considered critical to the assessment of risk. Statistical modelling is applied to the experimental data to estimate the dose or exposure concentration which produces a specified small level of effect […]. The BMDL is the lower 95% confidence limit for this estimate. As such, it depends both on the toxicity of the test chemical […], and also on the sample sizes used in the study (other things being equal, larger sample sizes will produce more precise estimates, and therefore higher BMDLs). In addition to accounting for sample size, BMDLs have the merit that they exploit all of the data points in a study, and do not depend so critically on the spacing of doses that is adopted in the experimental design (by definition a NOAEL or LOEL can only be at one of the limited number of dose levels used in the experiment). On the other hand, BMDLs can only be calculated where an adverse effect is observed. Even if there are no clear adverse effects at any dose level, a NOAEL can be derived (it will be the highest dose administered).”
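
Here is a rough sketch of the benchmark-dose idea: a toy dose-response model is fitted to hypothetical quantal data, the BMD is taken as the dose producing a 10% extra risk over background, and a crude parametric bootstrap stands in for the lower 95% confidence limit (BMDL). The functional form, the data, and the bootstrap shortcut are all my own assumptions, not any particular regulatory procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Hypothetical quantal dose-response data: animals per group and number affected.
dose = np.array([0.0, 10.0, 50.0, 100.0, 200.0])
n_per_group = np.array([50, 50, 50, 50, 50])
affected = np.array([1, 3, 7, 16, 31])
props = affected / n_per_group

def model(d, bg, a, b):
    """Background risk plus a logistic increase on log(dose + 1); a toy functional form."""
    return bg + (1.0 - bg) / (1.0 + np.exp(-(a + b * np.log(d + 1.0))))

p0 = [0.02, -5.0, 1.0]
bounds = ([0.0, -20.0, 0.0], [0.5, 0.0, 10.0])
params, _ = curve_fit(model, dose, props, p0=p0, bounds=bounds)

def bmd_from_params(bg, a, b, bmr=0.10):
    """Dose at which the extra risk over background equals the benchmark response (BMR)."""
    z = np.log(bmr / (1.0 - bmr))            # logit of the BMR
    return np.exp((z - a) / b) - 1.0

bmd = bmd_from_params(*params)
fitted = model(dose, *params)

# Crude parametric bootstrap: resample counts from the fitted model and refit.
boot = []
for _ in range(200):
    resampled = rng.binomial(n_per_group, fitted) / n_per_group
    try:
        boot.append(bmd_from_params(*curve_fit(model, dose, resampled, p0=p0, bounds=bounds)[0]))
    except RuntimeError:                      # skip bootstrap samples where the fit fails
        continue
bmdl = np.percentile(boot, 5)                 # rough lower bound: 5th percentile of bootstrap BMDs
print(f"BMD ~ {bmd:.1f}, BMDL ~ {bmdl:.1f} (hypothetical dose units)")
```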

December 8, 2017 Posted by | Books, Cancer/oncology, Epidemiology, Medicine, Statistics | Leave a comment

Occupational Epidemiology (II)

Some more observations from the book below.

“RD [Retinal detachment] is the separation of the neurosensory retina from the underlying retinal pigment epithelium.1 RD is often preceded by posterior vitreous detachment — the separation of the posterior vitreous from the retina as a result of vitreous degeneration and shrinkage2 — which gives rise to the sudden appearance of floaters and flashes. Late symptoms of RD may include visual field defects (shadows, curtains) or even blindness. The success rate of RD surgery has been reported to be over 90%;3 however, a loss of visual acuity is frequently reported by patients, particularly if the macula is involved.4 Since the natural history of RD can be influenced by early diagnosis, patients experiencing symptoms of posterior vitreous detachment are advised to undergo an ophthalmic examination.5 […] Studies of the incidence of RD give estimates ranging from 6.3 to 17.9 cases per 100 000 person-years.6 […] Age is a well-known risk factor for RD. In most studies the peak incidence was recorded among subjects in their seventh decade of life. A secondary peak at a younger age (20–30 years) has been identified […] attributed to RD among highly myopic patients.6 Indeed, depending on the severity, myopia is associated with a four- to ten-fold increase in risk of RD.7 [Diabetics with retinopathy are also at increased risk of RD, US] […] While secondary prevention of RD is current practice, no effective primary prevention strategy is available at present. The idea is widespread among practitioners that RD is not preventable, probably the consequence of our historically poor understanding of the aetiology of RD. For instance, on the website of the Mayo Clinic — one of the top-ranked hospitals for ophthalmology in the US — it is possible to read that ‘There’s no way to prevent retinal detachment’.9”

“Intraocular pressure […] is influenced by physical activity. Dynamic exercise causes an acute reduction in intraocular pressure, whereas physical fitness is associated with a lower baseline value.29 Conversely, a sudden rise in intraocular pressure has been reported during the Valsalva manoeuvre.30-32 […] Occupational physical activity may […] cause both short- and long-term variations in intraocular pressure. On the one hand, physically demanding jobs may contribute to decreased baseline levels by increasing physical fitness but, on the other hand, lifting tasks may cause an important acute increase in pressure. Moreover, the eye of a manual worker who performs repeated lifting tasks involving the Valsalva manoeuvre may undergo several dramatic changes in intraocular pressure within a single working shift. […] A case-control study was carried out to test the hypothesis that repeated lifting tasks involving the Valsalva manoeuvre could be a risk factor for RD. […] heavy lifting was a strong risk factor for RD (OR 4.4, 95% CI 1.6–13). Intriguingly, body mass index (BMI) also showed a clear association with RD (top quartile: OR 6.8, 95% CI 1.6–29). […] Based on their findings, the authors concluded that heavy occupational lifting (involving the Valsalva manoeuvre) may be a relevant risk factor for RD in myopics.

“The proportion of the world’s population over 60 is forecast to double from 11.6% in 2012 to 21.8% in 2050.1 […] the International Labour Organization notes that, worldwide, just 40% of the working age population has legal pension coverage, and only 26% of the working population is effectively covered by old-age pension schemes. […] in less developed regions, labour force participation in those over 65 is much higher than in more developed regions.8 […] Longer working lives increase cumulative exposures, as well as increasing the time since exposure — important when there is a long latency period between exposure and resultant disease. Further, some exposures may have a greater effect when they occur to older workers, e.g. carcinogens that are promoters rather than initiators. […] Older workers tend to have more chronic health conditions. […] Older workers have fewer injuries, but take longer to recover. […] For some ‘knowledge workers’, like physicians, even a relatively minor cognitive decline […] might compromise their competence. […]  Most past studies have treated age as merely a confounding variable and rarely, if ever, have considered it an effect modifier. […]  Jex and colleagues24 argue that conceptually we should treat age as the variable of interest so that other variables are viewed as moderating the impact of age. […] The single best improvement to epidemiological research on ageing workers is to conduct longitudinal studies, including follow-up of workers into retirement. Cross-sectional designs almost certainly incur the healthy survivor effect, since unhealthy workers may retire early.25 […] Analyses should distinguish ageing per se, genetic factors, work exposures, and lifestyle in order to understand their relative and combined effects on health.”

“Musculoskeletal disorders have long been recognized as an important source of morbidity and disability in many occupational populations.1,2 Most musculoskeletal disorders, for most people, are characterized by recurrent episodes of pain that vary in severity and in their consequences for work. Most episodes subside uneventfully within days or weeks, often without any intervention, though about half of people continue to experience some pain and functional limitations after 12 months.3,4 In working populations, musculoskeletal disorders may lead to a spell of sickness absence. Sickness absence is increasingly used as a health parameter of interest when studying the consequences of functional limitations due to disease in occupational groups. Since duration of sickness absence contributes substantially to the indirect costs of illness, interventions increasingly address return to work (RTW).5 […] The Clinical Standards Advisory Group in the United Kingdom reported RTW within 2 weeks for 75% of all low back pain (LBP) absence episodes and suggested that approximately 50% of all work days lost due to back pain in the working population are from the 85% of people who are off work for less than 7 days.6″

Any RTW curve over time can be described with a mathematical Weibull function.15 This Weibull function is characterized by a scale parameter λ and a shape parameter k. The scale parameter λ is a function of different covariates that include the intervention effect, preferably expressed as hazard ratio (HR) between the intervention group and the reference group in a Cox’s proportional hazards regression model. The shape parameter k reflects the relative increase or decrease in survival time, thus expressing how much the RTW rate will decrease with prolonged sick leave. […] a HR as measure of effect can be introduced as a covariate in the scale parameter λ in the Weibull model and the difference in areas under the curve between the intervention model and the basic model will give the improvement in sickness absence days due to the intervention. By introducing different times of starting the intervention among those workers still on sick leave, the impact of timing of enrolment can be evaluated. Subsequently, the estimated changes in total sickness absence days can be expressed in a benefit/cost ratio (BC ratio), where benefits are the costs saved due to a reduction in sickness absence and costs are the expenditures relating to the intervention.15″
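
A small numerical sketch of the model described above, with made-up parameter values: a Weibull RTW curve, a proportional-hazards intervention effect applied from the day of enrolment, expected sickness absence days approximated as the area under the curve, and a resulting benefit/cost ratio.

```python
import numpy as np

# Sketch of the Weibull return-to-work (RTW) model; all numbers are hypothetical.
lam, k = 30.0, 0.8           # scale (days) and shape of the baseline time-to-RTW distribution
hr = 1.3                     # assumed hazard ratio of the intervention vs. usual care
t_start = 28                 # intervention offered to those still off work at day 28
cost_per_day = 200.0         # assumed cost of one day of sickness absence
cost_intervention = 1500.0   # assumed cost of the intervention per enrolled worker

t = np.arange(0, 365)                        # follow-up horizon in days
surv0 = np.exp(-(t / lam) ** k)              # proportion still on sick leave (no intervention)
# Proportional hazards: apply the HR from day t_start onwards.
surv1 = surv0.copy()
surv1[t >= t_start] = surv0[t_start] * (surv0[t >= t_start] / surv0[t_start]) ** hr

days_absent0 = surv0.sum()                   # area under the curve ~ expected absence days
days_absent1 = surv1.sum()
days_saved = days_absent0 - days_absent1

benefit = days_saved * cost_per_day
cost = surv0[t_start] * cost_intervention    # only workers still absent at t_start are enrolled
print(f"days saved per worker: {days_saved:.1f}, benefit/cost ratio: {benefit / cost:.2f}")
```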

“A crucial factor in understanding why interventions are effective or not is the timing of the enrolment of workers on sick leave into the intervention. The RTW pattern over time […] has important consequences for appropriate timing of the best window for effective clinical and occupational interventions. The evidence presented by Palmer and colleagues clearly suggests that [in the context of LBP] a stepped care approach is required. In the first step of rapid RTW, most workers will return to work even without specific interventions. Simple, short interventions involving effective coordination and cooperation between primary health care and the workplace will be sufficient to help the majority of workers to achieve an early RTW. In the second step, more expensive, structured interventions are reserved for those who are having difficulties returning, typically between 4 weeks and 3 months. However, to date there is little evidence on the optimal timing of such interventions for workers on sick leave due to LBP.14,15 […] the cost-benefits of a structured RTW intervention among workers on sick leave will be determined by the effectiveness of the intervention, the natural speed of RTW in the target population, the timing of the enrolment of workers into the intervention, and the costs of both the intervention and of a day of sickness absence. […] The cost-effectiveness of a RTW intervention will be determined by the effectiveness of the intervention, the costs of the intervention and of a day of sickness absence, the natural course of RTW in the target population, the timing of the enrolment of workers into the RTW intervention, and the time lag before the intervention takes effect. The latter three factors are seldom taken into consideration in systematic reviews and guidelines for management of RTW, although their impact may easily be as important  as classical measures of effectiveness, such as effect size or HR.”

“In order to obtain information of the highest quality and utility, surveillance schemes have to be designed, set up, and managed with the same methodological rigour as high-calibre prospective cohort studies. Whether surveillance schemes are voluntary or not, considerable effort has to be invested to ensure a satisfactory and sufficient denominator, the best numerator quality, and the most complete ascertainment. Although the force of statute is relied upon in some surveillance schemes, even in these the initial and continuing motivation of the reporters (usually physicians) is paramount. […] There is a surveillance ‘pyramid’ within which the patient’s own perception is at the base, the GP is at a higher level, and the clinical specialist is close to the apex. The source of the surveillance reports affects the numerator because case severity and case mix differ according to the level in the pyramid.19 Although incidence rate estimates may be expected to be lower at the higher levels in the surveillance pyramid this is not necessarily always the case. […] Although surveillance undertaken by physicians who specialize in the organ system concerned or in occupational disease (or in both aspects) may be considered to be the medical ‘gold standard’ it can suffer from a more limited patient catchment because of various referral filters. Surveillance by GPs will capture numerator cases as close to the base of the pyramid as possible, but may suffer from greater diagnostic variation than surveillance by specialists. Limiting recruitment to GPs with a special interest, and some training, in occupational medicine is a compromise between the two levels.20

“When surveillance is part of a statutory or other compulsory scheme then incident case identification is a continuous and ongoing process. However, when surveillance is voluntary, for a research objective, it may be preferable to sample over shorter, randomly selected intervals, so as to reduce the demands associated with the data collection and ‘reporting fatigue’. Evidence so far suggests that sampling over shorter time intervals results in higher incidence estimates than continuous sampling.21 […] Although reporting fatigue is an important consideration in tempering conclusions drawn from […] multilevel models, it is possible to take account of this potential bias in various ways. For example, when evaluating interventions, temporal trends in outcomes resulting from other exposures can be used to control for fatigue.23,24 The phenomenon of reporting fatigue may be characterized by an ‘excess of zeroes’ beyond what is expected of a Poisson distribution and this effect can be quantified.27 […] There are several considerations in determining incidence from surveillance data. It is possible to calculate an incidence rate based on the general population, on the population of working age, or on the total working population,19 since these denominator bases are generally readily available, but such rates are not the most useful in determining risk. Therefore, incidence rates are usually calculated in respect of specific occupations or industries.22 […] Ideally, incidence rates should be expressed in relation to quantitative estimates of exposure but most surveillance schemes would require additional data collection as special exercises to achieve this aim.” [for much more on these topics, see also M’ikanatha & Iskander’s book.]

“Estimates of lung cancer risk attributable to occupational exposures vary considerably by geographical area and depend on study design, especially on the exposure assessment method, but may account for around 5–20% of cancers among men, but less (<5%) among women;2 among workers exposed to (suspected) lung carcinogens, the percentage will be higher. […] most exposure to known lung carcinogens originates from occupational settings and will affect millions of workers worldwide.  Although it has been established that these agents are carcinogenic, only limited evidence is available about the risks encountered at much lower levels in the general population. […] One of the major challenges in community-based occupational epidemiological studies has been valid assessment of the occupational exposures experienced by the population at large. Contrary to the detailed information usually available for an industrial population (e.g. in a retrospective cohort study in a large chemical company) that often allows for quantitative exposure estimation, community-based studies […] have to rely on less precise and less valid estimates. The choice of method of exposure assessment to be applied in an epidemiological study depends on the study design, but it boils down to choosing between acquiring self-reported exposure, expert-based individual exposure assessment, or linking self-reported job histories with job-exposure matrices (JEMs) developed by experts. […] JEMs have been around for more than three decades.14 Their main distinction from either self-reported or expert-based exposure assessment methods is that exposures are no longer assigned at the individual subject level but at job or task level. As a result, JEMs make no distinction in assigned exposure between individuals performing the same job, or even between individuals performing a similar job in different companies. […] With the great majority of occupational exposures having a rather low prevalence (<10%) in the general population it is […] extremely important that JEMs are developed aiming at a highly specific exposure assessment so that only jobs with a high likelihood (prevalence) and intensity of exposure are considered to be exposed. Aiming at a high sensitivity would be disastrous because a high sensitivity would lead to an enormous number of individuals being assigned an exposure while actually being unexposed […] Combinations of the methods just described exist as well”.
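
As an illustration of how a JEM assigns exposure at the job level rather than the individual level, here is a toy sketch; the job codes, agents, probabilities, and intensities are invented, and the specificity threshold reflects the advice above to favour specificity over sensitivity.

```python
# Toy job-exposure matrix (JEM): exposures are assigned at job level, not individual level.
# Job codes, agents, prevalences, and intensities below are all invented for illustration.
jem = {
    # job code: {agent: (probability of exposure, intensity score)}
    "7216": {"welding_fumes": (0.8, 2.5)},
    "9313": {"silica": (0.6, 1.5), "diesel_exhaust": (0.3, 1.0)},
    "4110": {},                                   # office work: treated as unexposed
}

def cumulative_exposure(job_history, agent, specificity_threshold=0.5):
    """Sum duration * intensity over jobs whose exposure probability exceeds a threshold.

    A high threshold mirrors the advice above: aim for high specificity, so that only
    jobs with a high likelihood of exposure are treated as exposed."""
    total = 0.0
    for job_code, years in job_history:
        prob, intensity = jem.get(job_code, {}).get(agent, (0.0, 0.0))
        if prob >= specificity_threshold:
            total += years * intensity
    return total

# Hypothetical self-reported job history: (job code, years held)
history = [("4110", 5), ("9313", 12), ("7216", 3)]
print(cumulative_exposure(history, "silica"))          # 12 * 1.5 = 18.0
print(cumulative_exposure(history, "diesel_exhaust"))  # below the threshold -> 0.0
```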

“Community-based studies, by definition, address a wider range of types of exposure and a much wider range of encountered exposure levels (e.g. relatively high exposures in primary production but often lower in downstream use, or among indirectly exposed individuals). A limitation of single community-based studies is often the relatively low number of exposed individuals. Pooling across studies might therefore be beneficial. […] Pooling projects need careful planning and coordination, because the original studies were conducted for different purposes, at different time periods, using different questionnaires. This heterogeneity is sometimes perceived as a disadvantage but also implies variations that can be studied and thereby provide important insights. Every pooling project has its own dynamics but there are several general challenges that most pooling projects confront. Creating common variables for all studies can stretch from simple re-naming of variables […] or recoding of units […] to the re-categorization of national educational systems […] into years of formal education. Another challenge is to harmonize the different classification systems of, for example, diseases (e.g. International Classification of Disease (ICD)-9 versus ICD-10), occupations […], and industries […]. This requires experts in these respective fields as well as considerable time and money. Harmonization of data may mean losing some information; for example, ISCO-68 contains more detail than ISCO-88, which makes it possible to recode ISCO-68 to ISCO-88 with only a little loss of detail, but it is not possible to recode ISCO-88 to ISCO-68 without losing one or two digits in the job code. […] Making the most of the data may imply that not all studies will qualify for all analyses. For example, if a study did not collect data regarding lung cancer cell type, it can contribute to the overall analyses but not to the cell type-specific analyses. It is important to remember that the quality of the original data is critical; poor data do not become better by pooling.”

December 6, 2017 Posted by | Books, Cancer/oncology, Demographics, Epidemiology, Health Economics, Medicine, Ophthalmology, Statistics | Leave a comment

Quotes

i. “The party that negotiates in haste is often at a disadvantage.” (Howard Raiffa)

ii. “Advice: don’t embarrass your bargaining partner by forcing him or her to make all the concessions.” (-ll-)

iii. “Disputants often fare poorly when they each act greedily and deceptively.” (-ll-)

iv. “Each man does seek his own interest, but, unfortunately, not according to the dictates of reason.” (Kenneth Waltz)

v. “Whatever is said after I’m gone is irrelevant.” (Jimmy Savile)

vi. “Trust is an important lubricant of a social system. It is extremely efficient; it saves a lot of trouble to have a fair degree of reliance on other people’s word. Unfortunately this is not a commodity which can be bought very easily. If you have to buy it, you already have some doubts about what you have bought.” (Kenneth Arrow)

vii. “… an author never does more damage to his readers than when he hides a difficulty.” (Évariste Galois)

viii. “A technical argument by a trusted author, which is hard to check and looks similar to arguments known to be correct, is hardly ever checked in detail” (Vladimir Voevodsky)

ix. “Suppose you want to teach the “cat” concept to a very young child. Do you explain that a cat is a relatively small, primarily carnivorous mammal with retractible claws, a distinctive sonic output, etc.? I’ll bet not. You probably show the kid a lot of different cats, saying “kitty” each time, until it gets the idea. To put it more generally, generalizations are best made by abstraction from experience. They should come one at a time; too many at once overload the circuits.” (Ralph P. Boas Jr.)

x. “Every author has several motivations for writing, and authors of technical books always have, as one motivation, the personal need to understand; that is, they write because they want to learn, or to understand a phenomenon, or to think through a set of ideas.” (Albert Wymore)

xi. “Great mathematics is achieved by solving difficult problems not by fabricating elaborate theories in search of a problem.” (Harold Davenport)

xii. “Is science really gaining in its assault on the totality of the unsolved? As science learns one answer, it is characteristically true that it also learns several new questions. It is as though science were working in a great forest of ignorance, making an ever larger circular clearing within which, not to insist on the pun, things are clear… But as that circle becomes larger and larger, the circumference of contact with ignorance also gets longer and longer. Science learns more and more. But there is an ultimate sense in which it does not gain; for the volume of the appreciated but not understood keeps getting larger. We keep, in science, getting a more and more sophisticated view of our essential ignorance.” (Warren Weaver)

xiii. “When things get too complicated, it sometimes makes sense to stop and wonder: Have I asked the right question?” (Enrico Bombieri)

xiv. “The mean and variance are unambiguously determined by the distribution, but a distribution is, of course, not determined by its mean and variance: A number of different distributions have the same mean and the same variance.” (Richard von Mises)

xv. “Algorithms existed for at least five thousand years, but people did not know that they were algorithmizing. Then came Turing (and Post and Church and Markov and others) and formalized the notion.” (Doron Zeilberger)

xvi. “When a problem seems intractable, it is often a good idea to try to study “toy” versions of it in the hope that as the toys become increasingly larger and more sophisticated, they would metamorphose, in the limit, to the real thing.” (-ll-)

xvii. “The kind of mathematics foisted on children in schools is not meaningful, fun, or even very useful. This does not mean that an individual child cannot turn it into a valuable and enjoyable personal game. For some the game is scoring grades; for others it is outwitting the teacher and the system. For many, school math is enjoyable in its repetitiveness, precisely because it is so mindless and dissociated that it provides a shelter from having to think about what is going on in the classroom. But all this proves is the ingenuity of children. It is not a justifications for school math to say that despite its intrinsic dullness, inventive children can find excitement and meaning in it.” (Seymour Papert)

xviii. “The optimist believes that this is the best of all possible worlds, and the pessimist fears that this might be the case.” (Ivar Ekeland)

xix. “An equilibrium is not always an optimum; it might not even be good. This may be the most important discovery of game theory.” (-ll-)

xx. “It’s not all that rare for people to suffer from a self-hating monologue. Any good theories about what’s going on there?”

“If there’s things you don’t like about your life, you can blame yourself, or you can blame others. If you blame others and you’re of low status, you’ll be told to cut that out and start blaming yourself. If you blame yourself and you can’t solve the problems, self-hate is the result.” (Nancy Lebovitz & ‘The Nybbler’)

December 1, 2017 Posted by | Mathematics, Quotes/aphorisms, Science, Statistics | 4 Comments

Common Errors in Statistics… (III)

This will be my last post about the book. I liked most of it, and I gave it four stars on goodreads, but that doesn’t mean there weren’t any observations included in the book with which I took issue/disagreed. Here’s one of the things I didn’t like:

“In the univariate [model selection] case, if the errors were not normally distributed, we could take advantage of permutation methods to obtain exact significance levels in tests of the coefficients. Exact permutation methods do not exist in the multivariable case.

When selecting variables to incorporate in a multivariable model, we are forced to perform repeated tests of hypotheses, so that the resultant p-values are no longer meaningful. One solution, if sufficient data are available, is to divide the dataset into two parts, using the first part to select variables, and the second part to test these same variables for significance.” (chapter 13)

The basic idea is to use the results of hypothesis tests to decide which variables to include in the model. This is both common and bad practice. I found it surprising that such a piece of advice would be included in this book, as I’d figured beforehand that this would precisely be the sort of thing a book like this one would tell people not to do. I’ve said this before multiple times on this blog, but I’ll keep saying it, especially if/when I find this sort of advice in statistics textbooks: Using hypothesis testing as a basis for model selection is an invalid approach to model selection, and it’s in general a terrible idea. “There is no statistical theory that supports the notion that hypothesis testing with a fixed α level is a basis for model selection.” (Burnham & Anderson). Use information criteria, not hypothesis tests, to make your model selection decisions. (And read Burnham & Anderson’s book on these topics.)
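
For what that looks like in practice, here is a minimal sketch of comparing a pre-specified set of candidate models by information criteria rather than by repeated hypothesis tests; the data are simulated and the candidate set is chosen in advance.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulated data: x3 is irrelevant to the outcome.
n = 400
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 + 0.4 * x2 + rng.normal(size=n)

candidates = {
    "x1":       sm.add_constant(np.column_stack([x1])),
    "x1+x2":    sm.add_constant(np.column_stack([x1, x2])),
    "x1+x2+x3": sm.add_constant(np.column_stack([x1, x2, x3])),
}
for name, X in candidates.items():
    res = sm.OLS(y, X).fit()
    print(f"{name:10s} AIC={res.aic:8.1f}  BIC={res.bic:8.1f}")
# The candidate model with the lowest AIC/BIC is preferred; no repeated tests are involved.
```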

Anyway, much of the stuff included in the book was good stuff and it’s a very decent book. I’ve added some quotes and observations from the last part of the book below.

“OLS is not the only modeling technique. To diminish the effect of outliers, and treat prediction errors as proportional to their absolute magnitude rather than their squares, one should use least absolute deviation (LAD) regression. This would be the case if the conditional distribution of the dependent variable were characterized by a distribution with heavy tails (compared to the normal distribution, increased probability of values far from the mean). One should also employ LAD regression when the conditional distribution of the dependent variable given the predictors is not symmetric and we wish to estimate its median rather than its mean value.
If it is not clear which variable should be viewed as the predictor and which the dependent variable, as is the case when evaluating two methods of measurement, then one should employ Deming or error in variable (EIV) regression.
If one’s primary interest is not in the expected value of the dependent variable but in its extremes (the number of bacteria that will survive treatment or the number of individuals who will fall below the poverty line), then one ought consider the use of quantile regression.
If distinct strata exist, one should consider developing separate regression models for each stratum, a technique known as ecological regression […] If one’s interest is in classification or if the majority of one’s predictors are dichotomous, then one should consider the use of classification and regression trees (CART) […] If the outcomes are limited to success or failure, one ought employ logistic regression. If the outcomes are counts rather than continuous measurements, one should employ a generalized linear model (GLM).”
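
To illustrate a couple of the alternatives listed above, here is a short sketch, on simulated heavy-tailed data, of LAD regression as median regression and of quantile regression for an upper quantile, using statsmodels.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Simulated data with heavy-tailed errors, where OLS and LAD/quantile fits differ.
n = 500
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.standard_t(df=2, size=n)   # heavy-tailed noise

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
lad = sm.QuantReg(y, X).fit(q=0.5)      # least absolute deviation = median regression
upper = sm.QuantReg(y, X).fit(q=0.9)    # quantile regression for the upper tail

print("OLS:      ", ols.params)
print("LAD/q=0.5:", lad.params)
print("q=0.9:    ", upper.params)
```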

“Linear regression is a much misunderstood and mistaught concept. If a linear model provides a good fit to data, this does not imply that a plot of the dependent variable with respect to the predictor would be a straight line, only that a plot of the dependent variable with respect to some not-necessarily monotonic function of the predictor would be a line. For example, y = A + B log[x] and y = A cos(x) + B sin(x) are both linear models whose coefficients A and B might be derived by OLS or LAD methods. Y = Ax^5 is a linear model. Y = x^A is nonlinear. […] Perfect correlation (ρ² = 1) does not imply that two variables are identical but rather that one of them, Y, say, can be written as a linear function of the other, Y = a + bX, where b is the slope of the regression line and a is the intercept. […] Nonlinear regression methods are appropriate when the form of the nonlinear model is known in advance. For example, a typical pharmacological model will have the form A exp[bX] + C exp[dW]. The presence of numerous locally optimal but globally suboptimal solutions creates challenges, and validation is essential. […] To be avoided are a recent spate of proprietary algorithms available solely in software form that guarantee to find a best-fitting solution. In the words of John von Neumann, “With four parameters I can fit an elephant and with five I can make him wiggle his trunk.””
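
The point that "linear" means linear in the coefficients is easy to demonstrate: y = A + B log[x] can be fitted by ordinary least squares even though the fitted curve is not a straight line in x. The data below are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Linear model" means linear in the coefficients, not a straight line in x:
# y = A + B*log(x) can be fitted by OLS after transforming x.
x = rng.uniform(1, 100, 200)
y = 3.0 + 1.5 * np.log(x) + rng.normal(scale=0.3, size=x.size)

design = np.column_stack([np.ones_like(x), np.log(x)])   # columns: 1, log(x)
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print("estimated A, B:", coef)    # close to (3.0, 1.5), though the fit is a curve in x
```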

“[T]he most common errors associated with quantile regression include: 1. Failing to evaluate whether the model form is appropriate, for example, forcing linear fit through an obvious nonlinear response. (Of course, this is also a concern with mean regression, OLS, LAD, or EIV.) 2. Trying to over interpret a single quantile estimate (say 0.85) with a statistically significant nonzero slope (p < 0.05) when the majority of adjacent quantiles (say 0.5 − 0.84 and 0.86 − 0.95) are clearly zero (p > 0.20). 3. Failing to use all the information a quantile regression provides. Even if you think you are only interested in relations near maximum (say 0.90 − 0.99), your understanding will be enhanced by having estimates (and sampling variation via confidence intervals) across a wide range of quantiles (say 0.01 − 0.99).”

“Survival analysis is used to assess time-to-event data including time to recovery and time to revision. Most contemporary survival analysis is built around the Cox model […] Possible sources of error in the application of this model include all of the following: *Neglecting the possible dependence of the baseline function λ0 on the predictors. *Overmatching, that is, using highly correlated predictors that may well mask each other’s effects. *Using the parametric Breslow or Kaplan–Meier estimators of the survival function rather than the nonparametric Nelson–Aalen estimator. *Excluding patients based on post-hoc criteria. Pathology workups on patients who died during the study may reveal that some of them were wrongly diagnosed. Regardless, patients cannot be eliminated from the study as we lack the information needed to exclude those who might have been similarly diagnosed but who are still alive at the conclusion of the study. *Failure to account for differential susceptibility (frailty) of the patients”.

“In reporting the results of your modeling efforts, you need to be explicit about the methods used, the assumptions made, the limitations on your model’s range of application, potential sources of bias, and the method of validation […] Multivariable regression is plagued by the same problems univariate regression is heir to, plus many more of its own. […] If choosing the correct functional form of a model in a univariate case presents difficulties, consider that in the case of k variables, there are k linear terms (should we use logarithms? should we add polynomial terms?) and k(k − 1) first-order cross products of the form x_i·x_k. Should we include any of the k(k − 1)(k − 2) second-order cross products? A common error is to attribute the strength of a relationship to the magnitude of the predictor’s regression coefficient […] Just scale the units in which the predictor is reported to see how erroneous such an assumption is. […] One of the main problems in multiple regression is multicollinearity, which is the correlation among predictors. Even relatively weak levels of multicollinearity are enough to generate instability in multiple regression models […]. A simple solution is to evaluate the correlation matrix M among predictors, and use this matrix to choose the predictors that are less correlated. […] Test M for each predictor, using the variance inflation factor (VIF) given by 1/(1 − R²), where R² is the multiple coefficient of determination of the predictor against all other predictors. If VIF is large for a given predictor (>8, say) delete this predictor and reestimate the model. […] Dropping collinear variables from the analysis can result in a substantial loss of power”.
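
Here is a small sketch of the VIF calculation described above, regressing each predictor on the others and reporting 1/(1 − R²); the simulated third predictor is constructed to be nearly collinear with the first two.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    regressing predictor j on all the other predictors (X excludes the constant)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated predictors, the last one nearly collinear with the first two.
rng = np.random.default_rng(7)
x1, x2 = rng.normal(size=(2, 300))
x3 = 0.7 * x1 + 0.7 * x2 + rng.normal(scale=0.1, size=300)
print(vif(np.column_stack([x1, x2, x3])))   # x1, x2 and especially x3 show inflated VIFs
```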

“It can be difficult to predict the equilibrium point for a supply-and-demand model, because producers change their price in response to demand and consumers change their demand in response to price. Failing to account for endogeneous variables can lead to biased estimates of the regression coefficients.
Endogeneity can arise not only as a result of omitted variables, but of measurement error, autocorrelated errors, simultaneity, and sample selection errors. One solution is to make use of instrument variables that should satisfy two conditions: 1. They should be correlated with the endogenous explanatory variables, conditional on the other covariates. 2. They should not be correlated with the error term in the explanatory equation, that is, they should not suffer from the same problem as the original predictor.
Instrumental variables are commonly used to estimate causal effects in contexts in which controlled experiments are not possible, for example in estimating the effects of past and projected government policies.”
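
A minimal sketch of the instrumental-variable idea via two-stage least squares, on simulated data where an unobserved confounder biases the naive OLS estimate but a valid instrument recovers (approximately) the true effect; everything here is simulated and simplified (one endogenous regressor, one instrument, no proper standard errors).

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: u is an unobserved confounder that makes x endogenous,
# z is an instrument correlated with x but (by construction) not with the error.
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * u + rng.normal(size=n)    # true causal effect of x on y is 2.0

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

X_naive = np.column_stack([np.ones(n), x])
print("naive OLS slope:", ols(X_naive, y)[1])        # biased upwards by the confounder

# Stage 1: regress the endogenous regressor on the instrument (plus constant).
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ ols(Z, x)
# Stage 2: regress the outcome on the fitted values from stage 1.
X_iv = np.column_stack([np.ones(n), x_hat])
print("2SLS slope:", ols(X_iv, y)[1])                # close to the true effect of 2.0
```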

“[T]he following errors are frequently associated with factor analysis: *Applying it to datasets with too few cases in relation to the number of variables analyzed […], without noticing that correlation coefficients have very wide confidence intervals in small samples. *Using oblique rotation to get a number of factors bigger or smaller than the number of factors obtained in the initial extraction by principal components, as a way to show the validity of a questionnaire. For example, obtaining only one factor by principal components and using the oblique rotation to justify that there were two differentiated factors, even when the two factors were correlated and the variance explained by the second factor was very small. *Confusion among the total variance explained by a factor and the variance explained in the reduced factorial space. In this way a researcher interpreted that a given group of factors explaining 70% of the variance before rotation could explain 100% of the variance after rotation.”

“Poisson regression is appropriate when the dependent variable is a count, as is the case with the arrival of individuals in an emergency room. It is also applicable to the spatial distributions of tornadoes and of clusters of galaxies.2 To be applicable, the events underlying the outcomes must be independent […] A strong assumption of the Poisson regression model is that the mean and variance are equal (equidispersion). When the variance of a sample exceeds the mean, the data are said to be overdispersed. Fitting the Poisson model to overdispersed data can lead to misinterpretation of coefficients due to poor estimates of standard errors. Naturally occurring count data are often overdispersed due to correlated errors in time or space, or other forms of nonindependence of the observations. One solution is to fit a Poisson model as if the data satisfy the assumptions, but adjust the model-based standard errors usually employed. Another solution is to estimate a negative binomial model, which allows for scalar overdispersion.”
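
A short sketch of the overdispersion problem and the two remedies mentioned: simulated overdispersed counts, a Poisson GLM with a crude dispersion check, adjusted (robust) standard errors, and a negative binomial model. The simulation settings are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)

# Simulated overdispersed counts; a plain Poisson model will understate the standard errors.
n_obs = 2000
x = rng.normal(size=n_obs)
mu = np.exp(0.5 + 0.3 * x)
y = rng.negative_binomial(2, 2.0 / (2.0 + mu))     # counts with mean mu but extra variance

X = sm.add_constant(x)
pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
# Crude overdispersion check: Pearson chi-square divided by residual df should be near 1.
print("dispersion:", pois.pearson_chi2 / pois.df_resid)

# Two remedies mentioned above: adjust the model-based standard errors, or fit a
# negative binomial model that allows for scalar overdispersion.
robust = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
negbin = sm.NegativeBinomial(y, X).fit(disp=False)
print("Poisson SEs:", pois.bse, "robust SEs:", robust.bse, "NB SEs:", negbin.bse[:2])
```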

“When multiple observations are collected for each principal sampling unit, we refer to the collected information as panel data, correlated data, or repeated measures. […] The dependency of observations violates one of the tenets of regression analysis: that observations are supposed to be independent and identically distributed or IID. Several concerns arise when observations are not independent. First, the effective number of observations (that is, the effective amount of information) is less than the physical number of observations […]. Second, any model that fails to specifically address [the] correlation is incorrect […]. Third, although the correct specification of the correlation will yield the most efficient estimator, that specification is not the only one to yield a consistent estimator.”

“The basic issue in deciding whether to utilize a fixed- or random-effects model is whether the sampling units (for which multiple observations are collected) represent the collection of most or all of the entities for which inference will be drawn. If so, the fixed-effects estimator is to be preferred. On the other hand, if those same sampling units represent a random sample from a larger population for which we wish to make inferences, then the random-effects estimator is more appropriate. […] Fixed- and random-effects models address unobserved heterogeneity. The random-effects model assumes that the panel-level effects are randomly distributed. The fixed-effects model assumes a constant disturbance that is a special case of the random-effects model. If the random-effects assumption is correct, then the random-effects estimator is more efficient than the fixed-effects estimator. If the random-effects assumption does not hold […], then the random effects model is not consistent. To help decide whether the fixed- or random-effects model is more appropriate, use the Durbin–Wu–Hausman3 test comparing coefficients from each model. […] Although fixed-effects estimators and random-effects estimators are referred to as subject-specific estimators, the GEEs available through PROC GENMOD in SAS or xtgee in Stata, are called population-averaged estimators. This label refers to the interpretation of the fitted regression coefficients. Subject-specific estimators are interpreted in terms of an effect for a given panel, whereas population-averaged estimators are interpreted in terms of an effect averaged over panels.”
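
The Durbin-Wu-Hausman comparison amounts to a quadratic form in the difference between the two coefficient vectors. Here is a generic sketch with made-up coefficient estimates and covariance matrices standing in for actual fixed- and random-effects fits.

```python
import numpy as np
from scipy import stats

def hausman(b_fe, b_re, cov_fe, cov_re):
    """Durbin-Wu-Hausman statistic comparing fixed- and random-effects coefficients.

    H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^(-1) (b_FE - b_RE), asymptotically
    chi-square with df = number of compared coefficients under H0 (random effects valid)."""
    diff = np.asarray(b_fe) - np.asarray(b_re)
    v = np.asarray(cov_fe) - np.asarray(cov_re)
    stat = float(diff @ np.linalg.pinv(v) @ diff)
    return stat, stats.chi2.sf(stat, diff.size)

# Hypothetical coefficient vectors and covariance matrices from FE and RE fits.
b_fe = np.array([0.52, -0.11])
b_re = np.array([0.45, -0.09])
cov_fe = np.array([[0.010, 0.001], [0.001, 0.004]])
cov_re = np.array([[0.006, 0.001], [0.001, 0.003]])
stat, p = hausman(b_fe, b_re, cov_fe, cov_re)
print(f"Hausman chi2 = {stat:.2f}, p = {p:.3f}")   # a small p favours the fixed-effects model
```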

“A favorite example in comparing subject-specific and population-averaged estimators is to consider the difference in interpretation of regression coefficients for a binary outcome model on whether a child will exhibit symptoms of respiratory illness. The predictor of interest is whether or not the child’s mother smokes. Thus, we have repeated observations on children and their mothers. If we were to fit a subject-specific model, we would interpret the coefficient on smoking as the change in likelihood of respiratory illness as a result of the mother switching from not smoking to smoking. On the other hand, the interpretation of the coefficient in a population-averaged model is the likelihood of respiratory illness for the average child with a nonsmoking mother compared to the likelihood for the average child with a smoking mother. Both models offer equally valid interpretations. The interpretation of interest should drive model selection; some studies ultimately will lead to fitting both types of models. […] In addition to model-based variance estimators, fixed-effects models and GEEs [Generalized Estimating Equation models] also admit modified sandwich variance estimators. SAS calls this the empirical variance estimator. Stata refers to it as the Robust Cluster estimator. Whatever the name, the most desirable property of the variance estimator is that it yields inference for the regression coefficients that is robust to misspecification of the correlation structure. […] Specification of GEEs should include careful consideration of reasonable correlation structure so that the resulting estimator is as efficient as possible. To protect against misspecification of the correlation structure, one should base inference on the modified sandwich variance estimator. This is the default estimator in SAS, but the user must specify it in Stata.”
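
Here is a sketch of a population-averaged (GEE) model along the lines of the smoking/respiratory-illness example, on simulated clustered binary data; statsmodels is used for the GEE fit, and the robust ("sandwich") covariance it reports by default is the kind of estimator discussed above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Simulated repeated binary outcomes per child; all parameters are invented.
n_children, n_visits = 300, 4
child = np.repeat(np.arange(n_children), n_visits)
smoker = np.repeat(rng.integers(0, 2, n_children), n_visits)       # mother smokes (0/1)
child_effect = np.repeat(rng.normal(scale=0.8, size=n_children), n_visits)
p = 1.0 / (1.0 + np.exp(-(-1.0 + 0.7 * smoker + child_effect)))
symptoms = rng.binomial(1, p)

X = sm.add_constant(smoker)
gee = sm.GEE(symptoms, X, groups=child,
             family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable())
res = gee.fit()       # statsmodels reports the robust ("sandwich") covariance by default
print(res.params, res.bse)
```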

“There are three main approaches to [model] validation: 1. Independent verification (obtained by waiting until the future arrives or through the use of surrogate variables). 2. Splitting the sample (using one part for calibration, the other for verification) 3. Resampling (taking repeated samples from the original sample and refitting the model each time).
Goodness of fit is no guarantee of predictive success. […] Splitting the sample into two parts, one for estimating the model parameters, the other for verification, is particularly appropriate for validating time series models in which the emphasis is on prediction or reconstruction. If the observations form a time series, the more recent observations should be reserved for validation purposes. Otherwise, the data used for validation should be drawn at random from the entire sample. Unfortunately, when we split the sample and use only a portion of it, the resulting estimates will be less precise. […] The proportion to be set aside for validation purposes will depend upon the loss function. If both the goodness-of-fit error in the calibration sample and the prediction error in the validation sample are based on mean-squared error, Picard and Berk [1990] report that we can minimize their sum by using between a quarter and a third of the sample for validation purposes.”
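
A minimal sketch of the split-sample approach, on simulated (non-time-series) data: roughly a third of the observations are set aside at random for validation, the model is fitted on the calibration part, and prediction error is assessed on the held-out part.

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulated data; a random split is appropriate because these are not time series.
n = 300
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=2.0, size=n)

idx = rng.permutation(n)
n_valid = n // 3                              # roughly a third reserved for validation
valid, calib = idx[:n_valid], idx[n_valid:]

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[calib], y[calib], rcond=None)

mse_calib = np.mean((y[calib] - X[calib] @ beta) ** 2)   # goodness of fit
mse_valid = np.mean((y[valid] - X[valid] @ beta) ** 2)   # prediction error on held-out data
print(f"calibration MSE: {mse_calib:.2f}, validation MSE: {mse_valid:.2f}")
```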

November 13, 2017 Posted by | Books, Statistics | Leave a comment