**Unit 4:** Analytic Epidemiology

**Unit 4 Learning Objectives:**
1. Understand hypothesis formulation in epidemiologic studies.
2. Understand and calculate measures of effect (risk difference, risk ratio, rate ratio, odds ratio) used to evaluate epidemiologic hypotheses.
3. Understand statistical parameters used to evaluate epidemiologic hypotheses and results:
--- P-values
--- Confidence intervals
--- Type I and Type II error
--- Power
4. Recognize the primary study designs used to evaluate epidemiologic hypotheses:
--- Randomized trial
--- Prospective & retrospective cohort studies
--- Case-control study
--- Case-crossover study
--- Cross-sectional study

**Assigned Readings:**
Textbook (Gordis): Chapter 11
Rothman: Random error and the role of statistics. In Epidemiology: An Introduction, Chapter 6, pages 113-129.

**Analytic Epidemiology**
Study of the DETERMINANTS of health-related events

**Hypothesis Formulation**
Scientific Method (not unique to epi)
--- Formulate a hypothesis
--- Test the hypothesis

**Basic Strategy of Analytical Epi**
1. Identify variables you are interested in:
• Exposure
• Outcome
2. Formulate a hypothesis
3. Compare the experience of two groups of subjects with respect to the exposure and outcome

Note: Assembling the study groups to compare, whether on the basis of exposure or disease status, is one of the most important elements of study design. Ideally, we would like to know what would have happened to exposed individuals had they not been exposed, but this is “counterfactual” since, by definition, such individuals were exposed.

**Hypothesis Formulation**
The “Biostatistician’s” way
H0: “Null” hypothesis (assumed)
H1: “Alternative” hypothesis
The “Epidemiologist’s” way
Direct risk estimate (e.g.
best estimate of risk of disease associated with the exposure).

**Hypothesis Formulation**
Biostatistician:
H0: There is no association between the exposure and disease of interest
H1: There is an association between the exposure and disease of interest (beyond what might be expected from random error alone)

Epidemiologist:
What is the best estimate of the risk of disease in those who are exposed compared to those who are unexposed (i.e. the exposed are at XX times higher risk of disease)? This moves away from the simple yes/no dichotomy for an exposure/disease association and toward the estimated magnitude of effect, irrespective of whether it differs from the null hypothesis.

**Hypothesis Formulation**
“Association”
Statistical dependence between two variables:
• Exposure (risk factor, protective factor, predictor variable, treatment)
• Outcome (disease, event)

The degree to which the rate of disease in persons with a specific exposure is either higher or lower than the rate of disease among those without that exposure.

**Hypothesis Formulation**
Ways to Express Hypotheses:
1. Suggest possible events:
The incidence of tuberculosis will increase in the next decade.
2. Suggest a relationship between a specific exposure and a health-related event:
A high cholesterol intake is associated with the development (risk) of coronary heart disease.
3. Suggest a cause-effect relationship:
Cigarette smoking is a cause of lung cancer.
4. “One-sided” vs. “Two-sided”:
One-sided example: Helicobacter pylori infection is associated with increased risk of stomach ulcer.
Two-sided example: Weight-lifting is associated with risk of lower back injury.

**Hypothesis Formulation**
Guidelines for Developing Hypotheses:
• State the exposure to be measured as specifically as possible.
• State the health outcome as specifically as possible.
• Strive to explain the smallest amount of ignorance.

**Hypothesis Formulation**
Example Hypotheses:
POOR: Eating junk food is associated with the development of cancer.
GOOD: The human papilloma virus (HPV) subtype 16 is associated with the development of cervical cancer.

**“Measures of Effect”**
• Used to evaluate the research hypotheses
• Reflect the disease experience of groups of persons with and without the exposure of interest
• Often referred to as a “point estimate” (best estimate of the exposure/disease relationship between the two groups)

**“Measures of Effect”**
• Risk Difference (RD)
• Relative Risk (RR)
--- Risk Ratio (RR)
--- Rate Ratio (RR)
• Odds Ratio (OR)

**“Measures of Effect”**
• Risk Difference (RD)
The absolute difference in the incidence (risk) of disease between the exposed group and the non-exposed (“reference”) group

**“Risk Difference”**
Hypothesis: Asbestos exposure is associated with mesothelioma
Results:
Of 100 persons with high asbestos exposure, 14 develop mesothelioma over 10 years
Of 200 persons with low/no asbestos exposure, 12 develop mesothelioma over 10 years

$RD = I_{E+} - I_{E-}$
$RD = (14 / 100) - (12 / 200)$
$RD = 0.14 - 0.06 = 0.08$

The absolute 10-year risk of mesothelioma is 8 percentage points higher in persons with high asbestos exposure than in persons with low or no exposure to asbestos.

**“Measures of Effect”**
• Risk Ratio
• Rate Ratio
Compares the incidence of disease (risk) among the exposed
with the incidence of disease (risk) among the non-exposed (“reference”) by means of a ratio. The reference group assumes a value of 1.0 (the “null” value). Both ratios are referred to as “Relative Risk (RR)”.

**The ‘null’ value (1.0)**
When the incidence is identical in the exposed and non-exposed groups, the ratio equals 1.0 (CI = cumulative incidence; IR = incidence rate):
• $CI_{\text{exposed}} = 0.0026$, $CI_{\text{non-exposed}} = 0.0026$: RR = 1.0
• $CI_{\text{exposed}} = 0.49$, $CI_{\text{non-exposed}} = 0.49$: RR = 1.0
• $IR_{\text{exposed}} = 0.062$ per 100K, $IR_{\text{non-exposed}} = 0.062$ per 100K: RR = 1.0

**The ‘null’ value (1.0)**
• If the relative risk estimate is > 1.0, the exposure appears to be a risk factor for disease.
• If the relative risk estimate is < 1.0, the exposure appears to be protective against disease occurrence.

**“Risk Ratio”**
Hypothesis: Being subject to physical abuse in childhood is associated with lifetime risk of attempted suicide
Results:
Of 2,240 children not subject to physical abuse, 16 have attempted suicide.
Of 840 children subjected to physical abuse, 10 have attempted suicide.
Note that the row and column headings have been arbitrarily switched from the prior example.

$RR = I_{E+} / I_{E-}$
$RR = (10 / 840) / (16 / 2{,}240)$
$RR = 0.01190 / 0.00714 \approx 1.67$

**“Risk Ratio”**
$RR = I_{E+} / I_{E-} \approx 1.67$
Children with a history of physical abuse are approximately 1.7 times as likely to attempt suicide in their lifetime as children without a history of physical abuse.
The risk of lifetime attempted suicide is approximately 70% higher in children with a history of physical abuse compared to children without a history of physical abuse.

**“Rate Ratio”**
Hypothesis: Average daily fiber intake is associated with risk of colon cancer
Results:
Of 112 adults with high fiber intake followed for 840 person-years, 9 developed colon cancer.
Of 130 adults with moderate fiber intake followed for 900 person-years, 14 developed colon cancer.
Of 55 adults with low fiber intake followed for 450 person-years, 12 developed colon cancer.

**“Rate Ratio”**
• Assume that high fiber intake is the reference group (value of 1.0)
• Compare the incidence rate (IR) of colon cancer:
--- Moderate fiber intake versus high fiber intake
--- Low fiber intake versus high fiber intake

**“Rate Ratio”**
$RR = I_{\text{moderate}} / I_{\text{high}} = (14 / 900) / (9 / 840) \approx 1.45$
$RR = I_{\text{low}} / I_{\text{high}} = (12 / 450) / (9 / 840) \approx 2.49$
Persons with moderate fiber intake develop colon cancer at 1.45 times the rate of persons with high fiber intake.
Persons with low fiber intake develop colon cancer at 2.49 times the rate of persons with high fiber intake.

**“Measures of Effect”**
• Odds Ratio (OR)
Compares the odds of exposure among those with disease to the odds of exposure among those without the disease. Does not compare the incidence of disease between groups.

**“Odds Ratio”**
Hypothesis: Eating chili peppers is associated with development of gastric cancer.
Cases: 21
--- 12 ate chili peppers
--- 9 did not eat chili peppers
Controls: 479
--- 88 ate chili peppers
--- 391 did not eat chili peppers

With a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls:
$OR = (a / c) / (b / d) = (ad) / (bc)$
$OR = (12 / 9) / (88 / 391)$
$OR = 1.333 / 0.225 = 5.92$

**“Odds Ratio”**
OR = 5.92
• The odds of having eaten chili peppers are 5.92 times as high for gastric cancer cases as for controls.
• (Interpreting OR as RR, if appropriate) The incidence (or risk) of gastric cancer is 5.92 times as high for persons who eat chili peppers as for persons who do not eat chili peppers. (Is this appropriate?)

**Odds Ratio & Risk Ratio**
Relationship between RR and OR:
The odds ratio will provide a good estimate of the risk ratio when:
1. The outcome (disease) is rare
OR
2. The effect size is small or modest

**Odds Ratio & Risk Ratio**
The odds ratio will provide a good estimate of the risk ratio when:
1. The outcome (disease) is rare.

$$RR = \frac{a / (a + b)}{c / (c + d)}$$

If the disease is rare, then cells (a) and (c) will be small, so $a + b \approx b$ and $c + d \approx d$:

$$RR = \frac{a / (a + b)}{c / (c + d)} \approx \frac{a / b}{c / d} = \frac{ad}{bc} = OR$$

**Odds Ratio & Risk Ratio**
The odds ratio will provide a good estimate of the risk ratio when:
2. The effect size is small or modest.

$$OR = \frac{40 / 120}{60 / 180} = \frac{0.333}{0.333} = 1.0$$

$$RR = \frac{40 / (40 + 60)}{120 / (120 + 180)} = \frac{0.40}{0.40} = 1.0$$

**Odds Ratio & Risk Ratio**
Finally, we expect the risk ratio to be closer to the null value of 1.0 than the odds ratio. Therefore, be especially cautious when interpreting the odds ratio as a measure of relative risk when the outcome is not rare and the effect size is large.

$$OR = \frac{20 / 10}{30 / 90} = \frac{2.0}{0.333} = 6.0$$

$$RR = \frac{20 / 50}{10 / 100} = \frac{0.40}{0.10} = 4.0$$
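
**Worked Sketch: Measures of Effect**
The calculations in this unit can be checked with a few lines of Python. The following is a minimal sketch, not part of the assigned material. The function names (`risk_difference`, `risk_ratio`, `rate_ratio`, `odds_ratio`) are hypothetical helpers introduced here for illustration, and the cell labels assume a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease. Values are computed without intermediate rounding, so the last digit can differ slightly from a hand calculation that rounds each incidence first.

```python
# Illustrative helpers only; names and cell labels are assumptions, not a standard API.

def risk_difference(cases_exp, n_exp, cases_unexp, n_unexp):
    """Absolute difference in cumulative incidence: I(E+) - I(E-)."""
    return cases_exp / n_exp - cases_unexp / n_unexp


def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Ratio of cumulative incidences: I(E+) / I(E-)."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def rate_ratio(cases, person_years, ref_cases, ref_person_years):
    """Ratio of incidence rates (cases per person-year) against a reference group."""
    return (cases / person_years) / (ref_cases / ref_person_years)


def odds_ratio(a, b, c, d):
    """Cross-product ratio (ad) / (bc).

    a = exposed with disease, b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Asbestos and mesothelioma (risk difference)
    print("RD:", round(risk_difference(14, 100, 12, 200), 2))
    # Childhood physical abuse and attempted suicide (risk ratio)
    print("Risk ratio:", round(risk_ratio(10, 840, 16, 2240), 2))
    # Fiber intake and colon cancer, high fiber as reference (rate ratios)
    print("Rate ratio, moderate vs high:", round(rate_ratio(14, 900, 9, 840), 2))
    print("Rate ratio, low vs high:", round(rate_ratio(12, 450, 9, 840), 2))
    # Chili peppers and gastric cancer (odds ratio)
    print("OR:", round(odds_ratio(12, 88, 9, 391), 2))
    # Common outcome with a large effect: OR overstates the risk ratio
    print("OR vs RR:", odds_ratio(20, 30, 10, 90), round(risk_ratio(20, 50, 10, 100), 2))
```

The last line uses the counts from the final example: with a common outcome and a large effect, the odds ratio lies farther from 1.0 than the risk ratio computed from the same table.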