1
Statistical Methods in Epidemiology, Academic Year 2018-2019
ΜΠΑΜΙΑ ΧΡΙΣΤΙΝΑ* (Course Coordinator), ΒΟΥΡΛΗ ΓΕΩΡΓΙΑ*, ΚΑΛΠΟΥΡΤΖΗ ΝΑΤΑΣΑ* (Exercises). *Medical School, University of Athens, Laboratory of Hygiene, Epidemiology & Medical Statistics
2
Poisson regression
3
Models for counts
The Poisson probability mass function is given by $P(Y = y) = \frac{e^{-\mu}\mu^{y}}{y!}$, for $y = 0, 1, 2, \ldots$
The mean and variance of the Poisson distribution are equal: $E(Y) = \mathrm{Var}(Y) = \mu$.
Poisson regression models allow researchers to examine the relationship between predictors and count outcome variables.
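As a quick numerical check of the Poisson formula and of the equality of mean and variance, here is a minimal Python sketch. The mean `mu = 3.5` and the truncation point of the support are illustrative assumptions, not values from the slides.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu**y / math.factorial(y)

mu = 3.5               # illustrative mean (assumption, not from the slides)
support = range(0, 60)  # truncated support; the tail beyond 60 is negligible here
probs = [poisson_pmf(y, mu) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.6f}, variance = {variance:.6f}")
```

Both the mean and the variance come out equal to `mu`, which is the property that makes overdispersion (variance larger than the mean) worth checking in real count data.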
4
Examples
The Poisson distribution arises in many biological and medical contexts where counts are involved:
- The number of bacterial colonies in a dish
- The number of trees in an area of land
- The number of children an individual has
- The number of nucleotide base substitutions in a gene over a period of time
- The number of deaths in a group of patients over a study period
5
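The model for counts
$\ln(\mu_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}$
We need the logarithm because the linear predictor $\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}$ can take any real value, whereas we are modelling the mean of a count ($\mu_i$), which must be > 0.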
6
Example
Respiratory deaths were counted in Athens between 1988 and 1991.
We have 4 variables, including:
- Cases: the number of deaths
- Pop: the population of each age group
- Age: the categorical age group; one of 40-54, 55-59, 60-64, 65-74 or > 74
Question of interest: how does the expected number of deaths vary by age?
$\ln(\mu_i) = \beta_0 + \beta_1 I_{40-54} + \beta_2 I_{55-59} + \beta_3 I_{60-64} + \beta_4 I_{65-74}$
7
What about modelling rates?
If all patients are assumed to be followed up for the same time interval, modelling a rate is unnecessary.
But if follow-up time varies between individuals, modelling the count of deaths alone would be misleading.
8
Models for rates Example
The British doctors study: the classical cohort study by Doll et al., which was used (among other objectives) to investigate the effect of smoking on coronary heart disease (CHD) among male British doctors.

agegrp  smoke  deaths  pyrs
1       1      32      52407
2       1      104     43248
3       1      206     28612
4       1      186     12663
5       1      102     5317
1       0      2       18790
2       0      12      10673
3       0      28      5710
4       0      28      2585
5       0      31      1462

Age groups: 1: 35-44, 2: 45-54, 3: 55-64, 4: 65-74, 5: 75+. Smoke: 1 = smokers, 0 = non-smokers.
The crude CHD death rate for the non-smokers is 101/39220 = 0.0026 (2.6 per 1000 person-years) and for the smokers 630/142247 = 0.0044 (4.4 per 1000 person-years).
Therefore the rate ratio of non-smokers compared to smokers is 0.6, i.e., non-smokers have a reduced rate of CHD deaths.
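The crude rates and the rate ratio quoted on this slide can be reproduced with a few lines of Python. This is a sketch under one assumption: the non-smoker death counts for age groups 1 and 4 (2 and 28) are not legible in the slide text and are taken from the commonly published version of this dataset; they are consistent with the stated non-smoker total of 101 deaths.

```python
# (deaths, person-years) by age group 1..5
smokers = [(32, 52407), (104, 43248), (206, 28612), (186, 12663), (102, 5317)]
# Assumption: the counts 2 and 28 (age groups 1 and 4) come from the published
# dataset; they agree with the slide's total of 101 non-smoker deaths.
non_smokers = [(2, 18790), (12, 10673), (28, 5710), (28, 2585), (31, 1462)]

def crude_rate(groups):
    """Total deaths divided by total person-years."""
    deaths = sum(d for d, _ in groups)
    pyrs = sum(p for _, p in groups)
    return deaths, pyrs, deaths / pyrs

d1, t1, rate_smokers = crude_rate(smokers)
d0, t0, rate_non = crude_rate(non_smokers)
rate_ratio = rate_non / rate_smokers  # non-smokers compared to smokers

print(f"smokers:     {d1}/{t1} = {1000 * rate_smokers:.2f} per 1000 person-years")
print(f"non-smokers: {d0}/{t0} = {1000 * rate_non:.2f} per 1000 person-years")
print(f"rate ratio (non-smokers vs smokers) = {rate_ratio:.2f}")
```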
9
Models for rates
Imagine events that occur independently in time intervals $t_i$ with rates $\lambda_i$. Let $Y_i$ be random variables denoting the numbers of events in the corresponding $t_i$; they have Poisson distributions with means $\mu_i = \lambda_i t_i$.
Poisson models handle exposure variables by using simple algebra to change the dependent variable from a rate into a count.
10
Models for rates
If the rate is count/exposure, multiplying both sides of the equation by exposure moves it to the right side of the equation.
The Poisson model is a regression model of the mean $\mu_i$ on p explanatory variables $x_{i1}, x_{i2}, \ldots, x_{ip}$, where the link function is the log function.
The model we are interested in is a model for the rates $\lambda_i$, i.e.,
$\ln(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}$,
and because $\mu_i = \lambda_i t_i$,
$\ln(\mu_i) - \ln(t_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}$,
thus
$\ln(\mu_i) = \ln(t_i) + \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}$.
The Poisson model therefore includes the offset term $\ln(t_i)$, representing the log person-years, whose coefficient is fixed at one.
Using the offset is just a way of accounting for population sizes, which could vary by time.
The coefficients $\beta_1, \ldots, \beta_p$ represent the effect of each of the $x_{i1}, x_{i2}, \ldots, x_{ip}$ on the log of the rates $\lambda_i$.
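The role of the offset can be illustrated numerically. In the sketch below the coefficients `beta0` and `beta1` are hypothetical values chosen only for illustration. The expected count scales with the exposure time t, while the rate and the rate ratio do not depend on t.

```python
import math

beta0, beta1 = -6.0, 0.5  # hypothetical coefficients, for illustration only

def expected_count(x, t):
    """mu = exp(ln(t) + beta0 + beta1 * x); the offset ln(t) has coefficient 1."""
    return math.exp(math.log(t) + beta0 + beta1 * x)

for t in (1000.0, 25000.0):
    mu0 = expected_count(0, t)  # unexposed group
    mu1 = expected_count(1, t)  # exposed group
    rate_ratio = (mu1 / t) / (mu0 / t)
    print(f"t = {t:>8.0f}: mu0 = {mu0:8.3f}, mu1 = {mu1:8.3f}, rate ratio = {rate_ratio:.4f}")
```

Doubling t doubles the expected count, which is exactly what a coefficient of 1 on ln(t) encodes.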
11
MPH Program, Biostatistics II, April 30, 2010, W.D. Dupont
Assumptions for Poisson regression
The distribution of deaths in each time interval will be well approximated by a Poisson distribution if the following are true:
- Only one event can occur in each interval.
- Low event rates: the proportion of patients who have the event/disease of interest in each risk group should be small.
- The rate parameter λ is the same across all intervals.
- The time intervals are independent, i.e., the probability of observing an event in one interval does not depend on whether we observed event(s) in any other interval.
Of note, the denominators of rates used in Poisson regression are often patient-years rather than patients. In fact, the choice depends on which rate we want to estimate (see examples below).
7: Introduction to Poisson regression
12
Poisson versus Survival analysis models
Poisson regression is a very useful tool when we need to estimate rates, e.g. when analyzing cohort studies.
If we have detailed individual-level data (accurate data on the follow-up time for each of the cohort participants), we can apply the more sophisticated approaches that have been developed in the field of survival analysis.
13
4. Examples
Define $S_i = \beta_1 x_{i1}$, where $x_{i1}$ is an indicator equal to 1 if group i consists of smokers and 0 otherwise.
Define $A_i = \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5}$, where $x_{ij}$, j = 2, …, 5, are indicators equal to 1 if group i is in age class j and 0 otherwise. The $\beta_j$, j = 2, …, 5, then represent the effects of the age groups on the log rate of CHD deaths.
What we want to investigate:
- the effect of smoking on the CHD rate, and
- the effect of smoking on the CHD rate, having adjusted for age.
14
Model 1: smoking
$\ln(\mu_i) = \ln(t_i) + \beta_0 + S_i$
Fitting this model in Stata gives the following:

    xi: poisson deaths i.smoke, e(pyrs)

    i.smoke        _Ismoke_0-1     (naturally coded; _Ismoke_0 omitted)
    Iteration 0:   log likelihood =
    Iteration 1:   log likelihood =
    Iteration 2:   log likelihood =
    Iteration 3:   log likelihood =

    Poisson regression             Number of obs = 10
                                   LR chi2(1)    =
                                   Prob > chi2   =
    Log likelihood =               Pseudo R2     =

    deaths     | Coef.   Std. Err.   z       P>|z|   [95% Conf. Interval]
    _Ismoke_1  |                     5.059   0.000
    _cons      |
    pyrs       | (exposure)

The interpretation of the parameter estimates is as follows.
- $\hat{\beta}_0$ (_cons): the estimated log rate for non-smokers.
- $\hat{\beta}_1$ (_Ismoke_1): the estimated difference in log rates between smokers and non-smokers.
- $e^{\hat{\beta}_1}$: the estimated crude rate ratio of smokers relative to non-smokers.
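For a model with a single binary covariate, the Poisson estimates have a closed form, so the z statistic in the output above can be checked by hand. The sketch below uses the crude totals from the British doctors data (101 deaths in 39220 person-years for non-smokers; 630 deaths for smokers, with person-years summed over the five age groups). It is an illustration of the arithmetic, not a substitute for the Stata fit.

```python
import math

# Totals over the five age groups (deaths, person-years)
deaths_smokers, pyrs_smokers = 630, 142247  # person-years summed from the table
deaths_non, pyrs_non = 101, 39220

rate_smokers = deaths_smokers / pyrs_smokers
rate_non = deaths_non / pyrs_non

beta0 = math.log(rate_non)                 # log rate in non-smokers (_cons)
beta1 = math.log(rate_smokers / rate_non)  # log rate ratio, smokers vs non-smokers
se_beta1 = math.sqrt(1 / deaths_smokers + 1 / deaths_non)
z = beta1 / se_beta1

print(f"beta0 = {beta0:.4f}")
print(f"beta1 = {beta1:.4f}, SE = {se_beta1:.4f}, z = {z:.3f}")
print(f"crude rate ratio (smokers vs non-smokers) = {math.exp(beta1):.2f}")
```

The z value agrees with the 5.059 shown in the Stata output, and the reciprocal of the rate ratio is the 0.6 quoted earlier for non-smokers compared to smokers.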
15
Model 2: adjusting for age
$\ln(\mu_i) = \ln(t_i) + \beta_0 + S_i + A_i$
Fitting this model in Stata gives the following:

    xi: poisson deaths i.smoke i.agegrp, e(pyrs)

    deaths      | Coef.   Std. Err.   z        P>|z|   [95% Conf. Interval]
    _Ismoke_1   |                     3.302    0.001
    _Iagegrp_2  |                     7.606    0.000
    _Iagegrp_3  |                     14.301
    _Iagegrp_4  |                     18.130
    _Iagegrp_5  |                     19.249
    _cons       |
    pyrs        | (exposure)

The age-adjusted rate ratio, comparing smokers with non-smokers, is $e^{\hat{\beta}_1}$ (similar to the Mantel-Haenszel estimate, M-H = 1.39), with 95% CI $(e^{\hat{\beta}_1 - 1.96 \times SE}, e^{\hat{\beta}_1 + 1.96 \times SE})$ = (1.16, 1.76).
This model assumes that the effects of smoking and age combine multiplicatively (i.e. there is no significant interaction between them).
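To show what the fit of Model 2 involves, here is a self-contained Python sketch that fits the same log-linear model with offset ln(pyrs) by iteratively reweighted least squares (IRLS). It is an illustration, not Stata's implementation. Two assumptions: the non-smoker death counts for age groups 1 and 4 (2 and 28) are taken from the published dataset, and the helper names (`solve`, `information`, `fit_poisson`) are invented for this example.

```python
import math

# (agegrp, smoke, deaths, pyrs); smoke: 1 = smoker, 0 = non-smoker
# Assumption: deaths 2 and 28 for non-smokers in age groups 1 and 4.
DATA = [
    (1, 1, 32, 52407), (2, 1, 104, 43248), (3, 1, 206, 28612),
    (4, 1, 186, 12663), (5, 1, 102, 5317),
    (1, 0, 2, 18790), (2, 0, 12, 10673), (3, 0, 28, 5710),
    (4, 0, 28, 2585), (5, 0, 31, 1462),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def information(X, mu):
    """Fisher information X' diag(mu) X for the log-link Poisson model."""
    p = len(X[0])
    return [[sum(m * xi[j] * xi[k] for m, xi in zip(mu, X)) for k in range(p)]
            for j in range(p)]

def fit_poisson(X, y, t, n_iter=25):
    """Poisson regression with offset ln(t), fitted by IRLS."""
    p = len(X[0])
    off = [math.log(ti) for ti in t]
    mu = [max(yi, 0.5) for yi in y]  # start from the observed counts
    beta = [0.0] * p
    for _ in range(n_iter):
        # working response with the offset removed
        z = [math.log(m) - o + (yi - m) / m for m, o, yi in zip(mu, off, y)]
        rhs = [sum(m * xi[j] * zi for m, xi, zi in zip(mu, X, z)) for j in range(p)]
        beta = solve(information(X, mu), rhs)
        mu = [math.exp(o + sum(b * x for b, x in zip(beta, xi)))
              for o, xi in zip(off, X)]
    info = information(X, mu)
    # standard errors: square roots of the diagonal of the inverse information
    se = [math.sqrt(solve(info, [float(k == j) for k in range(p)])[j])
          for j in range(p)]
    return beta, se

# Design matrix: intercept, smoke, indicators for age groups 2..5
X = [[1.0, float(s)] + [float(a == g) for g in (2, 3, 4, 5)] for a, s, _, _ in DATA]
y = [d for _, _, d, _ in DATA]
t = [py for _, _, _, py in DATA]

beta, se = fit_poisson(X, y, t)
names = ["_cons", "smoke", "agegrp2", "agegrp3", "agegrp4", "agegrp5"]
for name, b, s in zip(names, beta, se):
    print(f"{name:8s} coef = {b:8.4f}  se = {s:6.4f}  z = {b / s:7.3f}")
print(f"age-adjusted rate ratio (smokers vs non-smokers) = {math.exp(beta[1]):.2f}")
```

The printed z statistics can be compared directly with the z column of the Stata output above.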
16
5. Testing hypotheses in Poisson regression
1. Wald test (given directly in Stata output)
2. Likelihood ratio (LR) test
Testing for effect modification
- The Poisson model in its simple form assumes no interaction between explanatory variables.
- But we can check whether the effect of the exposure (e.g. smoking) differs according to the levels of the potential confounder (e.g. age),
- using the LR test and nested models.
17
The model in full
Log (λ) = α + βx + γ1z1 + γ2z2

Log rates of outcome
Stratum (Z)   Exposure X = 0   Exposure X = 1
0             α                α + β
1             α + γ1           α + γ1 + β
2             α + γ2           α + γ2 + β

In multiplicative form (rates of outcome)
Stratum (Z)   Exposure X = 0   Exposure X = 1
0             λc               λc θ
1             λc φ1            λc θ φ1
2             λc φ2            λc θ φ2

What is the difference in the log rates in stratum 0 between exposed and unexposed? In stratum 1? What do we assume here?
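The questions on this slide can be worked through directly from the model equation. The following is a sketch of that algebra; the correspondence $\lambda_c = e^{\alpha}$, $\theta = e^{\beta}$, $\varphi_k = e^{\gamma_k}$ between the two tables is an assumption implied by the layout of the slide rather than stated on it.

```latex
\begin{aligned}
\text{Stratum } k \ (\gamma_0 \equiv 0):\quad
  & \log\lambda(x{=}1, z{=}k) - \log\lambda(x{=}0, z{=}k)
    = (\alpha + \beta + \gamma_k) - (\alpha + \gamma_k) = \beta \\
\text{Rate ratio:}\quad
  & \frac{\lambda(x{=}1, z{=}k)}{\lambda(x{=}0, z{=}k)} = e^{\beta} = \theta
    \qquad \text{for } k = 0, 1, 2
\end{aligned}
```

The difference in log rates is the same in every stratum, so the model assumes a common rate ratio for the exposure across strata, i.e. no effect modification by Z.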
18
Interactions (effect modification)
Log (λ) = α + βx + γ1z1 + γ2z2 + δ1 (xz1) + δ2 (xz2)

Log rates of outcome
Stratum (Z)   Exposure X = 0   Exposure X = 1
0             α                α + β
1             α + γ1           α + γ1 + β + δ1
2             α + γ2           α + γ2 + β + δ2

What is the difference in the log rates in stratum 0 between exposed and unexposed? In stratum 1? What do we assume here?

In multiplicative form
Stratum (Z)   Exposure X = 0   Exposure X = 1
0             λc               λc θ
1             λc φ1            λc θ φ1 ρ1
2             λc φ2            λc θ φ2 ρ2
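With the interaction terms included, the stratum-specific differences in log rates follow from the model equation. A sketch of the algebra, assuming $\rho_k = e^{\delta_k}$ and $\theta = e^{\beta}$ as the layout of the two tables suggests:

```latex
\begin{aligned}
\text{Stratum 0:}\quad & \log\lambda(x{=}1) - \log\lambda(x{=}0) = \beta
  && \Rightarrow \text{rate ratio } \theta \\
\text{Stratum 1:}\quad & \log\lambda(x{=}1) - \log\lambda(x{=}0) = \beta + \delta_1
  && \Rightarrow \text{rate ratio } \theta\rho_1 \\
\text{Stratum 2:}\quad & \log\lambda(x{=}1) - \log\lambda(x{=}0) = \beta + \delta_2
  && \Rightarrow \text{rate ratio } \theta\rho_2
\end{aligned}
```

The model without interaction is the special case $\delta_1 = \delta_2 = 0$ (equivalently $\rho_1 = \rho_2 = 1$), which is why the two models are nested and can be compared with a likelihood ratio test.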
19
Poisson regression - Stratification
Remember the Whitehall Study, with grade of work (exposure) and age (confounder).

    xi: poisson deaths i.grade*i.agegrp, e(pyrs) irr
    est store A

    deaths | RR   SE   z   p-value   95% CI

Test for interaction

    xi: poisson deaths i.grade i.agegrp, e(pyrs)
    est store B
    lrtest A B

    likelihood-ratio test            chi2(5)     = 10.43
    (Assumption: B nested in A)      Prob > chi2 =
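The p-value of a likelihood ratio test is the upper tail of a chi-square distribution. Here is a minimal standard-library Python sketch for integer degrees of freedom, using the classical finite-sum formulas for the chi-square tail. The function name `chi2_sf` is invented for this example. Applied to the statistic above, chi2(5) = 10.43, it gives a p-value of about 0.06.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    chi = math.sqrt(x)
    z = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # standard normal density at chi
    if df % 2 == 1:
        # odd df: two-sided normal tail plus a finite sum of odd powers of chi
        total = math.erfc(chi / math.sqrt(2.0))
        term = chi
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)
            total += 2.0 * z * term
        return total
    # even df: exp(-x/2) times a finite sum of powers of x/2
    term, total = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        total += term
    return math.sqrt(2.0 * math.pi) * z * total

p_value = chi2_sf(10.43, 5)
print(f"LR test: chi2(5) = 10.43, p = {p_value:.3f}")
```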
20
Quantitative exposure in Poisson regression
If the explanatory variable is quantitative (or ordered), we can use the Poisson model to test for linearity.
This is a stronger assumption than treating the variable as categorical, since it uses fewer parameters: it assumes the log rate changes linearly across the explanatory variable.
Caution: this is a strong assumption, since very few relationships are exactly log-linear.
There is only one parameter in the model for the variable: the change in log rate per unit of the variable, e.g. $\log \lambda = \beta_0 + \beta_1 \cdot \text{Age}$.
21
Quantitative exposure in Poisson regression (cont.)
The LRT compares:
- Model 1: age categorical, no assumption about how the log rate changes
- Model 2: age quantitative, assumes the log rate changes linearly
H0: the association is log-linear vs. H1: the association is not log-linear
22
Quantitative exposure in Poisson regression (cont.)
If the exposure is quantitative, the interpretation is "change in log rate per unit change".
Imagine age as the exposure: how is it coded?
- If categorical age is coded as groups of years, e.g. 5-year groups coded 1, 2, 3, 4, …, then a jump of one unit represents a 5-year jump.
- If the values of the categories are 40, 45, 50, 55, …, then a jump of one unit represents a 1-year jump.
23
Quantitative exposure in Poisson regression (cont.)
In Stata, e.g. with age coded 40, 50, 55, 60, 65, 70, 75, 80 (agegrp):

    xi: poisson deaths i.grade agegrp, e(pyrs)

    deaths     | rate   SE   z   p   95% CI
    _Igrade_2  |
    agegrp     |

A 1-unit jump = 1 year in age: the RR increases by … for each 1-year increase in age (approximately 1.53 over 5 years).
24
Quantitative exposure in Poisson regression (cont.)
If the same age groups were coded 1, 2, 3, 4, … (agegrp):

    xi: poisson deaths i.grade agegrp, e(pyrs)

    deaths     | log(rate)   SE   z   p   95% CI
    _Igrade_2  |
    agegrp     |

A 1-unit jump = 5 years in age: the RR increases by … for each 5-year increase in age.
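The two codings describe the same log-linear trend on different scales. The sketch below converts between them, starting from the approximate 5-year rate ratio of 1.53 quoted on the previous slide and assuming equally spaced 5-year groups. The per-year value it prints is derived from that figure and is not taken from the Stata output.

```python
import math

rr_5yr = 1.53  # approximate rate ratio over 5 years, as quoted on the slide

beta_per_year = math.log(rr_5yr) / 5  # coefficient if age is coded in years
beta_per_group = 5 * beta_per_year    # coefficient if age is coded 1, 2, 3, ... (5-year steps)

rr_per_year = math.exp(beta_per_year)
rr_per_group = math.exp(beta_per_group)

print(f"log rate ratio per year       = {beta_per_year:.4f}  (RR = {rr_per_year:.3f})")
print(f"log rate ratio per 5-yr group = {beta_per_group:.4f}  (RR = {rr_per_group:.3f})")
```

On the log scale the coefficient is multiplied by 5 when moving from 1-year to 5-year units, so the rate ratio is raised to the fifth power.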
25
Exercise
Consider an epidemiological cohort study that aims to estimate the effect of coffee consumption on the development of liver cancer. Let:
- Yi be the follow-up time for each individual i of the cohort
- λi be the rate of liver cancer development for individual i.
Also consider alcohol consumption as a possible confounding factor.
26
Exercise
Suppose the analysis is carried out with the following Poisson model:
Log (λ) = α + βX + γ1Z1 + γ2Z2 + δ1 (XZ1) + δ2 (XZ2), where:
- X = coffee consumption at study entry: 0 = < 1 cup per day; 1 = > 1 cup per day,
- Z = daily alcohol consumption at study entry: 0 = low; 1 = moderate; 2 = high, and
- Z1, Z2 are indicator (dummy) variables for the factor "alcohol consumption", with Z1 (1 = moderate daily alcohol consumption, 0 = otherwise) and Z2 (1 = high daily alcohol consumption, 0 = otherwise).
Questions:
- Give the estimate of the rate of liver cancer development in each of the categories of the factor "alcohol consumption".
- What does the above model assume about the role of alcohol consumption with respect to its effect on the estimate of the relative risk of developing liver cancer in relation to coffee consumption?
27
More on the offset
As mentioned, Poisson regression may also be appropriate for rate data, where the rate is a count of events divided by some measure of the exposure of that unit (a particular unit of observation).
Examples:
- Biologists may count the number of tree species in a forest: events would be tree observations, exposure would be unit area, and the rate would be the number of species per unit area.
- Demographers may model death rates in geographic areas as the count of deaths divided by person-years.
- Event rates can be calculated as events per unit time, which allows the observation window to vary for each unit.
28
More on the offset
In these examples, exposure is respectively unit area, person-years and unit time.
This is handled as an offset: the exposure variable enters on the right-hand side of the equation, but with the coefficient of log(exposure) constrained to 1.
Using the offset is a way of accounting for population sizes, which could vary not only by time but also by age, region, area, etc.
29
More on the offset
The fact that an offset variable is required to have a coefficient of 1 is what allows it to be part of the rate: with a coefficient of exactly 1, log(exposure) can be moved between the two sides of the equation, so the same model can be read as a model for the rate (offset on the left) or for the count (offset on the right).
By defining an offset variable, we are only adjusting for the amount of opportunity an event has to occur.
30
More on the offset
Let's assume individuals in a rehab center.
Including time in the offset means that we assume that every day in rehab makes a patient equally likely to have an aggressive incident. Each day is simply an opportunity for an incident. A patient in for 20 days is twice as likely to have an incident as a patient in for 10 days.
We assume that the likelihood of events is not changing over time (λ is constant for all time intervals).
If, for example, it takes patients a few weeks to learn the consequences of aggressive behavior and then stop or lessen their rates, then time is not just a matter of exposure. Likewise, if patients start becoming more agitated after being in a program for a few months, so that the longer residence time is actually creating more aggression, then time is not just a matter of exposure.
In either of these cases, number of days in a program would serve better as a predictor than as an exposure variable. As a predictor, the coefficient will be estimated from the data, not set to 1.
31
More on the offset
Let's assume children in the first grade.
Including time in the offset means that we assume that every day in school makes a child equally likely to learn one new word. Each day is simply an opportunity for a new word to be learned. A child in the first 20 days is twice as likely to learn a new word as a child in the first 10 days.
We assume that the likelihood of learning new words is not changing over time (λ is constant for all time intervals).
If, for example, it takes children a few weeks to get used to the new environment and then the number of words learned increases, then time is not just a matter of exposure. Similarly, if children start learning other things (e.g. phrases), so that the number of words learned decreases, then time is not just a matter of exposure.
In either of these cases, time in school would serve better as a predictor than as an exposure variable. As a predictor, the coefficient will be estimated from the data, not set to 1.
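The difference between using time as an exposure and using it as a predictor can be written side by side. The following is one possible formulation, assuming that log time is the form in which the predictor enters the model; the symbol $\gamma$ is introduced here only for illustration.

```latex
\begin{aligned}
\text{Time as exposure (offset):}\quad
  & \ln(\mu_i) = 1 \cdot \ln(t_i) + \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}
  && \Rightarrow\ \mu_i \propto t_i \\
\text{Time as predictor:}\quad
  & \ln(\mu_i) = \gamma \ln(t_i) + \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}
  && \Rightarrow\ \mu_i \propto t_i^{\gamma}
\end{aligned}
```

The offset model is the special case $\gamma = 1$. An estimate of $\gamma$ clearly different from 1 would suggest that the expected count does not grow in proportion to time, as in the two scenarios described above.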
32
Study of forest density
Variables: calendar year, region, altitude, number of trees (N) and surface area.
Question: taking region and altitude into account, has the density of forests in Greece decreased?
Table layout: columns for Peloponnese (Peloponisos) and Sterea Ellada (Sterea), each split into low, moderate and high altitude; rows by calendar year; each cell gives N and surface area (km2).
3950 5 4200 4450 7 4700 4950 8 5200 3 2450 2700 2950 3200 3450 2 3700 9 1000 4 1100 1500 1700 1950 2200
33
Example

       cyear   region        altitude   N   Surface
   1.          Peloponisos   low
   2.          Peloponisos   moderate
   3.          Peloponisos   high
   4.          Sterea        low
   5.          Sterea        moderate
   6.          Sterea        high
   7.          Peloponisos   low
   8.          Peloponisos   moderate
   9.          Peloponisos   high
  10.          Sterea        low
  11.          Sterea        moderate
  12.          Sterea        high
  13.          Peloponisos   low
  14.          Peloponisos   moderate
  15.          Peloponisos   high
  16.          Sterea        low
  17.          Sterea        moderate
  18.          Sterea        high
34
Example: The model

    . poisson N i.cyear i.region i.altitude, e(Surface)

    Poisson regression             Number of obs = 18
                                   LR chi2(5)    =
                                   Prob > chi2   =
    Log likelihood =               Pseudo R2     =

    N            | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
    cyear        |
                 |
                 |
    region       |
      Sterea     |
    altitude     |
      moderate   |
      high       |
    _cons        |
    ln(Surface)  | (exposure)
35
Example- Interpretation
- 19.1% more trees per square km in Sterea compared to Peloponisos
- $(1 - e^{-0.28}) \times 100 = 24.9\%$ fewer trees per square km in … compared to …
- $(1 - e^{-1.07}) \times 100 = 65.7\%$ fewer trees per square km in … compared to …
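The percentages on this slide come from exponentiating the regression coefficients. Here is a small Python sketch of the conversion; the helper name `percent_change` is invented for this example, and the coefficient -1.07 is the one quoted above.

```python
import math

def percent_change(coef):
    """Percent change in the rate implied by a log-rate coefficient."""
    return (math.exp(coef) - 1.0) * 100.0

change = percent_change(-1.07)
print(f"coefficient -1.07 -> rate ratio {math.exp(-1.07):.3f}, i.e. {change:.1f}% change")
```

A negative result is a decrease: a change of -65.7% corresponds to the "65.7% fewer trees per square km" reading on the slide.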
36
Example 2
Variables: years since the diagnosis of osteoporosis, gender, physical activity (PA), number of fractures (n) and number of individuals.
Question: taking gender and physical activity into account, does the fracture rate change with the years since diagnosis?

Each cell gives n / individuals.

Years since   Men                                     Women
diagnosis     Low PA      Moderate PA   High PA       Low PA      Moderate PA   High PA
1-3           25 / 1700   19 / 1950     11 / 2200     30 / 1000   22 / 1100     15 / 1200
…             208 / 3200  207 / 3450    203 / 3950    196 / 2450  189 / 2700    177 / 2950
…             541 / 4700  544 / 4920    546 / 5200    513 / …     504 / 4200    489 / 4450
37
The dataset

       group   time   gender   pactivity   rate   N   fractures
   1.                 Woman    low
   2.                 Woman    moderate
   3.                 Woman    high
   4.                 Man      low
   5.                 Man      moderate
   6.                 Man      high
   7.                 Woman    low
   8.                 Woman    moderate
   9.                 Woman    high
  10.                 Man      low
  11.                 Man      moderate
  12.                 Man      high
  13.                 Woman    low
  14.                 Woman    moderate
  15.                 Woman    high
  16.                 Man      low
  17.                 Man      moderate
  18.                 Man      high
38
The model

    . poisson fract i.time i.gender i.pact, e(N)

    Poisson regression             Number of obs = 18
                                   LR chi2(5)    = 1281.32
                                   Prob > chi2   =
    Log likelihood =               Pseudo R2     =

    fractures    | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
    time         |
                 |
                 |
    gender       |
      Man        |
    pactivity    |
      moderate   |
      high       |
    _cons        |
    ln(N)        | (exposure)
39
Example 2- Interpretation
- 4.9 times as many bone fractures per individual for people diagnosed 3-5 years ago compared to those diagnosed 1-3 years ago
- $e^{2.16} = 8.7$: 8.7 times as many bone fractures per individual for people in the third time-since-diagnosis category compared to those diagnosed 1-3 years ago
- $(1 - e^{\hat{\beta}}) \times 100 = 16.4\%$ fewer bone fractures per individual for those with high physical activity compared to those with low physical activity, where $\hat{\beta}$ is the coefficient for high physical activity