Pearson's Chi-Square Test of Independence

# Chi-square test
1. Can be used to test for independence between two variables (X and Y) that are both categorical (factors).
2. X and Y can each have 2 or more levels, but the groups formed by the levels of X must be independent of each other.

#### Assumptions
1. Groups are independent and observations are independent.
2. All cell counts >= 1
3. All expected cell counts >= 5
    1. If this is not met, you have to use something like [[Fishers Test]] or [[Bootstrap]].

The chi-square test is non-parametric. However, like a parametric test, it uses a theoretical probability distribution (the chi-square distribution), and it relies on a large sample size.

Autism vs vacc (observed)

| Vaccinated | Autism: Yes | Autism: No | Total  |
| ---------- | ----------- | ---------- | ------ |
| Yes        | 621         | 440034     | 440655 |
| No         | 117         | 96531      | 96648  |
| Total      | 738         | 536565     | 537303 |

Probability of autism given vacc: $P(Aut|Vac)=P_1=\frac{621}{440655}=0.00141$

Probability of autism given no vacc: $P(Aut|NoVac)=P_2=\frac{117}{96648}=0.00121$

#### Hypothesis
$H_0$: $P_1-P_2=0$

$H_A$: $P_1-P_2\ne 0$

What do we expect if $H_0$ is true (vaccination and autism independent)?

| Vaccinated | Autism: Yes                              | Autism: No | Total  |
| ---------- | ---------------------------------------- | ---------- | ------ |
| Yes        | $P(Vac\ \&\ Aut)\times Total$ = 605.25   | 440049.75  | 440655 |
| No         | 132.75                                   | 96515.25   | 96648  |
| Total      | 738                                      | 536565     | 537303 |

$P(Vac\ \&\ Aut)=P(Vac)P(Aut)=\frac{440655}{537303}\cdot\frac{738}{537303}$

#### Chi Square Test
$Test\_statistic = \chi^2 = \sum_{all\ cells}{\frac{(obs-exp)^2}{exp}}=2.28$

Under $H_0$ this statistic is $\chi^2$ distributed, with $(2-1)(2-1)=1$ degree of freedom for a 2x2 table.

P-value = $P(\chi^2\geq 2.28\ if\ H_0\ is\ true)=0.1309$

For large sample sizes, small effects may show up as statistically significant. For small sample sizes the inverse is true: even large effects may fail to reach significance.

The chi-square test does not tell you anything about the direction or strength of an association. It only tells you whether there is an association at all.
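
The expected counts, test statistic, and p-value for the autism vs vacc table can be reproduced with a short sketch. This is a minimal, standard-library illustration rather than a reference implementation: the function names (`expected_counts`, `chi_square_statistic`, `p_value_df1`) are made up for this note, and the p-value shortcut assumes a 2x2 table (1 degree of freedom), where $P(\chi^2 \geq x) = \mathrm{erfc}(\sqrt{x/2})$.

```python
import math


def expected_counts(observed):
    """Expected counts under independence: row_total * col_total / grand_total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Pearson's statistic: sum over all cells of (obs - exp)^2 / exp."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def p_value_df1(chi2):
    """Upper-tail probability for 1 degree of freedom (assumption: 2x2 table only)."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows: vaccinated yes / no. Columns: autism yes / no (observed table above).
observed = [[621, 440034], [117, 96531]]

expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)
p_value = p_value_df1(chi2)

print("expected:", [[round(e, 2) for e in row] for row in expected])
print("chi2:", round(chi2, 2))
print("p-value:", round(p_value, 4))
```

For tables larger than 2x2 the degrees of freedom are (rows - 1)(columns - 1), and the `erfc` shortcut no longer applies. The statistic and p-value computed here still say nothing about the direction or strength of the association.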
To measure the direction or strength of an association you need [[#Risk Difference]] or [[#Relative Risk]] or [[#Odds Ratio]].

# Measures of association for 2x2 tables
Sample table - Exposed vs Disease

| Exposed  | Disease: Yes $D$ | Disease: No $D_0$ | Total |
| -------- | ---------------- | ----------------- | ----- |
| Yes $E$  | 30               | 70                | 100   |
| No $E_0$ | 20               | 80                | 100   |
| Total    | 50               | 150               | 200   |

$P(D|E)=0.3$

$P(D|E_0)=0.2$

## Risk Difference (RD)
Also called Attributable Risk (AR)

$RD=P(D|E)-P(D|E_0)=0.3-0.2=10\%$

1/Risk Difference = number needed to treat (NNT); here $1/0.1=10$.

Exposure raises your absolute risk of getting the disease by 10 percentage points.

## Relative Risk (RR)
$RR=\frac{P(D|E)}{P(D|E_0)}=\frac{0.3}{0.2}=1.5$

You are 50% more likely to get the disease when exposed vs not exposed.

## Odds Ratio (OR)
Odds of getting the disease given that someone is exposed, divided by the odds of getting the disease when not exposed.

$OR=\frac{Odds(D|E)}{Odds(D|E_0)}=\frac{P(D|E)/P(D_0|E)}{P(D|E_0)/P(D_0|E_0)}= \frac{0.3/0.7}{0.2/0.8}=1.71$

For rare diseases, $OR \approx RR$.
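
The three measures can be checked against the sample table with a small sketch. The function name `association_measures` and its argument names are illustrative only, and the second table (3 vs 2 cases per 1000) is a hypothetical example invented here to show OR approaching RR when the disease is rare.

```python
def association_measures(exposed_cases, exposed_noncases,
                         unexposed_cases, unexposed_noncases):
    """Risk difference, relative risk and odds ratio for a 2x2 table."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": odds_exposed / odds_unexposed,
    }


# Sample table from the note: exposed 30 / 70, not exposed 20 / 80.
sample = association_measures(30, 70, 20, 80)
for name, value in sample.items():
    print(name, round(value, 2))

# Hypothetical rare-disease table (not from the note): 3 vs 2 cases per 1000.
rare = association_measures(3, 997, 2, 998)
print("rare RR:", round(rare["relative_risk"], 3))
print("rare OR:", round(rare["odds_ratio"], 3))
```

In the sample table the disease is common (20% to 30% risk), so the odds ratio overstates the relative risk; in the rare-disease table the two nearly coincide.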