# Chi-square test
1. Tests for independence between two variables (X and Y) that are both categorical (factors).
2. X and Y can each have two or more levels, but the groups formed by X must be independent of one another (no paired or repeated observations).
#### Assumptions
1. Groups are independent and observations are independent.
2. All cells >= 1
3. All expected cells >= 5
    1. If this is not met, you have to use something like [[Fishers Test]] or [[Bootstrap]].
The chi-square test is non-parametric.
However, like a parametric test, it relies on a theoretical probability distribution (the chi-square distribution),
and that approximation requires a large sample size.
Autism vs. vaccination, observed counts (rows: vaccinated, columns: autism)
| | Yes | No | Total |
| ----- | --- | ------ | ------ |
| Yes | 621 | 440034 | 440655 |
| No | 117 | 96531 | 96648 |
| Total | 738 | 536565 | 537303|
Probability of autism given vaccination: $P(Aut|Vac)=P_1=\frac{621}{440655}=0.00141$
Probability of autism given no vaccination: $P(Aut|NoVac)=P_2=\frac{117}{96648}=0.00121$
#### Hypothesis
$H_0$: $P_1-P_2=0$
$H_A$ : $P_1-P_2\ne 0$
Expected counts if $H_0$ is true (rows: vaccinated, columns: autism)
| | Yes | No | Total |
| ----- | --------------------------- | --------- | ------ |
| Yes | $P(Vac\ \&\ Aut) \times Total$ = 605.25 | 440049.75 | 440655 |
| No | 132.75 | 96515.25 | 96648 |
| Total | 738 | 536565 | 537303 |
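Under $H_0$ (independence), $P(Vac\ \&\ Aut)=P(Vac)P(Aut)=\frac{440655}{537303}\cdot\frac{738}{537303}$, so the expected count is $537303 \cdot P(Vac\ \&\ Aut)=605.25$. The other cells follow the same pattern: row total times column total, divided by the grand total.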
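The expected table can be reproduced with a few lines of Python. This is a minimal sketch using only the counts from the observed table; the names `observed` and `expected_counts` are illustrative, not from any library.

```python
# Observed counts: rows = vaccinated (Yes, No), columns = autism (Yes, No).
observed = [[621, 440034],
            [117, 96531]]

def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(cell, 2) for cell in row])
```

The expected counts keep the same row and column totals as the observed table, which is a quick sanity check on the arithmetic.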
#### Chi-Square Test
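$\text{Test statistic} = \chi^2 = \sum_{\text{all cells}}{\frac{(obs-exp)^2}{exp}}=2.28$
Under $H_0$ this statistic is $\chi^2$ distributed with $(2-1)(2-1)=1$ degree of freedom. P-value $=P(\chi^2\geq 2.28 \mid H_0\text{ is true})=0.1309$, so at the 5% level we fail to reject $H_0$.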
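The statistic and p-value can be checked with the standard library alone. This is a sketch under two assumptions: no continuity correction is applied (which matches the 2.28 quoted here), and the p-value shortcut $P(\chi^2_1 \geq x)=\operatorname{erfc}(\sqrt{x/2})$ is valid only for 1 degree of freedom, i.e. a 2x2 table. The function name `chi_square_2x2` is illustrative.

```python
import math

# Observed counts: rows = vaccinated (Yes, No), columns = autism (Yes, No).
observed = [[621, 440034],
            [117, 96531]]

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            exp = r * c / n  # expected count under independence
            stat += (table[i][j] - exp) ** 2 / exp
    # With df = 1, chi-square is the square of a standard normal, so the
    # upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_square_2x2(observed)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

For larger tables the degrees of freedom change and the `erfc` shortcut no longer applies; a statistics library would be the usual choice there.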
For large sample sizes, even small effects may show up as statistically significant. For small sample sizes the inverse is true: even large effects may fail to reach significance.
The chi-square test says nothing about the direction or strength of an association; it only indicates whether there is evidence of one.
For direction and strength you need the [[#Risk Difference]], [[#Relative Risk]], or [[#Odds Ratio]].
# Measures of association for 2x2 tables
Sample table: exposure (rows) vs. disease (columns)
| | Yes $D$ | No $D_0$ | Total |
| ------ | --- | --- | ----- |
| Yes $E$ | 30 | 70 | 100 |
| No $E_0$ | 20 | 80 | 100 |
| Total | 50 | 150 | 200 |
$P(D|E)=0.3$
$P(D|E_0)=0.2$
## Risk Difference (RD)
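Also called attributable risk (AR).
$RD=P(D|E)-P(D|E_0)=0.3-0.2=10\%$
$1/RD$ = number needed to treat (NNT); here $1/0.1=10$. For a harmful exposure the same quantity is called the number needed to harm.
Exposure raises your absolute risk of disease by 10 percentage points.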
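As a quick check of the arithmetic, here is a small sketch that computes the risk difference and its reciprocal from the sample table. The variable names are illustrative.

```python
# Counts from the sample exposure-vs-disease table.
exposed_cases, exposed_total = 30, 100
unexposed_cases, unexposed_total = 20, 100

risk_exposed = exposed_cases / exposed_total        # P(D|E)
risk_unexposed = unexposed_cases / unexposed_total  # P(D|E_0)

risk_difference = risk_exposed - risk_unexposed
# Reciprocal of the risk difference (undefined when the two risks are equal).
reciprocal = 1 / risk_difference

print(f"RD = {risk_difference:.2f}")
print(f"1/RD = {reciprocal:.1f}")
```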
## Relative Risk (RR)
$RR=\frac{P(D|E)}{P(D|E_0)}=\frac{0.3}{0.2}=1.5$
In relative terms, you are 50% more likely to get the disease when exposed than when not exposed.
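The same calculation as a short sketch, using the counts from the sample table. The helper name `relative_risk` is hypothetical.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

rr = relative_risk(30, 100, 20, 100)
# An RR of 1 means no difference, so (RR - 1) is the relative increase.
percent_increase = (rr - 1) * 100
print(f"RR = {rr:.2f}, relative increase = {percent_increase:.0f}%")
```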
## Odds Ratio (OR)
The odds of disease among the exposed divided by the odds of disease among the unexposed.
$OR=\frac{Odds(D|E)}{Odds(D|E_0)}=\frac{P(D|E)/P(D_0|E)}{P(D|E_0)/P(D_0|E_0)}= \frac{0.3/0.7}{0.2/0.8}=1.71$
For rare diseases, $OR \approx RR$.
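The rare-disease approximation can be seen by computing both measures for a common outcome (the sample table) and a rare one (the autism table, where the outcome occurs in roughly 0.1% of each group). This is an illustrative sketch; the function names and the cell order `a, b, c, d` are assumptions made here, not a library API.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table [[a, b], [c, d]] (rows: exposed, unexposed)."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """Relative risk for the same table layout."""
    return (a / (a + b)) / (c / (c + d))

# Common outcome: sample exposure-vs-disease table.
common = (30, 70, 20, 80)
# Rare outcome: autism table (rows: vaccinated Yes / No).
rare = (621, 440034, 117, 96531)

for name, cells in [("common", common), ("rare", rare)]:
    print(f"{name}: OR = {odds_ratio(*cells):.3f}, RR = {relative_risk(*cells):.3f}")
```

When the outcome is rare, $P(D_0|E)$ and $P(D_0|E_0)$ are both close to 1, so the odds are close to the risks and the two ratios nearly coincide.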