### 0.1. One-way ANOVA
One categorical factor with 3+ levels (groups) -> numerical outcome
![[Pasted image 20211213162340.png]]
#### 0.1.1. Hypothesis
$H_0$: $\mu_A=\mu_B=\mu_C=\mu_D$
$H_A$: $\mu_p\ne\mu_q$ for some $p\ne q$, i.e. at least one mean differs from the others
Assumptions
- Independent observations
- Independent groups
- Standard deviation of each group is roughly the same
- Each group has a large sample size ($n>20$) and a roughly normal distribution (a quick check is sketched below)
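A minimal Python sketch (made-up diet data, not from these notes) for eyeballing the spread and sample-size assumptions; the "max/min SD ratio below 2" rule of thumb is a common heuristic, not part of the notes:

```python
import numpy as np

# Hypothetical measurements for four diet groups A-D (made-up numbers)
groups = [np.array([5.1, 4.8, 5.5, 5.0]),   # A
          np.array([5.9, 6.2, 5.7, 6.0]),   # B
          np.array([4.2, 4.5, 4.0, 4.4]),   # C
          np.array([5.0, 5.3, 4.8, 5.1])]   # D

# Sample size and sample SD (ddof=1) per group
for name, y in zip("ABCD", groups):
    print(f"group {name}: n = {len(y)}, sd = {y.std(ddof=1):.3f}")

# Common rule of thumb for the equal-spread assumption: max SD / min SD < 2
sds = [y.std(ddof=1) for y in groups]
print("SD ratio:", round(max(sds) / min(sds), 2))
```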
One-way ANOVA is an extension of the 2-sample t-test.
Explained variability - variability explained by the factor under study (X)
Unexplained variability - variability not explained by the factor under study (due to other factors / randomness)
#### 0.1.2. Sums of Squares
Sum of Squares Total
1. $SST = \sum_{obs}(Y_{ij}-\bar{Y})^2$, where $\bar{Y}$ is the overall mean
2. Total Variance = $\frac{SST}{n-1}$ (numeric check below)
3. $SST=SS_{EXPL}+SS_{UNEXPL}$
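A quick numeric check (same hypothetical data, pooled) that $SST/(n-1)$ is just the ordinary sample variance of all observations taken together:

```python
import numpy as np

# All observations from every group pooled together (made-up numbers)
all_y = np.array([5.1, 4.8, 5.5, 5.0,    # diet A
                  5.9, 6.2, 5.7, 6.0,    # diet B
                  4.2, 4.5, 4.0, 4.4,    # diet C
                  5.0, 5.3, 4.8, 5.1])   # diet D
n = len(all_y)

sst = np.sum((all_y - all_y.mean()) ** 2)  # squared deviations from the overall mean
total_var = sst / (n - 1)

# Matches the ordinary sample variance of the pooled data
assert np.isclose(total_var, all_y.var(ddof=1))
print(f"SST = {sst:.3f}, total variance = {total_var:.3f}")
```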
![[Pasted image 20220101170655.png]]
Explained variability is the variability **between** categories (e.g. diet A mean vs diet B mean)
Unexplained variability is the variability **within** a category, i.e. among observations in the same category (e.g. obs. 1 vs obs. 2 in diet A)
![[Pasted image 20211215224609.png]]
#### 0.1.3. Notation
$i$ - group index $(1,2,3,\dots,k)$; $j$ - observation within a group
$\bar{Y_i}$ - mean of group $i$; $\bar{Y}$ - overall mean
$S_i$ - SD of group $i$
$Y_{ij}$ - individual observation
$k$ - number of groups
$n_i$ - sample size of group $i$
Explained variability -> $S_B^2\ (MS_B)=\frac{SS_{btwn}}{DF_{btwn}}=\frac{\sum_{groups}n_i(\bar{Y_i}-\bar{Y})^2}{k-1}$
Unexplained variability -> $S_W^2\ (MS_W)=\frac{SS_{within}}{DF_{within}}=\frac{\sum_{obs}(Y_{ij}-\bar{Y_i})^2}{n-k}=\frac{\sum_{groups}(n_i-1)S_i^2}{n-k}$
The latter is the same formula as the pooled variance in the 2-sample t-test with equal variances: a weighted average of the group variances.
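A sketch of the same arithmetic in Python (same made-up diet data as above), including a check that $SST = SS_{btwn} + SS_{within}$:

```python
import numpy as np

# Hypothetical diet groups (same made-up numbers as above)
groups = [np.array([5.1, 4.8, 5.5, 5.0]),
          np.array([5.9, 6.2, 5.7, 6.0]),
          np.array([4.2, 4.5, 4.0, 4.4]),
          np.array([5.0, 5.3, 4.8, 5.1])]
k = len(groups)                          # number of groups
n = sum(len(y) for y in groups)          # total sample size
grand_mean = np.concatenate(groups).mean()

# Explained (between-group): group means vs. grand mean, weighted by n_i
ss_between = sum(len(y) * (y.mean() - grand_mean) ** 2 for y in groups)
ms_between = ss_between / (k - 1)

# Unexplained (within-group): weighted group variances (pooled-variance form)
ss_within = sum((len(y) - 1) * y.var(ddof=1) for y in groups)
ms_within = ss_within / (n - k)

# Sanity check: total variability decomposes exactly
sst = np.sum((np.concatenate(groups) - grand_mean) ** 2)
assert np.isclose(sst, ss_between + ss_within)
print(f"MS_B = {ms_between:.3f}, MS_W = {ms_within:.3f}")
```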
#### 0.1.4. Hypothesis Testing
If $H_0$ is TRUE: Expect that $MS_B\approx MS_W$ i.e $F_{STAT} = \frac{MS_B}{MS_W}\approx 1$
If $H_A$ is TRUE: Expect that $MS_B>MS_W$ i.e $F_{STAT} = \frac{MS_B}{MS_W}>1$
The F-statistic is the ratio of the mean square (i.e. variance) between groups to the mean square within groups. If the F-statistic is large, the variance between groups is big relative to the variance within groups, so the groups are unlike each other.
The F-statistic follows an F-distribution, which has two degrees-of-freedom parameters: one for the numerator and one for the denominator.
$df_n=k-1$
$df_d=n-k$
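SciPy's built-in one-way ANOVA gives the F-statistic and p-value directly; the sketch below (same made-up data) also confirms the p-value is the upper-tail area of $F(k-1,\,n-k)$:

```python
import numpy as np
from scipy import stats

# Same hypothetical diet groups as above
groups = [np.array([5.1, 4.8, 5.5, 5.0]),
          np.array([5.9, 6.2, 5.7, 6.0]),
          np.array([4.2, 4.5, 4.0, 4.4]),
          np.array([5.0, 5.3, 4.8, 5.1])]
k, n = len(groups), sum(len(y) for y in groups)

# One-way ANOVA: F-statistic and p-value
f_stat, p_value = stats.f_oneway(*groups)

# p-value = upper-tail area of the F(df_n = k-1, df_d = n-k) distribution
assert np.isclose(p_value, stats.f.sf(f_stat, k - 1, n - k))
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```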
#### 0.1.5. Interpretation
1. p-value - usual interpretation (probability of getting an F-statistic at least as extreme as the observed one, given that the null hypothesis is true)
2. If $H_0$ is rejected, compare all possible pairwise means to tell which of the means are different
#### 0.1.6. Multiple comparisons
When we do more than one comparison, our type I error rate increases with every additional test, i.e. it compounds.
All pairwise comparisons - AB, AC, AD, BC, BD, CD
${4 \choose 2}=\frac{4!}{2!2!}=6$
Do an independent 2-sample t-test for each pair.
If we use $\alpha=0.05$, the probability of not making a type I error in one comparison is 0.95, so the probability of not making a type I error across all 6 comparisons is $(0.95)^6\approx 0.735$ - i.e. roughly a 26.5% chance of at least one false positive.
Hence we can use Bonferroni's approach to limit this compounding of error (there are other methods).
Use $\alpha^*=\frac{0.05}{\text{no. of comparisons}}$
So $\alpha^*=\frac{0.05}{6}=0.00833$
Probability of not making a type I error in one comparison = $1-\alpha^*=0.99167$
Probability of not making a type I error in 6 comparisons = $(0.99167)^6\approx 0.951$
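The same Bonferroni arithmetic in a few lines of Python:

```python
from math import comb

alpha = 0.05
m = comb(4, 2)           # number of pairwise comparisons: C(4, 2) = 6
alpha_star = alpha / m   # Bonferroni-adjusted per-comparison level

print(f"m = {m}, alpha* = {alpha_star:.5f}")                             # 6, 0.00833
print(f"P(no type I error, uncorrected) = {(1 - alpha) ** m:.3f}")       # ~0.735
print(f"P(no type I error, corrected)   = {(1 - alpha_star) ** m:.3f}")  # ~0.951
```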
In the worked example, the confidence intervals for two of the pairwise comparisons did not contain zero, so those pairs of means are different:
![[Pasted image 20220102184428.png]]
A and C are different; B and C are different.