### 0.1. One-way ANOVA
One categorical factor with 3+ levels (groups) -> numerical outcome
![[Pasted image 20211213162340.png]]
#### 0.1.1. Hypothesis
$H_0$: $\mu_A=\mu_B=\mu_C=\mu_D$
$H_A$: $\mu_p\ne\mu_q$ for some $p\ne q$, i.e. at least one mean differs from the others
Assumptions
- Independent observations
- Independent groups
- Standard deviation of each group is roughly the same
- Each group has a large sample size ($n>20$) and a roughly normal distribution (a quick check is sketched below)
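A minimal Python sketch (made-up diet data, not from these notes) for eyeballing the spread and sample-size assumptions; the "max/min SD ratio below 2" rule of thumb is a common heuristic, not part of the notes:

```python
import numpy as np

# Hypothetical measurements for four diet groups A-D (made-up numbers)
groups = [np.array([5.1, 4.8, 5.5, 5.0]),   # A
          np.array([5.9, 6.2, 5.7, 6.0]),   # B
          np.array([4.2, 4.5, 4.0, 4.4]),   # C
          np.array([5.0, 5.3, 4.8, 5.1])]   # D

# Sample size and sample SD (ddof=1) per group
for name, y in zip("ABCD", groups):
    print(f"group {name}: n = {len(y)}, sd = {y.std(ddof=1):.3f}")

# Common rule of thumb for the equal-spread assumption: max SD / min SD < 2
sds = [y.std(ddof=1) for y in groups]
print("SD ratio:", round(max(sds) / min(sds), 2))
```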
One-way ANOVA is an extension of the 2-sample t-test.
Explained variability - variability explained by the factor under study (X)
Unexplained variability - variability not explained by the factor under study (due to other factors / randomness)
#### 0.1.2. Sums of Squares
Sum of Squares Total
1. $SST = \sum_{obs}(Y_{ij}-\bar{Y})^2$, where $\bar{Y}$ is the overall mean
2. Total Variance = $\frac{SST}{n-1}$ (numeric check below)
3. $SST=SS_{EXPL}+SS_{UNEXPL}$
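A quick numeric check (same hypothetical data, pooled) that $SST/(n-1)$ is just the ordinary sample variance of all observations taken together:

```python
import numpy as np

# All observations from every group pooled together (made-up numbers)
all_y = np.array([5.1, 4.8, 5.5, 5.0,    # diet A
                  5.9, 6.2, 5.7, 6.0,    # diet B
                  4.2, 4.5, 4.0, 4.4,    # diet C
                  5.0, 5.3, 4.8, 5.1])   # diet D
n = len(all_y)

sst = np.sum((all_y - all_y.mean()) ** 2)  # squared deviations from the overall mean
total_var = sst / (n - 1)

# Matches the ordinary sample variance of the pooled data
assert np.isclose(total_var, all_y.var(ddof=1))
print(f"SST = {sst:.3f}, total variance = {total_var:.3f}")
```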
![[Pasted image 20220101170655.png]]
Explained variability is the variability **between** categories (e.g. diet A mean vs diet B mean)
Unexplained variability is the variability **within** a category, i.e. among observations in the same category (e.g. obs. 1 vs obs. 2 in diet A)
![[Pasted image 20211215224609.png]]
#### 0.1.3. Notation
$i$ - group index $(1,2,3,\dots,k)$; $j$ - observation within a group
$\bar{Y_i}$ - mean of group $i$; $\bar{Y}$ - overall mean
$S_i$ - SD of group $i$
$Y_{ij}$ - individual observation
$k$ - number of groups
$n_i$ - sample size of group $i$
Explained variability -> $S_B^2\ (MS_B)=\frac{SS_{btwn}}{DF_{btwn}}=\frac{\sum_{groups}n_i(\bar{Y_i}-\bar{Y})^2}{k-1}$
Unexplained variability -> $S_W^2\ (MS_W)=\frac{SS_{within}}{DF_{within}}=\frac{\sum_{obs}(Y_{ij}-\bar{Y_i})^2}{n-k}=\frac{\sum_{groups}(n_i-1)S_i^2}{n-k}$
The latter is the same formula as the pooled variance in the 2-sample t-test with equal variances: a weighted average of the group variances.
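A sketch of the same arithmetic in Python (same made-up diet data as above), including a check that $SST = SS_{btwn} + SS_{within}$:

```python
import numpy as np

# Hypothetical diet groups (same made-up numbers as above)
groups = [np.array([5.1, 4.8, 5.5, 5.0]),
          np.array([5.9, 6.2, 5.7, 6.0]),
          np.array([4.2, 4.5, 4.0, 4.4]),
          np.array([5.0, 5.3, 4.8, 5.1])]
k = len(groups)                          # number of groups
n = sum(len(y) for y in groups)          # total sample size
grand_mean = np.concatenate(groups).mean()

# Explained (between-group): group means vs. grand mean, weighted by n_i
ss_between = sum(len(y) * (y.mean() - grand_mean) ** 2 for y in groups)
ms_between = ss_between / (k - 1)

# Unexplained (within-group): weighted group variances (pooled-variance form)
ss_within = sum((len(y) - 1) * y.var(ddof=1) for y in groups)
ms_within = ss_within / (n - k)

# Sanity check: total variability decomposes exactly
sst = np.sum((np.concatenate(groups) - grand_mean) ** 2)
assert np.isclose(sst, ss_between + ss_within)
print(f"MS_B = {ms_between:.3f}, MS_W = {ms_within:.3f}")
```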
#### 0.1.4. Hypothesis Testing
If $H_0$ is TRUE: Expect that $MS_B\approx MS_W$ i.e $F_{STAT} = \frac{MS_B}{MS_W}\approx 1$
If $H_A$ is TRUE: Expect that $MS_B>MS_W$ i.e $F_{STAT} = \frac{MS_B}{MS_W}>1$
The F-statistic is the ratio of the mean square (i.e. variance) between groups to the mean square within groups. If the F-statistic is large, the variance between groups is big relative to the variance within groups, so the groups are unlike each other.
The F-statistic follows an F-distribution, which has two degrees-of-freedom parameters: one for the numerator and one for the denominator.
$df_n=k-1$
$df_d=n-k$
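SciPy's built-in one-way ANOVA gives the F-statistic and p-value directly; the sketch below (same made-up data) also confirms the p-value is the upper-tail area of $F(k-1,\,n-k)$:

```python
import numpy as np
from scipy import stats

# Same hypothetical diet groups as above
groups = [np.array([5.1, 4.8, 5.5, 5.0]),
          np.array([5.9, 6.2, 5.7, 6.0]),
          np.array([4.2, 4.5, 4.0, 4.4]),
          np.array([5.0, 5.3, 4.8, 5.1])]
k, n = len(groups), sum(len(y) for y in groups)

# One-way ANOVA: F-statistic and p-value
f_stat, p_value = stats.f_oneway(*groups)

# p-value = upper-tail area of the F(df_n = k-1, df_d = n-k) distribution
assert np.isclose(p_value, stats.f.sf(f_stat, k - 1, n - k))
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```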
#### 0.1.5. Interpretation
1. p-value - usual interpretation (probability of getting an F-statistic at least as extreme as the observed one, given that the null hypothesis is true)
2. If $H_0$ is rejected, compare all possible pairwise means to tell which of the means are different
#### 0.1.6. Multiple comparisons
When we do more than one comparison, our type I error rate increases with every additional test, i.e. it compounds.
All pairwise comparisons - AB, AC, AD, BC, BD, CD
${4 \choose 2}=\frac{4!}{2!2!}=6$
Do an independent 2-sample t-test for each pair.
If we use $\alpha=0.05$, the probability of not making a type I error in one comparison is 0.95, so the probability of not making a type I error across all 6 comparisons is $(0.95)^6\approx 0.735$ - i.e. roughly a 26.5% chance of at least one false positive.
Hence we can use Bonferroni's approach to limit this compounding of error (there are other methods).
Use $\alpha^*=\frac{0.05}{\text{no. of comparisons}}$
So $\alpha^*=\frac{0.05}{6}=0.00833$
Probability of not making a type I error in one comparison = $1-\alpha^*=0.99167$
Probability of not making a type I error in 6 comparisons = $(0.99167)^6\approx 0.951$
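The same Bonferroni arithmetic in a few lines of Python:

```python
from math import comb

alpha = 0.05
m = comb(4, 2)           # number of pairwise comparisons: C(4, 2) = 6
alpha_star = alpha / m   # Bonferroni-adjusted per-comparison level

print(f"m = {m}, alpha* = {alpha_star:.5f}")                             # 6, 0.00833
print(f"P(no type I error, uncorrected) = {(1 - alpha) ** m:.3f}")       # ~0.735
print(f"P(no type I error, corrected)   = {(1 - alpha_star) ** m:.3f}")  # ~0.951
```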
In the worked example, the confidence intervals for two of the pairwise comparisons did not contain zero, so those pairs of means are different:
![[Pasted image 20220102184428.png]]
A and C are different; B and C are different.