# Simple Linear Regression
Y: continuous; X: continuous (or categorical)
How we choose the best line:
- method of least squares (minimize the sum of squared errors)
  - gives the same estimates as the method of "maximum likelihood" when the errors are assumed to be normally distributed
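
A minimal sketch of the least-squares fit described above, using the closed-form slope and intercept for one predictor. The function name `fit_line` and the data values are made up for illustration; NumPy is assumed to be available.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares intercept and slope for y ~ b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1

def sum_squared_error(x, y, b0, b1):
    """The quantity that least squares minimizes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

# Made-up illustrative data, roughly y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
sse = sum_squared_error(x, y, b0, b1)
print(f"intercept={b0:.3f}, slope={b1:.3f}, SSE={sse:.3f}")
```

Any other line through the same data (for example, one with a slightly different slope) should give a larger sum of squared errors than the fitted one.
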
Metrics:
- Pearson's correlation r
  - direction & strength of association
- R-squared
  - coefficient of determination
  - square of Pearson's correlation
  - takes values 0-1
  - % of variability in Y that is explained by the model
  - $R^2=\frac{SS_{explained}}{SS_{total}}=1-\frac{SS_{error}}{SS_{total}}$
- Adjusted R-squared = $R^2$ minus a penalty for the number of Xs in the model
- Cross validation
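
The metrics above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a definitive implementation: the data are made up, the function names `regression_metrics` and `kfold_mse` are hypothetical, and the adjusted R-squared uses one common form of the penalty, $1-(1-R^2)\frac{n-1}{n-p-1}$, which the notes do not spell out.

```python
import numpy as np

def regression_metrics(x, y, n_predictors=1):
    """Pearson r, R-squared, and adjusted R-squared for a straight-line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    b1, b0 = np.polyfit(x, y, 1)  # polyfit returns the slope first
    y_hat = b0 + b1 * x
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_error = np.sum((y - y_hat) ** 2)
    ss_explained = ss_total - ss_error
    r = np.corrcoef(x, y)[0, 1]
    r2 = ss_explained / ss_total
    # Assumed (common) penalty form; grows with the number of predictors.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"r": float(r), "r2": float(r2), "adj_r2": float(adj_r2)}

def kfold_mse(x, y, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # hold this fold out of the fit
        b1, b0 = np.polyfit(x[train], y[train], 1)
        pred = b0 + b1 * x[test_idx]
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

# Made-up illustrative data.
x = np.arange(1, 11)
y = np.array([2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.8, 16.1, 18.3, 19.6])

metrics = regression_metrics(x, y)
cv_error = kfold_mse(x, y, k=5)
print(metrics, cv_error)
```

With a single X, `r ** 2` and `r2` should agree up to rounding, and the adjusted value should not exceed the unadjusted one.
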
Non-linearity
1. Transform the Y variable
   1. e.g., instead of fitting X & Y you can fit X & log(Y) or sqrt(Y), etc.
2. Transform the X variable
   1. ladder of transformations
3. Polynomial
4. Categorical X
   1. "dummy" or indicator variable
5. Non-linear regression
   1. e.g., spline
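
A rough sketch of three of the options above (transforming Y, adding a polynomial term, and coding a categorical X as a dummy variable). All data values and variable names are made up for illustration, and `np.polyfit` stands in for whatever fitting routine is actually used.

```python
import numpy as np

# Made-up data where y grows roughly exponentially with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.7, 7.4, 20.1, 54.6, 148.4, 403.4])

def sse(y_obs, y_fit):
    """Sum of squared errors between observed and fitted values."""
    return float(np.sum((y_obs - y_fit) ** 2))

# Transform Y: fit a straight line to log(y) instead of y.
slope_log, intercept_log = np.polyfit(x, np.log(y), 1)

# Polynomial: add an x**2 term (the model is still linear in its coefficients).
lin_coefs = np.polyfit(x, y, 1)
quad_coefs = np.polyfit(x, y, 2)
sse_lin = sse(y, np.polyval(lin_coefs, x))
sse_quad = sse(y, np.polyval(quad_coefs, x))

# Categorical X: code a two-level group as a 0/1 indicator ("dummy") variable.
group = np.array(["a", "a", "a", "b", "b", "b"])
y_group = np.array([4.8, 5.1, 5.3, 7.9, 8.2, 8.0])  # made-up responses
dummy = (group == "b").astype(float)  # 0 for group a, 1 for group b
diff, baseline = np.polyfit(dummy, y_group, 1)

print(slope_log, sse_lin, sse_quad, baseline, diff)
```

In the dummy-variable fit, the intercept is the mean of the reference group ("a" here) and the slope is the difference between the two group means.
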