Simple Linear Regression - Vivek's Digital Garden

# Simple Linear Regression

- Y: continuous; X: continuous (or categorical)
- How we choose the best line: the method of least squares (minimize the sum of squared errors)
    - When the errors are normally distributed, this gives the same estimates as the method of maximum likelihood

Metrics:

- Pearson's correlation $r$
    - Direction and strength of the linear association; takes values from -1 to 1
- R-squared (coefficient of determination)
    - In simple linear regression, the square of Pearson's correlation
    - Takes values 0-1
    - The % of variability in Y that is explained by the model
    - $R^2=\frac{SS_{explained}}{SS_{total}}=1-\frac{SS_{error}}{SS_{total}}$
- Adjusted R-squared = $R^2$ minus a penalty for the number of Xs in the model
    - $R^2_{adj}=1-(1-R^2)\frac{n-1}{n-p-1}$, with $n$ observations and $p$ Xs
- Cross-validation

Non-linearity:

1. Transform the Y variable
    1. i.e. instead of fitting X & Y, fit X & log(Y) or sqrt(Y), etc.
2. Transform the X variable
    1. Ladder of transformations
3. Polynomial (e.g. add an $X^2$ term)
4. Categorical X
    1. "Dummy" or indicator variable
5. Non-linear regression
    1. e.g. splines
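A minimal pure-Python sketch of least squares and the metrics listed above, assuming made-up data and hypothetical helper names (`fit_least_squares`, `pearson_r`, `sums_of_squares`, `adjusted_r_squared`). It uses the closed-form least-squares solution, slope $b_1 = S_{xy}/S_{xx}$ and intercept $b_0 = \bar{y} - b_1\bar{x}$, then checks that both forms of $R^2$ agree and that $R^2 = r^2$ for a single X.

```python
import math


def means(x, y):
    return sum(x) / len(x), sum(y) / len(y)


def fit_least_squares(x, y):
    """Closed-form least-squares line: returns (intercept, slope)."""
    x_bar, y_bar = means(x, y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def pearson_r(x, y):
    """Pearson's correlation: sign gives direction, magnitude gives strength."""
    x_bar, y_bar = means(x, y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def sums_of_squares(x, y, intercept, slope):
    """Return (SS_total, SS_explained, SS_error) for a fitted line."""
    y_bar = sum(y) / len(y)
    y_hat = [intercept + slope * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return ss_total, ss_explained, ss_error


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p (not counting the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_least_squares(x, y)
ss_total, ss_explained, ss_error = sums_of_squares(x, y, b0, b1)
r2 = 1 - ss_error / ss_total          # 1 - SS_error / SS_total
r2_explained = ss_explained / ss_total  # SS_explained / SS_total
r = pearson_r(x, y)
adj_r2 = adjusted_r_squared(r2, n=len(x), p=1)

print(f"intercept={b0:.3f} slope={b1:.3f}")
print(f"R^2={r2:.4f}  r^2={r ** 2:.4f}  adjusted R^2={adj_r2:.4f}")
```

The two $R^2$ forms agree because, for a least-squares line with an intercept, $SS_{total} = SS_{explained} + SS_{error}$.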
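Two of the non-linearity fixes can reuse the same straight-line fit. This is a sketch under assumed, hypothetical data: a Y that grows exponentially in X (so log(Y) is linear in X), and a two-level category coded as a 0/1 dummy variable. With a single dummy, the fitted intercept is the mean of the reference group and the slope is the difference between the two group means.

```python
import math


def fit_least_squares(x, y):
    """Closed-form least-squares line: returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Transform Y: hypothetical data following y = 3 * exp(0.5 * x)
x = [0, 1, 2, 3, 4, 5]
y = [3 * math.exp(0.5 * xi) for xi in x]
log_y = [math.log(yi) for yi in y]
log_intercept, log_slope = fit_least_squares(x, log_y)  # straight line in (X, log Y)

# Back-transform to predict on the original Y scale
prediction_at_6 = math.exp(log_intercept + log_slope * 6)

# Categorical X: encode a hypothetical two-level group as a 0/1 indicator
groups = ["control", "control", "control", "treatment", "treatment", "treatment"]
outcome = [10.0, 12.0, 11.0, 15.0, 17.0, 16.0]
dummy = [1 if g == "treatment" else 0 for g in groups]  # "control" is the reference level
control_mean, treatment_effect = fit_least_squares(dummy, outcome)

print(f"log fit: intercept={log_intercept:.4f} slope={log_slope:.4f}")
print(f"dummy fit: control mean={control_mean} difference={treatment_effect}")
```

A category with k levels would need k - 1 dummies, which goes beyond a single-X line; this sketch only covers the two-level case.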
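Cross-validation judges a model by how well it predicts points it was not fit on. Below is a minimal sketch of leave-one-out cross-validation (one common variant; k-fold is another), assuming hypothetical data and helper names: each point is held out in turn, the line is refit on the remaining points, and the squared error on the held-out point is averaged.

```python
def fit_least_squares(x, y):
    """Closed-form least-squares line: returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def training_mse(x, y):
    """Mean squared error on the same data used to fit the line."""
    b0, b1 = fit_least_squares(x, y)
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)


def loocv_mse(x, y):
    """Leave-one-out CV: refit without point i, then score the prediction for point i."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b0, b1 = fit_least_squares(x_train, y_train)
        errors.append((y[i] - (b0 + b1 * x[i])) ** 2)
    return sum(errors) / len(errors)


# Hypothetical example data
x = [1, 2, 3, 4, 5, 6]
y = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4]

train_mse = training_mse(x, y)
cv_mse = loocv_mse(x, y)
print(f"training MSE={train_mse:.4f}  LOOCV MSE={cv_mse:.4f}")
```

The cross-validated error is never below the training error for a least-squares line, which is why it gives a more honest picture when comparing candidate models (e.g. fitting Y versus log(Y)).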