Stochastic Process - a process that produces results that vary by chance
## Terminology
1. outcomes - mutually exclusive and exhaustive list of possible results in a model
2. events - sets containing zero or more outcomes; we define the events we are interested in
3. sample space - set of all possible outcomes (must be collectively exhaustive and mutually exclusive).
4. event space - the set of events under consideration: a collection of subsets of the sample space that includes the sample space itself and the empty event
5. probability space - the combination of a sample space, an event space, and a probability function; the probability function is a real-valued function mapping events to the interval $[0,1]$
## Axioms of Probability (Kolmogorov Axioms)
1. the probability of an event is a real number on the interval [0,1]
$0 \le P(E) \le 1$
2. the probability that some outcome in the sample space occurs is 1: $P(S) = 1$, where $S$ is the sample space
3. any countable collection of mutually exclusive events satisfies $P(\bigcup_{i=1}^\infty E_i) = \sum_{i=1}^\infty P(E_i)$
# Addition Rule
https://www.varsitytutors.com/hotmath/hotmath_help/topics/addition-rule-of-probability
$P(A\text{ or }B) = P(A\cup B) = P(A) + P(B) - P(A\cap B)$
## Multiplicative Rule
https://www.varsitytutors.com/hotmath/hotmath_help/topics/multiplication-rule-of-probability
$P(A\text{ and }B) = P(A,B) = P(A\cap B) = P(A)\cdot P(B|A)$, which reduces to $P(A)\cdot P(B)$ if $A$ and $B$ are independent
# Conditional Rule
$P(A|B) = \frac {P(A,B)} {P(B)}$
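A minimal sketch tying the addition, multiplication, and conditional rules together, using a made-up joint distribution over two binary events $A$ and $B$ (all numbers are purely illustrative):

```python
# Hypothetical joint distribution over two binary events A and B.
P = {  # P[(a, b)] = P(A=a, B=b)
    (1, 1): 0.10, (1, 0): 0.20,
    (0, 1): 0.30, (0, 0): 0.40,
}

P_A = P[(1, 1)] + P[(1, 0)]        # marginal P(A)   = 0.30
P_B = P[(1, 1)] + P[(0, 1)]        # marginal P(B)   = 0.40
P_A_and_B = P[(1, 1)]              # joint    P(A,B) = 0.10

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
P_A_or_B = P_A + P_B - P_A_and_B   # 0.60

# Conditional rule: P(A|B) = P(A,B) / P(B)
P_A_given_B = P_A_and_B / P_B      # 0.25

# Multiplication rule: P(A,B) = P(B) * P(A|B)
assert abs(P_A_and_B - P_B * P_A_given_B) < 1e-12
print(P_A_or_B, P_A_given_B)
```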
If you knew the population distribution and you sampled $n$ items from it, what is the probability that the sample mean takes a particular value? (This question is picked up in the sampling-distribution sections below.)
[[Probability Distributions]]
## Discrete Random Variables
discrete random variables are characterized by countable outcome spaces. A discrete random variable is associated with a **probability mass function (pmf)** $f_X$, defined on a countable subset of $\mathbb{R}$, with probability values in the range $[0,1]$. Properties of the pmf include:
1. $f_X(x) \ge 0$ for all $x \in \mathbb{R}_X$, i.e. the probability of any outcome is $\ge 0$
2. $\sum_{x \in \mathbb{R}_X} f_X(x) = 1$, i.e. the probabilities sum to one
3. $F_X(b) - F_X(a) = \sum_{a < x \le b} f_X(x)$ for $a < b$, i.e. differences of the CDF give interval probabilities
CDF:
$F_X(b) = \sum_{x \le b} f_X(x)$
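A minimal sketch of a pmf and its CDF for a fair six-sided die, checking the properties above (the die is an assumed example, not from the notes):

```python
# pmf of a fair six-sided die; checks the pmf properties listed above.
pmf = {x: 1/6 for x in range(1, 7)}

assert all(p >= 0 for p in pmf.values())        # property 1: f_X(x) >= 0
assert abs(sum(pmf.values()) - 1.0) < 1e-12     # property 2: probabilities sum to 1

def cdf(b):
    """F_X(b) = sum of f_X(x) over all x <= b."""
    return sum(p for x, p in pmf.items() if x <= b)

# Property 3: P(2 < X <= 5) = F_X(5) - F_X(2) = 3/6
print(cdf(5) - cdf(2))   # 0.5
```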
# Continuous random variables
The outcome space is continuous (uncountable). Continuous random variables are characterized by a **probability density function (pdf)** $f_X$, and probabilities are obtained by integrating the pdf: $P(a < X \le b) = \int_a^b f_X(x)\,dx$.
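A minimal sketch of that integral relationship, assuming a standard normal and using scipy as a sanity check:

```python
# For a continuous random variable, probabilities come from integrating the pdf.
# Here the standard normal pdf is integrated over [a, b] and compared with the
# difference of CDF values.
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 1.0
area, _ = quad(norm.pdf, a, b)            # integral of the pdf over [a, b]
print(area, norm.cdf(b) - norm.cdf(a))    # both ~0.6827
```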
# Moment, Expectation, etc
## Expectation
The expectation of a function is the average value of the function under a probability distribution. For discrete distributions this is computed as a weighted average, where the weights are the probabilities $p(x)$ given by the pmf (the exponent $r$ below gives the $r$-th moment of $f$; $r=1$ is the ordinary expectation):
$E[f^r] = \sum_x f(x)^r\, p(x)$
For continuous distributions, this looks like
$E[f^r] = \int f(x)^r\, p(x)\, dx.$
## Mean
If $f(x) = x$ and $r = 1$, in both cases this is called the mean of the distribution. For a standard normal distribution, computing the expectation gives $E[x]=0$, which matches the intuition of where the bulk of the probability mass is centered.
1st (raw) moment - Mean
2nd central moment - Variance
3rd standardized moment - Skewness
4th standardized moment - Kurtosis
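A minimal sketch estimating these four quantities from simulated data with numpy/scipy (the standard-normal sample is an assumption for illustration):

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)

print(np.mean(x))     # ~0  (mean)
print(np.var(x))      # ~1  (variance, the 2nd central moment)
print(skew(x))        # ~0  (standardized 3rd moment)
print(kurtosis(x))    # ~0  (scipy reports excess kurtosis, i.e. kurtosis - 3)
```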
## Independence
It can be shown that $X$ and $Y$ are independent if and only if there exist functions $g(x)$ and $h(y)$ such that:
$f(x,y) = g(x)\,h(y) \text{ for all } (x,y)$
How do we use this? In the discrete case, if we can find a pair (x,y) that violate the product rule, the random variables are dependent.
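A minimal sketch of that check on a made-up 2×2 joint pmf (the table values are invented so that $X$ and $Y$ come out dependent):

```python
import numpy as np

joint = np.array([[0.25, 0.25],     # rows: x = 0, 1
                  [0.10, 0.40]])    # cols: y = 0, 1

px = joint.sum(axis=1)              # marginal of X
py = joint.sum(axis=0)              # marginal of Y

product = np.outer(px, py)          # the joint implied by independence
print(np.allclose(joint, product))  # False -> some (x, y) violates the product
                                    # rule, so X and Y are dependent
```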
## Covariance
Covariance measures how two variables vary together: $\mathrm{Cov}(X,Y) = E[(X-E[X])(Y-E[Y])] = E[XY] - E[X]E[Y]$. Independent variables have a covariance of 0, but the converse is not true: zero covariance does not imply independence.
## Correlation
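Correlation is the covariance scaled by the two standard deviations, $\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$, so it always lies in $[-1,1]$. A minimal sketch of both quantities with numpy (the simulated linear relationship is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)   # y depends linearly on x plus noise

print(np.cov(x, y)[0, 1])        # covariance, ~2
print(np.corrcoef(x, y)[0, 1])   # correlation in [-1, 1], ~0.89
```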
# Point Estimation
Point estimation is a type of statistical inference that consists of producing a guess or approximation of an unknown parameter.
Given a representative sample of data from some population, how do we estimate the parameters of the distribution?
1. Method of Moments
2. [[Maximum likelihood estimation]]
3. Maximum a posteriori probability estimate
Properties of an estimator
- Consistency - as N gets large the estimator converges to the true parameter value
- Bias - an estimator is unbiased if its expected value equals the true parameter value
- Efficiency - has the lowest possible variance among unbiased estimators
## Method of Moments
It works by equating sample moments to population moments, e.g. the estimate for the population mean is the sample mean.
Method of moments can be shown to be consistent but not necessarily efficient and can give estimates outside the parameter space.
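A minimal sketch of method-of-moments estimation, assuming gamma-distributed data: matching the first two moments ($\text{mean} = k\theta$, $\text{var} = k\theta^2$) gives $\hat{\theta} = \text{var}/\text{mean}$ and $\hat{k} = \text{mean}^2/\text{var}$.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.gamma(shape=3.0, scale=2.0, size=50_000)   # true k = 3, theta = 2

m = data.mean()   # sample 1st moment
v = data.var()    # sample 2nd central moment

theta_hat = v / m      # ~2
k_hat = m ** 2 / v     # ~3
print(k_hat, theta_hat)
```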
## Maximum Likelihood estimation
MLE follows from the assumption that our data consist of independent and identically distributed (i.i.d.) observations from a population. The goal is to find the $\theta$ that maximizes the likelihood of observing our data.
The likelihood is the joint probability of observing the data given a specific value of the parameter.
It is often convenient to work with the log-likelihood instead of the likelihood for computational simplicity (products of probabilities become sums).
MLE can be shown to be a consistent estimator, but it may be biased (e.g. the MLE of the variance of a normal distribution). Operationally, it can be computationally expensive to calculate, but it offers a useful invariance property: the MLE of any function of the parameters is that function of the MLE.
For a normal distribution, MLE and MoM give the same results.
https://www.statlect.com/fundamentals-of-statistics/normal-distribution-maximum-likelihood
![[Pasted image 20220130194405.png]]
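A minimal sketch of MLE for a normal distribution, done numerically by minimizing the negative log-likelihood with scipy and compared against the closed-form answer (the data and optimizer settings are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1_000)

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:                 # keep the optimizer in the valid region
        return np.inf
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x

print(mu_hat, sigma_hat)              # ~5, ~2
print(data.mean(), data.std(ddof=0))  # closed-form MLE (note the biased, n-denominator std)
```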
## Maximum a posteriori estimate
MLE augmented with a prior over the parameters: maximize the posterior (prior × likelihood) rather than the likelihood alone; otherwise the procedure is similar to MLE.
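A minimal sketch of a MAP estimate, assuming a coin-flip model with a Beta prior on the success probability $p$: the posterior is Beta($\alpha + k$, $\beta + n - k$), and its mode is the MAP estimate.

```python
alpha, beta = 2.0, 2.0   # assumed Beta prior, for illustration
n, k = 10, 7             # 10 flips, 7 heads

p_mle = k / n                                      # 0.70 (likelihood only)
p_map = (alpha + k - 1) / (alpha + beta + n - 2)   # ~0.67, pulled toward the prior mean 0.5
print(p_mle, p_map)
```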
## Z-score
$Z=\frac{\text{observation}-\mu}{\sigma}$
## Central Limit Theorem
- If
- we sample individuals from a population
- and we take a large sample size (>20)
		- the more symmetric the population distribution, the smaller the sample size can be; the more skewed the population, the more samples are needed
	- or the distribution of the individuals is approximately normal
- then
	- the sampling distribution of the sample mean will be approximately normal (see the sketch below)
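A minimal sketch of the CLT using a deliberately skewed (exponential) population: the skewness of the sampling distribution of the mean shrinks toward 0 as the sample size grows (the population and sizes are assumptions for illustration).

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
population = rng.exponential(scale=1.0, size=1_000_000)   # skewed population

for n in (5, 30, 200):
    means = rng.choice(population, size=(10_000, n)).mean(axis=1)
    print(n, skew(means))   # skewness of the sample means shrinks toward 0
```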
## Sampling Distribution
- It is the probability distribution of the sample mean obtained from repeated samples from a population
- Theoretically, it is the set of all possible $\bar{X}$ we could get
### Standard error of mean
Also called the standard deviation of the sampling distribution of the mean
$SE_{\bar{X}}=\frac{\sigma}{\sqrt{n}}$
where n is the sample size (the number of observations in each sample)
The sample mean varies according to an approximately normal distribution
If you know the maximum and minimum possible values and you know that the population is approximately normally distributed, then you can roughly estimate the standard deviation (since $\pm 3\sigma$ covers nearly all of a normal distribution):
sd ≈ (max - min) / 6
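A minimal sketch checking that the standard deviation of simulated sample means matches $\sigma/\sqrt{n}$ (the population parameters are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 3.0, 25

means = rng.normal(loc=mu, scale=sigma, size=(50_000, n)).mean(axis=1)
print(means.std())           # empirical standard error, ~0.6
print(sigma / np.sqrt(n))    # theoretical SE = 3 / 5 = 0.6
```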
## Confidence Interval
The true population mean will be within roughly $\pm 2$ standard errors of the sample mean for about 95% of samples. Because we only have the sample's standard deviation $s$ (not the population's $\sigma$), we use a t-based interval:
$\mu=\bar{X}\pm t\frac{s}{\sqrt{n}}$
$t$ - critical t-value for the chosen confidence level with $n-1$ degrees of freedom
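A minimal sketch of a 95% confidence interval for the mean, using the t critical value discussed in the next section (the sample itself is simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=3.0, size=20)

n = len(sample)
x_bar = sample.mean()
s = sample.std(ddof=1)                   # sample standard deviation
t_crit = stats.t.ppf(0.975, df=n - 1)    # ~2.09 for n = 20

half_width = t_crit * s / np.sqrt(n)
print(x_bar - half_width, x_bar + half_width)   # interval that covers mu ~95% of the time
```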
## t-distribution
Use t-distribution instead of Z-distribution (standard normal) because
- we don't know the true population std. dev., but we do know the sample std. dev.
- invented by William Gosset while he was working at the Guinness brewery
- as the sample size becomes larger and larger, the t-distribution converges to the standard normal distribution
- as the sample size becomes smaller, the tails of the distribution become fatter compared to the standard normal
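A minimal sketch of that convergence, comparing the 97.5% quantile of the t-distribution with the standard normal as the degrees of freedom grow:

```python
from scipy import stats

z = stats.norm.ppf(0.975)                   # ~1.96
for df in (3, 10, 30, 1000):
    print(df, stats.t.ppf(0.975, df), z)    # the t quantile approaches the normal quantile
```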
## Bootstrap
Why? - what if you do not have a large sample and cannot assume the sampling distribution is normal?
It can also be difficult to work out the standard error of an estimate analytically.
![[Pasted image 20211210223535.png]]
The bootstrap is resampling with replacement from a sample
The distribution of the $\bar{X}^*$ values is the bootstrap sampling distribution
The SD of all $\bar{X}^*$ is the bootstrap standard error
Use a large number of bootstrap resamples B (at least 1,000 or 10,000)
Increasing B cannot increase the amount of information; the analysis is still based on the original n observations
- but it gives a more reliable estimate of the standard error
Results from a bootstrap approach are almost identical to large-sample theory when its assumptions hold. If the assumptions for large-sample theory aren't met, the sample-mean approach might not work, but the bootstrap might.
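A minimal sketch of a bootstrap standard error for the mean: resample the original sample with replacement B times and take the SD of the resampled means (the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=50)   # one smallish observed sample

B = 10_000
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(B)
])

print(boot_means.std())                           # bootstrap standard error
print(sample.std(ddof=1) / np.sqrt(len(sample)))  # large-sample SE, for comparison
```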