### Gaussian Mixture Model

**Reference**

* Python Data Science Handbook by Jake VanderPlas
* Pattern Recognition and Machine Learning by Christopher M. Bishop

Usually the data does not come from a single normal distribution (it would be nice if everything did!). In such cases, arbitrary distributions can be modeled using a Gaussian Mixture Model. A Gaussian Mixture Model is created by **superimposing several weighted Gaussians**. If a sufficiently large number of Gaussians are used and their means and covariances are adjusted, we can model quite complex distributions.

$p(x) = \sum_k w_k N(x | \mu_k, \Sigma_k)$

where $w_k$ is the weight or mixing coefficient associated with each Gaussian component. The individual Gaussian distributions are called the components of the mixture model. It is also possible to build mixture models from other distributions, such as the Bernoulli distribution.

The mixing coefficients satisfy $0 < w_k < 1$ and $\sum_k w_k = 1$. The coefficient $w_k$ can therefore be interpreted as the prior probability of picking the $k$-th component, i.e. $p(k)$. The density $p(x)$ can then be written as

$p(x) = \sum_k p(k) \, p(x|k)$

where $p(x|k)$ is equivalent to $N(x | \mu_k, \Sigma_k)$. The reason for this detour is to introduce the 'responsibilities' $p(k|x)$, which follow from Bayes' theorem. The name arises from the fact that $p(k|x)$ expresses the responsibility that the $k$-th component takes for explaining the observation $x$.

One way to determine the values of $\mu_k, \Sigma_k, w_k$ is numerical optimization of the likelihood function, since closed-form solutions are no longer available. The Expectation-Maximization (EM) algorithm is another way to accomplish this.
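
To make this concrete, here is a minimal sketch using scikit-learn's `GaussianMixture`, which fits the parameters via EM. The toy two-component 1-D data (the specific means, variances, and weights) is an arbitrary choice for illustration. The fitted `weights_`, `means_`, and `covariances_` correspond to $w_k$, $\mu_k$, and $\Sigma_k$, and `predict_proba` returns the responsibilities $p(k|x)$.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 1-D data from a known two-component mixture:
# ~30% of samples from N(-2, 0.5^2) and ~70% from N(3, 1.0^2).
n_samples = 1000
from_first = rng.random(n_samples) < 0.3
data = np.where(from_first,
                rng.normal(-2.0, 0.5, n_samples),
                rng.normal(3.0, 1.0, n_samples)).reshape(-1, 1)

# Fit a two-component Gaussian Mixture Model with EM.
gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gmm.fit(data)

print("weights   (w_k):    ", gmm.weights_)
print("means     (mu_k):   ", gmm.means_.ravel())
print("variances (Sigma_k):", gmm.covariances_.ravel())

# Evaluate the mixture density p(x) = sum_k w_k N(x | mu_k, Sigma_k)
# directly from the fitted parameters.
x = np.linspace(-5, 7, 5).reshape(-1, 1)
p_x = sum(w * norm.pdf(x.ravel(), loc=m.item(), scale=np.sqrt(c.item()))
          for w, m, c in zip(gmm.weights_, gmm.means_, gmm.covariances_))
print("p(x):", p_x)

# Same quantity via the model's log-density, as a consistency check.
print("p(x) from score_samples:", np.exp(gmm.score_samples(x)))

# Responsibilities p(k|x): posterior probability that each component
# generated each point, computed from Bayes' theorem.
print("p(k|x):\n", gmm.predict_proba(x))
```

The fitted weights, means, and variances should land close to the values used to generate the data, and each row of `predict_proba` sums to 1, as expected for a posterior over components.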