# Adam Optimization
Adam optimization is a combination of [[RMSProp]] and [[Momentum]]. It is also [[Exponential Weighted Average#Bias correction|bias corrected]].
$V_{d\theta}=\beta_1 V_{d\theta} + (1-\beta_1)\,d\theta$
$S_{d\theta}=\beta_2 S_{d\theta} + (1-\beta_2)\,d\theta^2$
$V^{corrected}_{d\theta}=\frac{V_{d\theta}}{1-\beta_1^t},\quad S^{corrected}_{d\theta}=\frac{S_{d\theta}}{1-\beta_2^t}$
$\theta = \theta - \alpha \frac{V^{corrected}_{d\theta}}{\sqrt{S^{corrected}_{d\theta}} + \epsilon}$
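A minimal sketch of one Adam update, assuming NumPy arrays for the parameter and gradient; the function and variable names are illustrative, and the default hyperparameters follow the common recommendations.

```python
import numpy as np

def adam_step(theta, grad, V, S, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. V and S are the running first and second moments, t starts at 1."""
    V = beta1 * V + (1 - beta1) * grad        # Momentum: EWA of the gradient
    S = beta2 * S + (1 - beta2) * grad ** 2   # RMSProp: EWA of the squared gradient
    V_hat = V / (1 - beta1 ** t)              # bias correction
    S_hat = S / (1 - beta2 ** t)
    theta = theta - alpha * V_hat / (np.sqrt(S_hat) + eps)
    return theta, V, S
```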
## AMSGrad
The intuition comes from cases where Adam fails to converge and SGD with momentum outperforms it.
AMSGrad adds one extra step just before the parameter update: it keeps the running maximum of the second moment, so the effective step size can never grow.
$S_{d\theta}^t = \max(S_{d\theta}^{t-1}, S_{d\theta}^t)$
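A sketch of how the max changes the update above; `S_max` is carried across iterations and replaces `S` in the denominator (names are illustrative).

```python
import numpy as np

def amsgrad_step(theta, grad, V, S, S_max, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam update with the AMSGrad max on the second moment."""
    V = beta1 * V + (1 - beta1) * grad
    S = beta2 * S + (1 - beta2) * grad ** 2
    S_max = np.maximum(S_max, S)              # the denominator can never shrink
    V_hat = V / (1 - beta1 ** t)
    S_hat = S_max / (1 - beta2 ** t)
    theta = theta - alpha * V_hat / (np.sqrt(S_hat) + eps)
    return theta, V, S, S_max
```

In PyTorch it can be enabled with `torch.optim.Adam(model.parameters(), amsgrad=True)`.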
## AdamW
Decouples weight decay from the gradient update: the decay is applied directly to the weights instead of being added to the gradients as an L2 term.
Paper: https://openreview.net/pdf?id=ryQu7f-RZ
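A minimal usage sketch, assuming the built-in PyTorch implementation; the model and hyperparameter values are illustrative.

```python
import torch

model = torch.nn.Linear(10, 1)
# Weight decay is applied directly to the weights, not added to the gradients.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```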
## Nadam
Instead of classical momentum it uses Nesterov momentum. It is available as `torch.optim.NAdam` in PyTorch and `keras.optimizers.Nadam` in Keras.
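A minimal usage sketch, assuming a recent PyTorch version; the model and learning rate are illustrative.

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.NAdam(model.parameters(), lr=2e-3)
```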
---
Related: [[Optimizers]]