# Time Series Data
Ordered sequences of values (usually equally spaced over time)
- Univariate vs multivariate
- data becomes more valuable when you combine multiple related series (multivariate data)
- What can we do with machine learning on time series data?
- forecast the future
- impute past data
- fill gaps in the data
- detect anomalies
- analyze the process that generated the series
- e.g. speech recognition: recovering words from sound waves
- Patterns in time series data (a synthetic-data sketch follows this list)
- Trend
- long-term increase or decrease in the value
- Seasonality
- a pattern that repeats over a fixed period
- White noise
- random variation with no pattern; unpredictable
- Auto-correlation
- the series correlates with a delayed copy of itself (e.g. decay with random spikes)
- the time series is said to have "memory": each step depends on previous steps
- the spikes are called "innovations", i.e. they cannot be predicted from past values
- eg. ![[Pasted image 20210118143125.png]]
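A minimal sketch of these patterns as synthetic data, in the spirit of the examples above. The function shapes, constants, and seeds are illustrative, not from the source:

```python
import numpy as np

def trend(time, slope=0.0):
    # Long-term linear increase or decrease.
    return slope * time

def seasonality(time, period, amplitude=1.0):
    # The same arbitrary pattern repeats every `period` steps.
    season_time = (time % period) / period
    pattern = np.where(season_time < 0.4,
                       np.cos(season_time * 2 * np.pi),
                       1 / np.exp(3 * season_time))
    return amplitude * pattern

def white_noise(time, noise_level=1.0, seed=42):
    # Pure randomness: no memory, cannot be predicted.
    rnd = np.random.RandomState(seed)
    return rnd.randn(len(time)) * noise_level

def autocorrelated(time, amplitude=1.0, phi=0.8, seed=42):
    # Each step is a decayed copy of the previous step plus a random
    # "innovation" -- this is what gives the series "memory".
    rnd = np.random.RandomState(seed)
    ar = rnd.randn(len(time) + 1)
    for step in range(1, len(time) + 1):
        ar[step] += phi * ar[step - 1]
    return amplitude * ar[1:]

time = np.arange(4 * 365)  # four "years" of daily data
series = (10 + trend(time, 0.05)
          + seasonality(time, period=365, amplitude=40)
          + autocorrelated(time, amplitude=5)
          + white_noise(time, noise_level=3))
```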
- Stationary vs non-stationary time series
- when the established pattern of trend/seasonality/white noise/auto-correlation changes, the series is called non-stationary
- i.e. the behavior of the time series has changed
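One common sanity check for stationarity is the Augmented Dickey-Fuller test. A sketch using statsmodels (the library is my addition, not from the source; `series` is the synthetic data above):

```python
from statsmodels.tsa.stattools import adfuller

# Null hypothesis of the ADF test: the series is non-stationary
# (has a unit root). A small p-value (e.g. < 0.05) rejects that.
stat, p_value, *rest = adfuller(series)
print(f"ADF statistic: {stat:.3f}, p-value: {p_value:.3f}")
```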
- Splitting training/test sets
- train on the training period and evaluate on the validation period, iterating until you are happy. Then retrain on both the training and validation sets and evaluate on the test period. Once you are happy, train one final time including the test data, since it is the most recent data and therefore the strongest signal for the future
- Fixed partitioning
- ![[Pasted image 20210118152434.png]]
- if the time series has seasonality, make sure the split falls on a whole number of seasons
- Roll forward partitioning
- progressively increase the training set size; the validation set is just the next day/chunk of data (a sketch of both splits follows this section)
- ![[Pasted image 20210118152744.png]]
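A sketch of both splits on the synthetic series above (the split point and chunk size are illustrative):

```python
# Fixed partitioning: one static split, cut at a whole number of
# seasons (two full 365-day "years" for training here).
split_time = 2 * 365
x_train, x_valid = series[:split_time], series[split_time:]

# Roll-forward partitioning: the training window grows step by step
# and each iteration validates on just the next chunk of data.
chunk = 30
for end in range(split_time, len(series) - chunk, chunk):
    train_window = series[:end]
    valid_window = series[end:end + chunk]
    # ... fit a model on train_window, evaluate on valid_window ...
```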
- Metrics
- MSE (mean squared error)
- most common
- RMSE (root mean squared error)
- keeps the error at the same magnitude/units as the series values
- MAE (mean absolute error)
- does not penalize large errors as heavily as MSE
- If large errors are dangerous use MSE/RMSE. If general size of error is more important than outliers, MAE is better
- `keras.metrics.mean_absolute_error(x_valid, naive_forecast).numpy()`
- MAPE (mean absolute percentage error): the mean ratio of the absolute error to the absolute value, i.e. error relative to the series' scale
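The same four metrics in plain numpy (equivalent to the keras call above; `x_valid` is the validation period from the split sketch, and `forecast` is assumed to be any prediction of the same length):

```python
import numpy as np

errors = forecast - x_valid
mse = np.mean(errors ** 2)                       # penalizes large errors
rmse = np.sqrt(mse)                              # same magnitude as the series
mae = np.mean(np.abs(errors))                    # robust to outliers
mape = np.mean(np.abs(errors / x_valid)) * 100   # error relative to scale
```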
- Naive Forecast:
- current prediction = last period's actual value
- used as a baseline that "more sophisticated" methods must beat
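With the fixed split above, the naive forecast is a one-liner:

```python
import numpy as np

# Each prediction is just the previous step's actual value.
naive_forecast = series[split_time - 1:-1]
print("naive MAE:", np.mean(np.abs(naive_forecast - x_valid)))
```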
- Moving Average
- smooths out the noise but introduces a lag
- Sometimes simple approaches work fine (no need for deep learning)
- To improve this
- remove the trend and seasonality first (e.g. by differencing)
- then make a forecast on the differenced series using a moving average
- Add back the trend and seasonality
- Example 1
- Forecast = moving average of the differenced series + series(t − 365)
- Example 2 (produces the best forecast; see the sketch after this section)
- Forecast = trailing moving average of the differenced series + centered moving average of the past series (t − 365)
- Trailing vs centered window for moving average
- Trailing (t-10 to t-1)
- Central (t-5 to t+5)
- more accurate, but it needs future values, so it cannot be used to forecast the present; it can still be used to smooth past values
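A sketch of the moving-average recipe with differencing, covering both examples. The window sizes and the 365-step season follow the numbers above but are otherwise illustrative:

```python
import numpy as np

def moving_average_forecast(series, window_size):
    # Trailing window: the forecast for time t is the mean of the
    # previous `window_size` values (t - window_size .. t - 1).
    return np.array([series[t - window_size:t].mean()
                     for t in range(window_size, len(series))])

# 1. Remove trend and seasonality by differencing against one season ago.
diff_series = series[365:] - series[:-365]

# 2. Forecast the differenced series with a trailing moving average,
#    keeping only the validation period.
diff_ma = moving_average_forecast(diff_series, 30)[split_time - 365 - 30:]

# 3a. Example 1: add back the raw value from one season ago.
forecast_v1 = diff_ma + series[split_time - 365:-365]

# 3b. Example 2: add back a *centered* moving average of the past
#     instead. The centered window needs "future" values, which is
#     fine here because they are still a year in the past.
smoothed_past = moving_average_forecast(series[split_time - 370:-359], 11)
forecast_v2 = diff_ma + smoothed_past
```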
- Univariate vs multivariate models
- one-step vs multi-step
- Multi-step strategies (a sketch of the last two follows this list)
- Direct multi-step: one model per forecast step - wasteful, and each model ignores the others
- Recursive multi-step: take a one-step model and feed its output back in as input for the next step - errors can compound
- Direct-recursive hybrid: a separate model for each time step, where each model also uses the predictions made by the models for previous steps - complex for little gain
- Multi-output: one model learns the relation between input and output as well as between previous and current outputs __holy grail__
- complex, needs a lot of data (to avoid overfitting), and is slow to train
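A sketch contrasting the recursive and multi-output strategies. The one-step `model` is assumed to be already trained on windows of size `WINDOW`; the window and horizon sizes are illustrative:

```python
import numpy as np
from tensorflow import keras

WINDOW, HORIZON = 30, 7

def recursive_forecast(one_step_model, history, horizon=HORIZON):
    # Feed each one-step prediction back in as input for the next
    # step -- cheap, but errors can compound over the horizon.
    window = list(history[-WINDOW:])
    preds = []
    for _ in range(horizon):
        x = np.array(window[-WINDOW:], dtype="float32").reshape(1, WINDOW)
        yhat = one_step_model.predict(x, verbose=0)[0, 0]
        preds.append(yhat)
        window.append(yhat)  # the prediction becomes part of the input
    return np.array(preds)

# Multi-output: a single model maps a window straight to all HORIZON
# steps, so it can also learn how the outputs relate to each other.
multi_output_model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=[WINDOW]),
    keras.layers.Dense(HORIZON),  # all steps predicted at once
])
```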