Time series, while not hard per se, is not as straightforward as the classification and regression problems that are the bread and butter of data science.
- Dominant models haven’t changed much in decades. This scares us data scientists; we need the latest and greatest.
- No consistent open-source solutions (outside of R’s forecast package)
- Commercial solutions are expensive for questionable improvement; they are mostly R inside custom wrappers.
- Custom solutions are difficult to maintain.
- Accuracy is often low! This is often because the time series of interest are tied to very complicated external factors, like the global economy.
- Productionizing is difficult, because models need to be frequently retrained and must handle constantly evolving situations.
And I am about to give you the means by which to get simple, quick, reliable, and near-optimal (say, 90% of what’s possible) results.
- My time series is flow-y
  - ETS with Statsmodels
- My time series is spiky
  - Prophet with Regressor and/or Holidays
- I’ve got lots of these time series things to do
  - GluonTS models
- Only if required after feedback, develop a custom model.
ETS in Statsmodels
ETS is mathematically very similar to ARIMA, but is more robust and simpler to use. Even I can understand the math, which tells you how simple it is. It’s also really fast, faster than Prophet. In the M4 competition, across a wide variety of data, the benchmark ETS ensemble had an sMAPE of 12.5 versus 11.4 for the very best model. Conclusion: ETS is often extremely accurate. Its only flaw is that it doesn’t work well with more irregular time series.
Prophet
Prophet is probably the leader in modern data science because it is new (2017) and because it was designed by Facebook for modern use cases. It is quite fast and often quite accurate (although, I find, by itself often a bit worse than ETS). Most importantly, it makes adding in holidays and a regressor really easy. Thus Prophet is a particularly good choice for more irregular time series where holidays or a regressor can explain those irregularities. It is the easiest to use, bar none.
GluonTS
Neural networks are the future of time series (probably). One neural net can be trained for many related time series. This includes the ability to predict new time series better from limited data, which is good for new product additions. It is very fast on many related time series (fastest once the number of related series to predict gets into the hundreds, thousands, hundreds of thousands…). The big caveat here is that it is new and still limited in production usability. That said, being a product of AWS Labs, new still means largely production ready.
When to Go Custom:
- The value of improved accuracy is quite high.
- The time series is intermittent or irregular.
- Additional external variables are available.
- Forecasts are needed for new related time series (i.e., a new product)
How to Go Custom:
- Usually an ensemble
- Involves creative feature engineering
  - The hardest and most important part
  - https://www.kdnuggets.com/2017/11/automated-feature-engineering-time-series-data.html
- Can use many traditional machine learning models when framed as a regression
- RNNs are popular and, sometimes, superbly accurate
- And much, much more…
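As a small illustration of framing a forecast as a regression, the sketch below turns a series into lag features and fits a plain least-squares linear model with NumPy. The helper `make_lag_features`, the synthetic data, and the choice of 12 lags are all assumptions made for this example; the linear model is only a stand-in for whatever regressor you prefer.

```python
import numpy as np


def make_lag_features(series, n_lags):
    """Return (X, y): each row of X holds the n_lags values that precede y."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags
    X = np.column_stack([series[i : i + n_rows] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


# Synthetic series (illustrative only): a 12-step cycle plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(200)
series = 10 + np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.1, 200)

X, y = make_lag_features(series, n_lags=12)

# Split by time, never randomly: the last 24 rows are the test set.
X_train, y_train = X[:-24], y[:-24]
X_test, y_test = X[-24:], y[-24:]

# Ordinary least squares with an intercept column.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# One-step-ahead predictions on the held-out rows.
A_test = np.column_stack([np.ones(len(X_test)), X_test])
pred = A_test @ coef
mae = float(np.mean(np.abs(pred - y_test)))
print(f"One-step-ahead MAE: {mae:.3f}")
```

Once the data is in this (X, y) shape, extra columns such as calendar flags or external variables can be appended to X, which is where the creative feature engineering comes in.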
I have a sample of the time series methods used here:
https://github.com/winedarksea/TimeSeries/blob/master/TimeSeriesSamples.py