AutoTS development has been on hold for the last three months or so. Partly that is because it finally works more or less as I originally intended. In fact, it works better than I originally intended. Still, the current state feels like a failure, because I was unable to use it to submit any decent models for the M5 competition. My fancy horizontal ensembles were just too slow. There are perhaps two excuses: first, that I was using a laptop rather than a powerful workstation to generate results; second, that I put off actually running models on the competition data until too late, spending the time instead on developing the more general code of AutoTS.
Throughout the process, though, I have had plenty of time to consider what is next for AutoTS, and here is what I am thinking as of now:
Concerns and Future Directions
- resource utilization at scale
- improve horizontal ensembling efficiency in particular
- structure of General Transformer – it just feels awkward
- overfitting on first train segment (train progressive subsets?)
- make it easy for end users to add their own models
- improve starting templates (probably best to wait until other things are ironed out)
- improve documentation and usability of lower-level code
- better metrics, perhaps improve ‘contour’
- better summarization of many time series into a few high-information time series, for use as parallel series or regressors
- ability to automatically add useful global information as regressors or parallel series
- built-in GUI or command-line tool
I am somewhat held back by a feeling that perhaps there is a major structural change I should make – and that I shouldn’t make smaller changes until that bigger change is decided upon and done. This is perhaps my greatest personal roadblock – my own fear of ‘wasted’ work.
I feel like the single most important improvement is speed. Speed is, theoretically, my lowest priority, beneath ‘ease of use’ and ‘accuracy’. For forecasting a single series, most models are indeed very fast right now. However, I have come to realize that for large and numerous time series, ‘speed’ is actually the most critical part of ‘ease of use’. Theoretically my package should be easy to set up for multiprocessing. However, I am concerned that many of the packages I in turn rely on for modeling, which have some degree of ‘auto’ multiprocessing built in, will conflict with the higher-level multiprocessing.
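One common way to reduce this kind of conflict is to cap the threads used by the underlying numerical libraries, so that the outer worker processes do not oversubscribe the CPU. The following is only a minimal sketch under stated assumptions: `threads_per_worker`, `limit_inner_threads`, `fit_one_series`, and `forecast_all` are hypothetical names and not part of AutoTS. The environment variables are the ones commonly read by OpenMP, MKL, OpenBLAS, and numexpr, and they generally need to be set before those libraries are first loaded.

```python
import os
from concurrent.futures import ProcessPoolExecutor

# Environment variables commonly read by OpenMP, MKL, OpenBLAS and numexpr.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def threads_per_worker(total_cores, n_workers):
    """Split the available cores evenly among outer workers (at least 1 each)."""
    return max(1, total_cores // max(1, n_workers))


def limit_inner_threads(n_threads):
    """Cap library-level threading via environment variables.

    Assumption: this runs before the numerical libraries are imported,
    since most of them read these variables only once, at load time.
    """
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)


def fit_one_series(series):
    """Hypothetical stand-in for fitting and forecasting one series."""
    return sum(series) / len(series)


def forecast_all(series_list, n_workers=2):
    """Run the per-series work in separate processes with capped inner threads."""
    limit_inner_threads(threads_per_worker(os.cpu_count() or 1, n_workers))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fit_one_series, series_list))
```

On platforms that start worker processes with spawn, `forecast_all` should be called from under an `if __name__ == "__main__":` guard. Whether a given modeling package honors these variables varies, so this is a starting point rather than a guarantee.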
I would also like to do some work to improve my GitHub setup for AutoTS, hopefully encouraging other contributors. On the other hand, I’m not sure I want other contributors: it’s all mine!
Basic Tenets of AutoTS
- Ease of Use > Accuracy > Speed (with speed more important with ‘fast’ selections)
- The goal is to be able to run a horizontal ensemble prediction on 1,000 series/hour with a ‘fast’ selection (about 3.6 seconds per series) and 10,000 series/hour with ‘very fast’ (about 0.36 seconds per series).
- Availability of models which share information among series
- All models should be probabilistic (upper/lower forecasts)
- All models should be able to handle multiple parallel time series
- New transformations should be applicable to many datasets and models
- New models need only be sometimes applicable
- Fault tolerance: it is perfectly acceptable for model parameters to fail on some datasets; the higher-level API will pass over them and use others.
- Missing data tolerance: large chunks of data can be missing and the model will still produce reasonable results (although of lower quality than if the data were available)
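The fault tolerance tenet above can be sketched as a loop that tries every candidate and records failures instead of stopping. This is an illustration only: the model functions and `run_candidates` are hypothetical stand-ins and do not reflect AutoTS's actual API.

```python
def naive_last(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def mean_model(history, horizon):
    """Forecast the historical mean."""
    mean = sum(history) / len(history)
    return [mean] * horizon


def always_fails(history, horizon):
    """Stand-in for a model whose parameters do not suit this dataset."""
    raise ValueError("parameters not valid for this dataset")


def run_candidates(candidates, history, horizon):
    """Try each candidate; keep successful forecasts and log failures."""
    results, errors = {}, {}
    for name, model in candidates.items():
        try:
            results[name] = model(history, horizon)
        except Exception as exc:  # a failed candidate is skipped, not fatal
            errors[name] = repr(exc)
    return results, errors


candidates = {"last": naive_last, "mean": mean_model, "broken": always_fails}
results, errors = run_candidates(candidates, [1, 2, 3], horizon=2)
```

Here `results` holds forecasts from the two candidates that succeeded, while `errors` keeps the failure message for the third so it can be inspected later.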