I have a history of obsessing over predicting the future. My first major time series project was probably in grad school: a consulting project forecasting amusement park attendance, which ultimately found an LSTM network to be most effective. Then again at my first data science job, forecasting demand, sales, and prices. What interested me most in these problems was probably that they were harder to solve – normal classification and regression problems are generally straightforward to solve with current data science tools.
What annoys me most about time series forecasting is that there is no scikit-learn equivalent in Python where you can do all your forecasting tasks in one place. You have to combine a dozen or so different packages, and you need quite a bit of domain-specific knowledge. And even with all that domain-specific knowledge, at the end of the day, as in much of data science, it mostly comes down to an educated guess-and-check process.
Let me rant, briefly, against statisticians. Statisticians rule the time series world. They have invented a bewildering array of statistical tests to assert their dominance in the domain. To do things the way statisticians would argue is proper, you would spend a few days developing a single ARIMA model for a single time series. In reality, science is based on observation. Ultimately, these statistical tests were built and validated on some set of observations about the real world. For any particular case, however, you are far better off testing your methods on the most relevant observations you have – your training data – using cross validation rather than arbitrary statistical tests.
Statistics, in short, is often more about human feelings than science: the feelings of statisticians, the feelings of business people, all of whom take confidence in following long-standing, complicated-sounding rules.
Viewed through a less skeptical lens, you could say I am in the ‘science fiction’ school of forecasting. I love reading science fiction. I also love making fun of the original Star Trek, which failed to predict all sorts of technology that would exist mere decades later. My point is that predicting the future is always going to be far less certain than we would like. We can examine what worked well repeatedly as things changed in the past, and hope that things continue to happen more or less that way. I accept that some of my work is ultimately fiction, but I ground it as much in science, as much in observation, as I can. Thus, I have a science fiction way of forecasting.
This entire commentary has been wandering its way to the fact that I have released an early version of AutoTS, an automated time series package. I have tried to build in the different models, features, transformations, error metrics, and cross validation methods – all the different things that make reading the future just a bit difficult.
It’s amazing! Or will be, in a few years, I hope. As it is, I have put quite a bit of time into this project over the last few months. It was born from my work at US Venture, where I had assembled a crude script for comparing time series methods to each other. When I quit US Venture, I promptly spent the next three weeks or so with a whiteboard, a pile of textbooks, and of course a computer – simply brainstorming what functionality I would want, and roughly how it would all fit together, without writing a single line of code.
Since those early days, I have spent most of my unemployed time over the holidays – and even, perhaps sacrilegiously, quite a lot of my current big New Zealand vacation – working on this project. It works! It shows influence from both TPOT and DataRobot, but at the end of the day it is largely a mess of my own random tendencies. It has some decidedly questionable aspects, like my point-to-probabilistic methods – which hopefully true statisticians will never see, lest I face a witch hunt. Speaking of witches, I also have some wizard-themed function names and various Monty Python quotes throughout the documentation. Being a product of my mind, it is prone to the bizarre tendencies therein.
If you are wondering who I designed it for (besides, obviously, myself), it is really aimed at medium-sized businesses forecasting hundreds to thousands of time series at once. My dream is basically to throw every time series a business needs (of the same frequency) into one giant dataset, run this slow-but-steady evaluator on it to find the best method(s), then forecast all those series – hopefully benefiting from some shared knowledge among the series and, ultimately, simplicity of infrastructure.
AutoTS features as of version 0.1.0
- Thirteen available models, with thousands of possible hyperparameter configurations
- Finds optimal time series models by genetic programming
- Handles univariate and multivariate/parallel time series
- Point and probabilistic forecasts
- Ability to handle messy data by learning optimal NaN imputation and outlier removal
- Ability to add external known-in-advance regressors
- Allows automatic ensembling of best models
- Multiple cross validation options
- Subsetting and weighting to improve search on many multivariate series
- Option to use one or a combination of SMAPE, RMSE, MAE, and Runtime for model selection
- Ability to upsample data to a custom frequency
- Import and export of templates allowing greater user customization
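The cross-validated, metric-driven selection described above can be sketched in miniature. This is a toy illustration of the philosophy (test candidate methods on your own training data via rolling-origin cross validation and pick the lowest error), not AutoTS code: the baseline methods, the SMAPE definition, and all parameters here are invented for the example.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, on a 0-200 scale."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denom = np.abs(actual) + np.abs(forecast)
    denom[denom == 0] = 1.0  # avoid division by zero where both are zero
    return 200.0 * np.mean(np.abs(forecast - actual) / denom)

# Three naive baseline forecasters: each takes a training series
# and a horizon h, and returns an h-step forecast.
def last_value(train, h):
    return np.repeat(train[-1], h)

def seasonal_naive(train, h, season=7):
    return np.resize(train[-season:], h)  # repeat the last full season

def historic_mean(train, h):
    return np.repeat(train.mean(), h)

def rolling_origin_cv(series, methods, horizon=7, n_folds=3):
    """Score each method by average SMAPE over successive holdout windows."""
    series = np.asarray(series, float)
    scores = {name: [] for name in methods}
    for fold in range(n_folds, 0, -1):
        split = len(series) - fold * horizon
        train, test = series[:split], series[split:split + horizon]
        for name, fn in methods.items():
            scores[name].append(smape(test, fn(train, horizon)))
    return {name: float(np.mean(v)) for name, v in scores.items()}

# A weekly-seasonal toy series, where seasonal naive should win.
rng = np.random.default_rng(0)
y = np.tile([10, 12, 14, 13, 11, 20, 25], 20) + rng.normal(0, 0.5, 140)
methods = {
    "last_value": last_value,
    "seasonal_naive": seasonal_naive,
    "historic_mean": historic_mean,
}
results = rolling_origin_cv(y, methods)
best = min(results, key=results.get)
```

The point of the sketch is the selection mechanism, not the models: any forecaster with the same signature can be dropped into `methods`, and the data decide which one survives – no unit-root tests required.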
Now you ask, but why do this? Hasn’t this been done before? By people smarter than you?
The immediate answer is yes: lots of smart (and not-so-smart) people have made forecasting programs of varying degrees of quality. They abound in private equity companies and are sold targeting large supply chain managers. However, they tend to be old-fashioned (statistics, not data science) and, importantly, not open source. Also, predicting the future is messy in a way that discourages, I presume, many of those with an analytics mindset. In short, there was an opening, and I took it.
Obviously, there is also value in this for me. I am, simply put, too young to be taken very seriously in my career. Expert thinkers do not still have acne, as I do – at least not in the eyes of many people in this world. A successful open source project would hopefully go far toward establishing me.
Finally, if I could significantly improve forecasting in companies around the world, think of the potential environmental impact! If 1% of major distribution companies got a 1% efficiency gain from my work, that would be a noticeable improvement for the world. Open source projects like this have the potential to reach out and significantly improve the world in a way closed source rarely, if ever, can.