Years ago, I worked on an application that helped companies answer large volumes of emails. The application gave support representatives access to an array of tools to deliver the best response with minimal effort.
Originally, the intention was to automate the entire process, but the models rarely predicted with 100 percent confidence, or even 90 percent. So reps still needed to review the responses in almost every case, adding nuance, human touch, and humor, and checking against previous responses.
Very quickly “auto responses” became “auto suggestions.” The engine would select the appropriate template for the topic predicted by the model and, instead of hunting for the template, reps could instead focus on verifying, refining, and personalizing.
A similar story: AutoML
Something similar is true for automated machine learning (AutoML), the technology for automating feature generation and model development. The real power of AutoML is not in replacing data scientists, but in removing the drudgery from their work—to provide them not with final models, but with baseline models that they can improve with the human touch.
Spotfire released a new AutoML extension for Spotfire® Data Science, and it’s been designed with exactly this goal in mind: automation for both features and models, delivered to the data science team as a set of building blocks that they can use as is, tweak and optimize, or break apart completely into reusable modules. This component is an excellent example of how extensible Spotfire® Data Science is, and it is built using the standard APIs and extension points that the platform provides.
Addressing the skeptics
Some prominent data scientists have expressed skepticism about the value of AutoML. This skepticism has multiple sources: Can AutoML really substitute for the artisan data scientist? Will AutoML be used naively, without appropriate regard for the suitability of the statistical methods? Does AutoML turn modeling into a ‘black box’, lacking the transparency needed to earn the trust of customers?
Those all seem like valid concerns to me. There’s a big difference between the sort of large-scale AI (image recognition, for example) that responds well to brute-force automation, and the everyday analytics (looking into causes of hospital readmissions, for example) that require a whole lot of human ‘stuff’, like business understanding, hunting for data, careful construction of features that express real-world phenomena, evaluation of illegal biases, and so on.
But AutoML clearly has an important role to play in the development of predictive models. Why should I guess at the best number of trees? Which power transforms give me the best features in a linear model? Is there a way to convert these categorical values into continuous values, or should I use one-hot encoding? Why should I ever have to concern myself with tidying up address fields? Just as we use machine learning to replace rote tasks like deleting spam, so we can use AutoML to reduce the drudgery of data science.
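The kind of drudgery AutoML removes, such as guessing at the best number of trees, boils down to a search over candidate settings. A minimal sketch of that search in plain Python is below; the `evaluate` function here is a hypothetical stand-in for what would really be a cross-validated model score, and the parameter names are illustrative only:

```python
from itertools import product

def evaluate(n_trees, max_depth):
    """Hypothetical scoring function standing in for cross-validated
    accuracy of, say, a random forest. Real AutoML would fit and
    score an actual model here."""
    # Pretend bigger, deeper forests score better, with diminishing returns.
    return 1.0 - 1.0 / (n_trees * max_depth)

def grid_search(param_grid, score_fn):
    """Exhaustively score every combination in param_grid and
    return the best parameters with their score."""
    best_params, best_score = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"n_trees": [10, 100, 500], "max_depth": [3, 5, 10]}
best, score = grid_search(grid, evaluate)
```

Real AutoML engines use smarter strategies than exhaustive search (random search, Bayesian optimization), but the principle is the same: the machine sweeps the tedious combinations so the data scientist doesn't have to.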
For this to work, the features and models produced by AutoML have to be completely transparent to the data scientist and the engineer. The analytics artifacts that AutoML creates cannot be opaque blocks of code or proprietary transformations. They need to be open to review and modification by the team. And indeed, if this is true, AutoML can become a tool for collaboration—for non-experts to test out hypotheses and suggest basic models that the data scientists can use to initiate a project.
We are as far from using robots to replace data scientists as we are from using robots to answer our email. As AutoML technologies proliferate and mature, my best guess is that they will be used increasingly as a starting point for the work of the data scientist, and not so much as an alternative. They will become suggested models, not automated models.
Watch this video and read this recent blog post to learn more about how AutoML will help data scientists work smarter, faster.