In the data and analytics world, trends come and go. A few years ago, the term “big data” was all the rage. Now it is simply an implicit capability that many organizations possess. More recently, terms like artificial intelligence (AI) and automated machine learning (AutoML) have been making their way onto the buzzword bingo cards.
Put AI aside for the moment and think about AutoML. Conceptually, AutoML is an umbrella term for a set of processes that are, well, automatic and require little to no code. Essentially, these processes may involve automatic:
- Data prep and cleansing routines
- Creation of new features to be used in machine learning models
- Selection of which features (input variables) are to be used in the model
- Model identification and selection
- Hyperparameter tuning
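To make the list above concrete, here is a minimal sketch of what such automation does under the hood, using only the Python standard library. It is an illustration under stated assumptions, not how any particular AutoML product works: the function names (`impute_missing`, `auto_select`), the two candidate model families, the values of `k` tried, and the synthetic data are all hypothetical choices made for this example. It covers data prep, model selection, and hyperparameter tuning; feature creation and feature selection are left out for brevity.

```python
import random
from statistics import mean

def impute_missing(rows):
    """Data prep: replace each None with the mean of its column."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: sq_dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def centroid_predict(train_X, train_y, x):
    """Predict the class whose mean point is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [r for r, y in zip(train_X, train_y) if y == label]
        centre = [mean(col) for col in zip(*members)]
        dist = sq_dist(centre, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy(predict, X, y, folds=5):
    """Mean accuracy over a simple k-fold cross-validation."""
    scores = []
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        train_X = [r for i, r in enumerate(X) if i not in test_idx]
        train_y = [l for i, l in enumerate(y) if i not in test_idx]
        hits = sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)

def auto_select(X, y):
    """Try every candidate model/hyperparameter pair and rank them."""
    # Caveat: imputing before splitting leaks a little test information.
    # A careful pipeline would fit the imputer inside each fold.
    X = impute_missing(X)
    candidates = {"centroid": centroid_predict}
    for k in (1, 3, 5):  # hyperparameter grid (an assumption for this sketch)
        candidates[f"knn(k={k})"] = lambda tX, ty, x, k=k: knn_predict(tX, ty, x, k)
    leaderboard = {name: cv_accuracy(fn, X, y) for name, fn in candidates.items()}
    best = max(leaderboard, key=leaderboard.get)
    return best, leaderboard

# Synthetic two-class data with a few missing values (illustrative only).
rng = random.Random(0)
X, y = [], []
for i in range(60):
    label = i % 2
    row = [rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0)]
    if i % 15 == 0:
        row[1] = None
    X.append(row)
    y.append(label)

best_name, leaderboard = auto_select(X, y)
for name, score in sorted(leaderboard.items(), key=lambda item: -item[1]):
    print(f"{name:10s} cross-validated accuracy = {score:.3f}")
print("selected:", best_name)
```

Even in this toy version, the search quietly makes decisions a reviewer should check, such as the leakage noted in the comment and the choice of accuracy as the only metric. That is the kind of detail vetting is meant to catch.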
This automation is packed with promises. Some vendors claim that the technology is intended for citizen data scientists (or non-experts), that it will address the skills shortage in the industry, and that in some cases data scientists will no longer be needed. We would not recommend this approach: for the majority of use cases, deploying models without vetting and validation would be extremely risky.
For organizations new to AutoML, the thought of deploying algorithms into mission-critical business systems without proper vetting and testing should raise a red flag. Data scientists remain crucial to making ML work. There is still a use for AutoML in many organizations, but it will not replace our coveted data science unicorns.
AutoML will augment the work of data scientists, not replace it
AutoML is most useful as a way to augment the work of data scientists and citizen data scientists. It is most effective when organizations use it to quickly identify which areas or projects might be most valuable for further exploration by a data scientist. AutoML can also make data scientists more productive, and may improve the accuracy of their final solutions, by helping them quickly consider a wider range of analytic approaches.
But buyer beware. When evaluating AutoML, consider these questions:
- Is the AutoML transparent? That is, can you explain why the model makes a particular recommendation? This is critical for many applications and is required by many regulations.
- Is the AutoML extensible and flexible? Can you customize and extend the generated pipeline to suit your specific needs?
- What is the workflow and process to vet and deploy the models generated?
- After you deploy the machine learning pipelines, how will you monitor, manage, refresh, and govern the deployment?
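As one illustration of the monitoring question above, the sketch below computes a population stability index (PSI) to flag when the data a deployed model sees has drifted away from the data it was trained on. PSI is only one of several possible drift checks, and nothing in this article prescribes it. The function name `psi`, the equal-width binning, the 0.2 alert threshold (a common rule of thumb, not a standard), and the synthetic data are all assumptions made for this example.

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population stability index between a baseline and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids taking log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Synthetic feature values (illustrative only).
rng = random.Random(42)
training = [rng.gauss(0.0, 1.0) for _ in range(1000)]
stable_week = [rng.gauss(0.0, 1.0) for _ in range(1000)]
drifted_week = [rng.gauss(1.5, 1.0) for _ in range(1000)]

ALERT_THRESHOLD = 0.2  # rule-of-thumb cut-off, an assumption for this sketch
for name, sample in [("stable", stable_week), ("drifted", drifted_week)]:
    score = psi(training, sample)
    status = "refresh/review model" if score > ALERT_THRESHOLD else "ok"
    print(f"{name}: PSI = {score:.3f} -> {status}")
```

A check like this only signals that inputs have changed. Deciding whether to retrain, roll back, or investigate still falls to the people who govern the deployment.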