TPOT

The Tree-Based Pipeline Optimization Tool (TPOT) was one of the first AutoML methods and open-source software packages developed for the data science community. TPOT was developed in 2015 by Dr. Randal Olson during his postdoctoral work with Dr. Jason H. Moore.

The goal of TPOT is to automate the building of ML pipelines by combining a flexible expression tree representation of pipelines with stochastic search algorithms such as genetic programming. TPOT draws its pipeline components, its ML menu, from the Python-based scikit-learn library.
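To make the idea concrete, here is a minimal sketch of searching over scikit-learn pipelines stored as small expression trees. It is not TPOT's implementation: the operator menus, the nested-tuple tree encoding, and the function names (`random_tree`, `mutate`, `to_pipeline`, `evolve`) are all assumptions made for illustration, and the search is a toy evolutionary loop with mutation and truncation selection only, without the crossover used in full genetic programming.

```python
import random

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical operator menus for this sketch (not TPOT's actual configuration).
PREPROCESSORS = {
    "StandardScaler": lambda: StandardScaler(),
    "MinMaxScaler": lambda: MinMaxScaler(),
    "PCA": lambda: PCA(n_components=2),
}
CLASSIFIERS = {
    "LogisticRegression": lambda: LogisticRegression(max_iter=1000),
    "DecisionTree": lambda: DecisionTreeClassifier(random_state=0),
    "GaussianNB": lambda: GaussianNB(),
}


def random_tree(rng, max_preprocessors=2):
    """Build a random tree of the form classifier(preprocessor(... "input"))."""
    tree = "input"
    for _ in range(rng.randint(0, max_preprocessors)):
        tree = (rng.choice(sorted(PREPROCESSORS)), tree)
    return (rng.choice(sorted(CLASSIFIERS)), tree)


def mutate(tree, rng):
    """Swap the root classifier or regrow the preprocessing subtree."""
    name, child = tree
    if rng.random() < 0.5:
        return (rng.choice(sorted(CLASSIFIERS)), child)
    return (name, random_tree(rng)[1])


def to_pipeline(tree):
    """Translate an expression tree into a scikit-learn Pipeline."""
    steps = []
    node = tree
    while node != "input":
        name, node = node
        factory = CLASSIFIERS.get(name) or PREPROCESSORS[name]
        steps.append(factory())
    # The tree is read from the root (classifier) down, so reverse the order.
    return make_pipeline(*reversed(steps))


def fitness(tree, X, y):
    """Mean 3-fold cross-validated accuracy of the pipeline encoded by the tree."""
    return cross_val_score(to_pipeline(tree), X, y, cv=3).mean()


def evolve(X, y, generations=3, population_size=6, seed=0):
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: fitness(t, X, y), reverse=True)
        parents = ranked[: population_size // 2]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=lambda t: fitness(t, X, y))
    return best, fitness(best, X, y)


X, y = load_iris(return_X_y=True)
best_tree, best_score = evolve(X, y)
print(best_tree, round(best_score, 3))
```

The printed tree reads from the outside in: the outermost name is the final classifier, and each nested name is a preprocessing step applied earlier in the pipeline. The trees here are simple chains; a richer encoding would allow branches that combine several feature sets.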

Several peer-reviewed papers have been published on TPOT. Our first paper, in 2016, won a best paper award at the EvoStar computer science conference, and our second, also in 2016, won a best paper award at the GECCO computer science conference. In a 2017 paper presented at GECCO, which was nominated for a best paper award, we showed how TPOT could be adapted to the analysis of big data from genetic studies of common human diseases. Further publications include our latest paper, on new operators to facilitate scaling TPOT to big data; papers on predicting risk of coronary artery disease and risk of heart failure with TPOT; a book chapter that reviews TPOT; and a recent review of TPOT for genetic and genomic analysis.

The TPOT software is open-source, programmed in Python, and available on GitHub.

TPOT 2 is now available on GitHub with a new code base and pipelines represented as directed acyclic graphs.