TPOT

The Tree-Based Pipeline Optimization Tool (TPOT) was one of the first AutoML methods and open-source software packages developed for the data science community. Dr. Randal Olson developed TPOT while a postdoctoral researcher with Dr. Jason H. Moore at the Computational Genetics Laboratory of the University of Pennsylvania, and this team continues to extend and support it.

The goal of TPOT is to automate the building of ML pipelines by combining a flexible expression tree representation of pipelines with stochastic search algorithms such as genetic programming. TPOT draws the building blocks for its pipelines (preprocessors, feature selectors, and models) from the Python-based scikit-learn library.
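
To make the idea of searching over expression trees concrete, here is a minimal toy sketch in plain Python. It is not TPOT's code and does not use scikit-learn: the operator names (square, abs, negate, add), the single-threshold "estimator", the synthetic data, and every function name are assumptions made up for illustration, and the loop uses mutation only, a simplified stand-in for full genetic programming.

```python
import random

# Hypothetical toy operator menu; in TPOT the menu comes from scikit-learn.
UNARY = {"square": lambda v: v * v, "abs": abs, "negate": lambda v: -v}

def random_tree(rng, depth=2):
    """Grow a random expression tree; the leaf "x" is the raw input feature."""
    if depth == 0 or rng.random() < 0.3:
        return "x"
    if rng.random() < 0.7:
        return (rng.choice(sorted(UNARY)), random_tree(rng, depth - 1))
    return ("add", random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def apply_tree(tree, x):
    """Evaluate the preprocessing tree on one feature value."""
    if tree == "x":
        return x
    if tree[0] in UNARY:
        return UNARY[tree[0]](apply_tree(tree[1], x))
    return apply_tree(tree[1], x) + apply_tree(tree[2], x)  # "add" combines two branches

def fit_threshold(values, labels):
    """Toy final estimator: the cut point with the best training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(values)):
        acc = sum((v >= t) == bool(y) for v, y in zip(values, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def score(tree, train, valid):
    """Fit the threshold on train and return accuracy on the validation split."""
    t = fit_threshold([apply_tree(tree, x) for x, _ in train], [y for _, y in train])
    return sum((apply_tree(tree, x) >= t) == bool(y) for x, y in valid) / len(valid)

def mutate(tree, rng):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if tree == "x" or rng.random() < 0.3:
        return random_tree(rng)
    children = list(tree[1:])
    i = rng.randrange(len(children))
    children[i] = mutate(children[i], rng)
    return (tree[0], *children)

def evolve(train, valid, generations=15, pop_size=20, seed=0):
    """Mutation-only evolutionary loop that keeps the best trees each generation."""
    rng = random.Random(seed)
    # Seed the population with the trivial pipeline "x" plus random trees.
    population = ["x"] + [random_tree(rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        offspring = [mutate(t, rng) for t in population]
        ranked = sorted(population + offspring,
                        key=lambda t: score(t, train, valid), reverse=True)
        population = ranked[:pop_size]
    return population[0], score(population[0], train, valid)

def make_data(n, rng):
    """Label is 1 when |x| > 1, so no single cut on raw x separates the classes."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    return [(x, int(abs(x) > 1.0)) for x in xs]

data_rng = random.Random(42)
train, valid = make_data(80, data_rng), make_data(80, data_rng)
best_tree, best_score = evolve(train, valid)
print("baseline (raw x):", score("x", train, valid))
print("best pipeline:", best_tree, "accuracy:", best_score)
```

Each candidate pipeline is a nested tuple, so mutation amounts to swapping out a subtree. The sketch ranks candidates on a single validation split to stay short; TPOT itself evaluates candidate pipelines with cross-validation.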

Several peer-reviewed papers have been published on TPOT. Our first paper in 2016 won a best paper award at the EvoStar computer science conference. Our second paper in 2016 won a best paper award at the GECCO computer science conference. In a 2017 paper presented at the GECCO conference, which was nominated for a best paper award, we showed how TPOT could be adapted to the analysis of big data from genetic studies of common human diseases. Our latest paper describes new operators that facilitate scaling TPOT to big data. Other papers apply TPOT to predicting the risk of coronary artery disease and the risk of heart failure, and a book chapter reviews TPOT. Please contact us for reprints of these papers and others. Papers about TPOT can also be found on arXiv.

The TPOT software is open-source, programmed in Python, and available on GitHub.