Title: Metalearning using structure-rich pipeline representations for improved AutoML
Authors: Brandon Schoenfeld; Kevin Seppi; Christophe Giraud-Carrier
Addresses: PassiveLogic, Inc., 6405 S 3000 E Ste 300, Salt Lake City, UT, USA; Department of Computer Science, Brigham Young University, Provo, UT, USA; Department of Computer Science, Brigham Young University, Provo, UT, USA
Abstract: Automatic machine learning (AutoML) systems have been shown to perform better when they learn from past experience. Examples include Auto-sklearn, which warm-starts the ML pipeline search using existing programs known to perform well on 'similar' tasks, and AlphaD3M, which uses online reinforcement learning to search the ML pipeline space. These metalearning approaches, as well as many others, depend on simplifying assumptions about the pipeline search space and/or the pipeline representation. Here, we attempt to extend the applicability of AutoML by relaxing such simplifications. Using a sizable metadataset of 194 classification tasks and 4,592 pipelines, we show that using pipeline metadata, including the underlying DAG structure, leads to better estimates of pipeline performance and to more robust rankings of pipelines.
Keywords: automatic machine learning; AutoML; metalearning; democratisation of data analysis.
DOI: 10.1504/IJDATS.2022.129174
International Journal of Data Analysis Techniques and Strategies, 2022 Vol.14 No.4, pp.267-282
Received: 10 Dec 2021
Received in revised form: 05 Oct 2022
Accepted: 16 Oct 2022
Published online: 27 Feb 2023