Title: A comprehensive analysis about the influence of low-level preprocessing techniques on mass spectrometry data for sample classification
Authors: Hugo López-Fernández; Miguel Reboiro-Jato; Daniel Glez-Peña; Florentino Fernández-Riverola
Addresses: Department Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain (same address for all four authors)
Abstract: Matrix-Assisted Laser Desorption Ionisation Time-of-Flight (MALDI-TOF) is one of the high-throughput mass spectrometry technologies whose data require extensive preprocessing before subsequent analysis. Several low-level preprocessing techniques have been developed for the different tasks involved, including baseline correction, smoothing, normalisation, peak detection and peak alignment. In this work, we present a systematic comparison of software packages that support this mandatory preprocessing of MALDI-TOF data. To ensure the validity of the study, we test multiple configurations of each preprocessing technique and use the resulting data to train a set of classifiers, whose performance (kappa and accuracy) provides the basis for the final comparison. The experimental results show the real impact of preprocessing techniques on classification: MassSpecWavelet provides the best performance, and Support Vector Machines (SVM) are among the most accurate classifiers.
Keywords: mass spectrometry data; data preprocessing; low-level preprocessing; sample classification; model comparison; software comparison; bioinformatics; support vector machines; SVM; classification accuracy.
DOI: 10.1504/IJDMB.2014.064897
International Journal of Data Mining and Bioinformatics, 2014 Vol.10 No.4, pp.455 - 473
Received: 02 Nov 2012
Accepted: 25 Mar 2013
Published online: 21 Oct 2014
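
The abstract reports classifier performance as kappa and accuracy. As a reading aid, the sketch below shows how these two measures are commonly computed from true and predicted class labels, taking "kappa" to mean Cohen's kappa, which is an assumption since the abstract does not say which variant was used. The function names and the "cancer"/"control" labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label sources pick the same
    # class if each drew labels independently from its own class frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Only one class appears in both sequences; kappa is undefined here.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for four spectra (illustrative only, not data from the study).
y_true = ["cancer", "cancer", "control", "control"]
y_pred = ["cancer", "control", "control", "control"]

print(accuracy(y_true, y_pred))     # 3 of 4 correct
print(cohen_kappa(y_true, y_pred))  # agreement beyond chance
```

Kappa discounts the agreement expected by chance, so it is more informative than raw accuracy when class sizes are unbalanced, which is one plausible reason for reporting both.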