Title: Optimising the calculation of statistical functions
Authors: André Rodrigues; Carla Silva; Paulo Borges; Sérgio Silva; Inês Dutra
Addresses: NLPC Lda., Praça Mouzinho de Albuquerque, 113 – 5º, 4100-359 Porto, Portugal (André Rodrigues, Carla Silva, Paulo Borges, Sérgio Silva); Department of Computer Science, CRACS INESC TEC and University of Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal (Inês Dutra)
Abstract: Statistical data analysis methods are well known for their difficulty in handling large numbers of instances or parameters. In this paper, we study popular, well-known statistical functions commonly applied in data analysis and assess their performance as implemented in SPSS, MATLAB, R and our own software, DataIP. Using medium to large datasets, we show that DataIP outperforms SPSS, MATLAB and R by several orders of magnitude. We argue that the design and implementation of these functions need to be rethought to meet today's data challenges.
Keywords: statistical data analysis; statistical functions; performance evaluation; SPSS; MATLAB; optimisation.
DOI: 10.1504/IJBDI.2017.083155
International Journal of Big Data Intelligence, 2017 Vol.4 No.2, pp.123–138
Received: 22 Mar 2016
Accepted: 12 Sep 2016
Published online: 21 Mar 2017