Title: Data warehouse ETL+Q auto-scale framework
Authors: Pedro Martins; Maryam Abbasi; Pedro Furtado
Addresses: Department of Informatics, Faculty of Sciences and Technology, University of Coimbra, Portugal; Department of Informatics, Faculty of Sciences and Technology, University of Coimbra, Portugal; Department of Informatics, Faculty of Sciences and Technology, University of Coimbra, Portugal
Abstract: In this paper, we investigate the problem of providing scalability (out and in) to the extraction, transformation and load (ETL) and querying (Q) processes (ETL+Q) of data warehouses. In general, data loading, transformation and integration are heavy tasks that are performed only periodically, instead of row by row. Parallel architectures and mechanisms can optimise the ETL process by speeding up each part of the pipeline as more performance is needed. We propose AScale, a set of parallelisation solutions for each part of ETL+Q; the approach enables automatic scalability and freshness for any data warehouse and ETL+Q process. Our results show that the proposed algorithms scale to provide the desired processing speed.
Keywords: data warehousing; scalability; freshness; processing speed; performance; parallel processing; distributed systems; parallelisation; load balancing; extraction transformation load; ETL; querying; data warehouses.
DOI: 10.1504/IJBISE.2016.081592
International Journal of Business Intelligence and Systems Engineering, 2016 Vol.1 No.1, pp.49 - 76
Received: 26 Jan 2015
Accepted: 04 Sep 2015
Published online: 17 Jan 2017