Title: Automatic identification and classification of Palomar Transient Factory astrophysical objects in GLADE
Authors: Weijie Zhao; Florin Rusu; Kesheng Wu; Peter Nugent
Addresses: University of California Merced, 5200 N Lake Rd., Merced, CA 95343, USA; University of California Merced, 5200 N Lake Rd., Merced, CA 95343, USA; Lawrence Berkeley National Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA; Lawrence Berkeley National Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
Abstract: Palomar Transient Factory (PTF) is a comprehensive detection system for the identification and classification of transient astrophysical objects. In this paper, we make two significant contributions to the PTF pipeline. First, we present an experimental study that evaluates a novel implementation of the real-time classifier in GLADE, a parallel data processing system that combines the efficiency of a database with the extensibility of map-reduce. We show how each stage in the classifier maps optimally into GLADE tasks by taking advantage of the system's unique features: range-based data partitioning, columnar storage, multi-query execution, and in-database support for complex aggregate computation. Second, we introduce a novel parallel similarity join algorithm for advanced transient classification. We implement this algorithm in GLADE and execute it on a massive supercomputer with more than 3,000 threads, achieving an improvement of more than three orders of magnitude over the PostgreSQL solution.
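The abstract does not spell out the similarity join algorithm, so the following is only a generic illustration of how range-based partitioning can support a parallel distance-threshold similarity join. It is not the authors' GLADE algorithm. The record layout `(identifier, x, y)`, the Euclidean distance, the threshold `eps`, and the function names `similarity_join` and `brute_force_join` are all hypothetical choices made for this sketch. Both inputs are range-partitioned on `x` into stripes of width `eps`. Any pair within `eps` must then fall in the same or an adjacent stripe, so each stripe of `R` can be joined independently, which is what allows the work to run in parallel.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout for this sketch: (identifier, x, y).
Point = Tuple[str, float, float]


def _stripe(x: float, eps: float) -> int:
    # Range-based partitioning on x: stripe k covers [k*eps, (k+1)*eps).
    return math.floor(x / eps)


def _partition(points: Iterable[Point], eps: float) -> Dict[int, List[Point]]:
    parts: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        parts[_stripe(p[1], eps)].append(p)
    return parts


def _join_stripe(k: int, r_part: List[Point],
                 s_parts: Dict[int, List[Point]],
                 eps: float) -> List[Tuple[str, str]]:
    matches = []
    for rid, rx, ry in r_part:
        # A partner within eps differs by at most eps in x, so it can only
        # live in stripe k or one of its two neighbouring stripes.
        for nk in (k - 1, k, k + 1):
            for sid, sx, sy in s_parts.get(nk, ()):
                if math.hypot(rx - sx, ry - sy) <= eps:
                    matches.append((rid, sid))
    return matches


def similarity_join(r: List[Point], s: List[Point], eps: float,
                    workers: int = 4) -> List[Tuple[str, str]]:
    """Return all (r_id, s_id) pairs whose Euclidean distance is <= eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    r_parts = _partition(r, eps)
    s_parts = _partition(s, eps)
    # Each R stripe is joined independently, so stripes run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(
            lambda k: _join_stripe(k, r_parts[k], s_parts, eps),
            list(r_parts))
        return sorted(pair for chunk in chunks for pair in chunk)


def brute_force_join(r: List[Point], s: List[Point],
                     eps: float) -> List[Tuple[str, str]]:
    # Quadratic nested-loop baseline, used to check the partitioned version.
    return sorted((rid, sid)
                  for rid, rx, ry in r
                  for sid, sx, sy in s
                  if math.hypot(rx - sx, ry - sy) <= eps)
```

For example, `similarity_join([("a", 0.0, 0.0), ("b", 5.0, 5.0)], [("x", 0.5, 0.0), ("y", 5.0, 6.5), ("z", -0.9, 0.1)], 1.0)` returns `[("a", "x"), ("a", "z")]`. Setting the stripe width equal to `eps` keeps each probe to three neighbouring partitions. A distributed system would also have to place or replicate stripe boundaries across nodes, and this sketch does not model that.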
Keywords: parallel databases; multi-query processing; scientific data analysis; similarity join; astronomical surveys; transient identification.
DOI: 10.1504/IJCSE.2018.093775
International Journal of Computational Science and Engineering, 2018 Vol.16 No.4, pp. 337-349
Received: 13 Apr 2016
Accepted: 30 Jun 2016
Published online: 06 Aug 2018