Title: Discovery of deep order-preserving submatrix in DNA microarray data based on sequential pattern mining

Authors: Zhiwen Liu; Yun Xue; Meihang Li; Bo Ma; Meizhen Zhang; Xin Chen; Xiaohui Hu

Addresses: School of Physics & Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China ' Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, Guangdong Provincial Engineering Technology Research Center for Data Science, School of Physics & Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China ' School of Physics & Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China ' School of Physics & Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China ' School of Physics & Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China ' School of Physics & Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China ' School of Physics & Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China

Abstract: In recent years, the order-preserving submatrix (OPSM) model has been widely used in gene expression data analysis. Because it focuses on the relative changes between elements rather than on their actual values, it yields more robust and statistically significant results than other models. Many current OPSM algorithms are heuristic, however: they cannot mine all OPSMs, nor the deep OPSMs that are of biological significance in gene expression data. In this paper, an exact algorithm is proposed that finds OPSMs using a frequent sequential pattern mining method. First, we find all common subsequences (ACS) between every pair of rows through dynamic programming. We then store them in a suffix tree, from which all OPSMs, including deep OPSMs, can be obtained. Experiments on real gene expression data and artificially synthesised data show that the algorithm is efficient and that its results are meaningful.
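
The ACS step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`column_order`, `all_common_subsequences`, `support`) and the toy matrix are assumptions made here for illustration, and the suffix-tree storage step is omitted. Each row is first converted to the sequence of column indices sorted by ascending value. A common subsequence of two such sequences is then a column ordering along which both rows increase, and the rows supporting it form a candidate OPSM.

```python
from functools import lru_cache


def column_order(row):
    """Column indices sorted by ascending value (ties broken by column index)."""
    return tuple(sorted(range(len(row)), key=lambda j: (row[j], j)))


def all_common_subsequences(a, b):
    """Return every non-empty common subsequence of a and b as a set of tuples.

    Memoised recursion over suffix pairs; output size can grow exponentially,
    so this is only suitable for small illustrative inputs.
    """
    @lru_cache(maxsize=None)
    def acs(i, j):
        # Common subsequences (including the empty one) of a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return frozenset({()})
        result = set(acs(i + 1, j)) | set(acs(i, j + 1))
        if a[i] == b[j]:
            result |= {(a[i],) + tail for tail in acs(i + 1, j + 1)}
        return frozenset(result)

    return {s for s in acs(0, 0) if s}


def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order (not necessarily contiguously)."""
    it = iter(seq)
    return all(x in it for x in pattern)


def support(pattern, orders):
    """Indices of rows whose column order contains pattern as a subsequence."""
    return [r for r, order in enumerate(orders) if is_subsequence(pattern, order)]


# Hypothetical 3x4 expression matrix (rows = genes, columns = conditions).
rows = [
    [1.0, 3.0, 2.0, 4.0],
    [0.5, 2.5, 1.5, 0.1],
    [9.0, 7.0, 8.0, 6.0],
]
orders = [column_order(r) for r in rows]
acs01 = all_common_subsequences(orders[0], orders[1])
supporting_rows = support((0, 2, 1), orders)
```

In this toy example, rows 0 and 1 both increase along the column ordering 0, 2, 1, so `(0, 2, 1)` appears in `acs01` and `supporting_rows` is `[0, 1]`; row 2 does not follow that ordering.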

Keywords: OPSM; frequent sequential pattern; all common subsequences; dynamic programming.

DOI: 10.1504/IJDMB.2017.085280

International Journal of Data Mining and Bioinformatics, 2017 Vol.17 No.3, pp.217 - 237

Received: 30 Dec 2015
Accepted: 29 Mar 2017

Published online: 19 Jul 2017
