Title: Identification of co-occurring insertions in cancer genomes using association analysis
Authors: Michael Steinbach; Sean Landman; Vipin Kumar; Haoyu Yu
Addresses: Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA; Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA; Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA; Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, MN 55455, USA
Abstract: Collections of tumour genomes created by insertional mutagenesis experiments, e.g., the Retroviral Tagged Cancer Gene Database (RTCGD), can be analysed to find connections between mutations of specific genes and cancer. Such connections are found by identifying the locations of insertions, or groups of insertions, that occur frequently across the collection of tumour genomes. Recent work has employed a kernel density approach to find such commonly occurring insertions or co-occurring pairs of insertions. Unfortunately, this approach is extremely computationally intensive for pairs of insertions and even less tractable for triples and larger sets. We present a technique that efficiently finds commonly co-occurring sets of insertions (or other genomic features) of any size by applying Association Analysis (AA), also known as frequent pattern mining, from data mining. We compare this technique with the kernel density approach on RTCGD and report results of the association approach on two other tumour data sets.
Keywords: association analysis; frequent pattern mining; kernel density estimation; cancer genomes; mutagenesis experiments; gene mutations; tumours; insertion identification; oncogenes; data mining; bioinformatics; co-occurring insertions.
DOI: 10.1504/IJDMB.2014.062892
International Journal of Data Mining and Bioinformatics, 2014 Vol.10 No.1, pp.65-82
Received: 17 Feb 2012
Accepted: 02 Mar 2012
Published online: 21 Oct 2014
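
Illustrative sketch: the abstract frames co-occurring insertions as a frequent pattern mining problem. One way to picture this is to treat each tumour as a "transaction" whose items are the genomic regions that carry an insertion, and then to search for sets of regions that appear together in at least a minimum number of tumours. The Python sketch below is a generic level-wise (Apriori-style) search written under that assumption. It is not the authors' implementation: the abstract does not say how insertions are mapped to items or which mining algorithm is used, and the region labels `R1` to `R4`, the toy data, and the function name `frequent_itemsets` are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(items): support_count} for every itemset that
    occurs in at least `min_support` transactions (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets,
        # keeping only candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: keep candidates that meet the threshold.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


# Hypothetical toy data: each set lists the regions hit by insertions
# in one tumour. Labels are placeholders, not real loci.
tumours = [
    {"R1", "R2", "R3"},
    {"R1", "R2"},
    {"R1", "R2", "R3"},
    {"R2", "R4"},
    {"R1", "R3"},
]

frequent = frequent_itemsets(tumours, min_support=2)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The pruning step is what keeps the search manageable for larger sets: a triple is only counted if all of its pairs are already frequent, so the number of candidates examined is typically far smaller than the number of possible combinations.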