Title: Mining literatures to discover novel multiple biological associations in a disease context
Authors: Alberto Faro; Daniela Giordano; Francesco Maiorana
Addresses: Department of Electrical, Electronics and Computer Engineering, University of Catania, Viale A. Doria 6, 95125, Catania, Italy; Department of Electrical, Electronics and Computer Engineering, University of Catania, Viale A. Doria 6, 95125, Catania, Italy; Department of Electrical, Electronics and Computer Engineering, University of Catania, Viale A. Doria 6, 95125, Catania, Italy
Abstract: Text mining methods proposed to discover associations between pairs of biological entities by mining the scientific literature often extract associations that already exist in that literature, whereas their extensions over-supervise the discovery process with heuristics and ontologies that limit the search space. On the other hand, methods that search for novel associations by applying text mining to two literatures do not avoid the risk of discovering syllogisms based on faulty premises. For this reason, the paper proposes a method that helps users discover associations among biological entities by mining the literature with an unsupervised clustering approach. The discovered multiple associations are derived from binary associations to limit the computational load without compromising the accuracy of the methodology. A case study demonstrates how the tool derived from the methodology works in practice. A comparison between this tool and other tools available in the literature highlights the effectiveness of the methodology.
Keywords: knowledge discovery; multiple biological associations; text mining; data clustering; Bayesian logic; diseases; bioinformatics; unsupervised clustering; biological literature.
DOI: 10.1504/IJDMB.2015.069419
International Journal of Data Mining and Bioinformatics, 2015 Vol.12 No.2, pp.224-256
Received: 21 Feb 2013
Accepted: 19 Oct 2013
Published online: 15 May 2015