Title: A Wikipedia-based approach to conceptual indexing and retrieval of documents
Authors: Carlo Abi Chahine; Nathalie Chaignaud; Jean-Philippe Kotowicz; Jean-Pierre Pecuchet
Addresses: INSA Rouen LITIS – EA 4108, BP08, 76801 Saint-Etienne du Rouvray, France (all four authors)
Abstract: This paper describes a support system that helps archivists index and retrieve documents. Our method uses the Wikipedia category network as a conceptual taxonomy. A directed acyclic graph (DAG) is built for each document by mapping each term (one or more words) to a concept in the Wikipedia category network. Properties of the graph are used to weight these concepts. Topics and keywords are then proposed based on the concepts identified as important. Conceptual indexing consists of finding the relevant Wikipedia articles and categories that can be used to describe the text. Conceptual retrieval consists of using these articles and categories to return the relevant documents for a user query. Finally, a proof-of-concept prototype is presented.
Keywords: document indexing; document retrieval; similarity measures; knowledge representation; Wikipedia; archivists; category network; conceptual taxonomy; directed acyclic graphs; DAG; conceptual indexing; information retrieval.
International Journal of Knowledge and Learning, 2014, Vol. 9, No. 1/2, pp. 87-103
Received: 12 Jun 2013
Accepted: 16 Jun 2014
Published online: 31 Jan 2015
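
Illustrative sketch: The indexing idea summarised in the abstract (map terms to concepts, build a per-document DAG over the category network, weight the concepts from graph properties) could look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy category fragment `PARENTS`, the term dictionary `TERM_TO_CONCEPT`, the substring matching, and the distance-decay weighting with its `decay` parameter are all hypothetical stand-ins, since the abstract does not specify which graph properties the paper uses.

```python
from collections import defaultdict, deque

# Hypothetical toy fragment of a category network: concept -> parent categories.
PARENTS = {
    "Graph theory": ["Discrete mathematics"],
    "Information retrieval": ["Information science"],
    "Search algorithms": ["Algorithms", "Information retrieval"],
    "Algorithms": ["Discrete mathematics", "Computer science"],
    "Discrete mathematics": ["Mathematics"],
    "Information science": ["Computer science"],
    "Computer science": [],
    "Mathematics": [],
}

# Hypothetical term -> concept mapping (a term may span several words).
TERM_TO_CONCEPT = {
    "directed acyclic graph": "Graph theory",
    "document retrieval": "Information retrieval",
    "breadth-first search": "Search algorithms",
}


def build_document_dag(text):
    """Return (edges, seeds): concepts matched in the text plus all their ancestors."""
    lowered = text.lower()
    # Assumption: naive substring matching stands in for real term extraction.
    seeds = [concept for term, concept in TERM_TO_CONCEPT.items() if term in lowered]
    edges = set()
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for parent in PARENTS.get(node, []):
            edges.add((node, parent))  # edge points from child to parent category
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return edges, seeds


def weight_concepts(edges, seeds, decay=0.5):
    """Assumed weighting: each seed adds decay**d to every concept at shortest distance d above it."""
    parents = defaultdict(list)
    for child, parent in edges:
        parents[child].append(parent)
    weights = defaultdict(float)
    for seed in seeds:
        # Breadth-first search upward gives the shortest distance to each ancestor.
        dist = {seed: 0}
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            for parent in parents[node]:
                if parent not in dist:
                    dist[parent] = dist[node] + 1
                    queue.append(parent)
        for node, d in dist.items():
            weights[node] += decay ** d
    return dict(weights)


if __name__ == "__main__":
    doc = "A directed acyclic graph supports document retrieval."
    dag_edges, matched = build_document_dag(doc)
    for concept, weight in sorted(weight_concepts(dag_edges, matched).items()):
        print(f"{concept}: {weight}")
```

In this sketch, concepts matched directly in the text receive the largest weight and broader categories receive progressively less; a concept reachable from several matched terms accumulates weight, which is one plausible way to surface candidate topics.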