Title: Fast algorithm for assessing semantic similarity of texts
Authors: Andrzej Siemiński
Addresses: Institute for Informatics, Technical University of Wrocław, Wybrzeże Wyspiańskiego 27, 53-370 Wrocław, Poland
Abstract: The paper presents and evaluates an efficient algorithm for measuring the semantic similarity of texts. Calculating the level of semantic similarity between texts is a very difficult task, and the methods proposed so far suffer from high computational complexity, which substantially limits their application area. The proposed algorithm attempts to mitigate this problem by merging a computationally efficient statistical approach to text analysis with a semantic component. The semantic properties of the words in a text are extracted from the WordNet lexical database. The approach was tested using WordNets for two languages: English and Polish. The basic properties of the approach are also studied. The paper concludes with an analysis of the performance of the proposed method on a sample database and suggests some possible application areas.
Keywords: text similarity measures; synsets; NLP; WordNet; two-layer retrieval; user dividend; semantic similarity; text analysis; semantics; English; Polish; natural language processing.
DOI: 10.1504/IJIIDS.2012.049311
International Journal of Intelligent Information and Database Systems, 2012 Vol.6 No.5, pp.495 - 512
Received: 19 Mar 2011
Accepted: 25 Nov 2011
Published online: 16 Aug 2014