Title: Extraction and visualisation of citation relationships and its attributes for papers in PDF

Authors: Sergey Parinov

Addresses: Central Economics and Mathematics Institute of RAS, Moscow, Russia; Russian Presidential Academy of National Economy and Public Administration, Moscow, Russia

Abstract: This paper presents a method for processing the content of research papers in binary PDF format on the server side, which gives research information systems new features for citation content analysis. The method efficiently generates JSON versions of PDF documents, which makes it easier to recognise papers' references, in-text citations, citation contexts, etc. As a result, one can parse an extended set of citation data, including the location of citations within a research paper's structure, the frequency with which the same reference is mentioned, the style of reference mentioning, and so on. Based on these data, we upgrade traditional citation relationships by adding semantics and other attributes. By formatting these data according to the W3C Web Annotation Data Model and integrating them with annotation tools, we visualise the citation relationships, their semantic attributes, related statistics and other data as annotations to the content of PDF documents available to users of a research information system.
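
As an illustration only, the following minimal sketch shows how one in-text citation with a few extra attributes might be expressed as a W3C Web Annotation. It is not the author's implementation: the function name `make_citation_annotation`, the attribute names `section` and `mentions`, and the example URLs are all assumptions made for this sketch. Only the top-level keys (`@context`, `type`, `motivation`, `body`, `target`) and the `TextQuoteSelector` fields come from the Web Annotation Data Model.

```python
import json


def make_citation_annotation(paper_url, cited_url, exact,
                             prefix="", suffix="",
                             section=None, mentions=1):
    """Build a Web Annotation (as a dict) for one in-text citation.

    All parameter names are hypothetical; they stand in for the kind of
    citation data the abstract describes (location, frequency, context).
    """
    # Extra citation attributes are serialised into a TextualBody.
    # The attribute names here are assumptions, not taken from the paper.
    attributes = {"section": section, "mentions": mentions}
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "linking",
        "body": [
            # The cited work that the in-text citation points to.
            {"type": "SpecificResource", "source": cited_url},
            # Attributes of the citation relationship.
            {
                "type": "TextualBody",
                "purpose": "describing",
                "format": "application/json",
                "value": json.dumps(attributes),
            },
        ],
        "target": {
            # The citing PDF and the exact citation text within it.
            "source": paper_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


if __name__ == "__main__":
    annotation = make_citation_annotation(
        "https://example.org/paper.pdf",   # placeholder URL
        "https://example.org/cited-work",  # placeholder URL
        "[12]",
        prefix="as shown in ",
        suffix=", the",
        section="Introduction",
        mentions=3,
    )
    print(json.dumps(annotation, indent=2))
```

The `prefix` and `suffix` fields of the selector carry the citation context, so an annotation tool could locate and highlight the citation in the rendered PDF text.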

Keywords: research information system; PDF.js; PDF to JSON conversion; citation relationships; semantic attributes; citation content analysis; visualisation.

DOI: 10.1504/IJMSO.2017.093626

International Journal of Metadata, Semantics and Ontologies, 2017 Vol.12 No.4, pp.195 - 203

Received: 28 Jan 2018
Accepted: 28 Feb 2018

Published online: 30 Jul 2018
