Title: Automatic text summarisation system for scientific papers on the basis of T5 model, on-the-fly constructed corpus and citations

Authors: Mawloud Mosbah

Addresses: Informatics Department, Faculty of Sciences, University 20 Août 1955 of Skikda, B.P. 26 route d'El-Hadaiek, Skikda 21000, Algeria; LRES Laboratory, University 20 Août 1955 of Skikda, B.P. 26 route d'El-Hadaiek, Skikda 21000, Algeria

Abstract: Automatic text summarisation is considered an important application of natural language processing, especially given the unstoppable growth of the information around us. In this paper, we introduce a summarisation prototype based on the T5 model, trained on a limited corpus constructed online and incorporating citations as external semantic information. Three elements are exploited here: the efficacy of the T5 model with limited resources; the effectiveness of scientific database engines as information retrieval tools for generating a dataset of similar documents on the fly; and the value of citation-based summarisation drawing on the Google Scholar citation index. Our prototype thus builds on what already exists on the web, both resources (similar documents and citations) and systems (information retrieval engines and citation services). Because its performance is tied to these online resources and systems, it may improve over time. The experiments conducted give promising results and point to valuable future directions: focusing on citations differently and enlarging the evaluation process.
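The abstract does not detail how the retrieved documents and citations are fed to T5. The sketch below is only an illustration under stated assumptions, not the author's method. It shows one plausible way to prepare sequence-to-sequence data: each retrieved similar paper becomes an (input, target) pair, with the paper body as input and its abstract as the reference summary. At inference time, citation sentences (how other papers describe the target paper) are prepended to the body. The `summarize: ` prefix is T5's standard summarisation task prefix. The `Paper` class, both helper functions, and the 512-word budget are hypothetical. The word budget is only a crude stand-in for the tokenizer's input limit.

```python
from dataclasses import dataclass

# T5 frames every task as text-to-text; "summarize: " is its summarisation prefix.
T5_PREFIX = "summarize: "


@dataclass
class Paper:
    """Hypothetical record for a paper returned by a scientific search engine."""
    title: str
    body: str      # full text (or the part of it we could retrieve)
    abstract: str  # author abstract, used here as the reference summary


def build_training_pairs(similar_papers):
    """Hypothetical: turn retrieved similar papers into (input, target) pairs.

    Papers missing either a body or an abstract are skipped, since they
    cannot provide a complete training example.
    """
    return [
        (T5_PREFIX + p.body, p.abstract)
        for p in similar_papers
        if p.body and p.abstract
    ]


def build_citation_augmented_input(body, citation_sentences, max_words=512):
    """Hypothetical: prepend citation sentences to the paper body.

    Citations go first so that they survive truncation; the word budget is
    only an approximation of the model's token limit.
    """
    words = " ".join(citation_sentences).split() + body.split()
    return T5_PREFIX + " ".join(words[:max_words])


if __name__ == "__main__":
    corpus = [
        Paper("A", "Body of A.", "Abstract A."),
        Paper("B", "", "Abstract B."),  # no body retrieved, so it is skipped
    ]
    print(build_training_pairs(corpus))
    print(build_citation_augmented_input("x y z", ["cited as c1"], max_words=4))
```

Prepending rather than appending the citations is a deliberate choice in this sketch. When the combined text exceeds the budget, the body is truncated before the external citation evidence is.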

Keywords: natural language processing; NLP; automatic text summarisation; neural networks; T5 language model; transfer learning; citation-based summarisation.

DOI: 10.1504/IJWET.2024.139859

International Journal of Web Engineering and Technology, 2024 Vol.19 No.2, pp.170 - 193

Received: 09 Oct 2023
Accepted: 16 Mar 2024

Published online: 08 Jul 2024
