Title: Extracting and searching news articles in web portal news pages
Authors: Namyun Kim
Addresses: School of Computer Engineering, Hansung University, Seoul, 02876, South Korea
Abstract: A large number of news articles are now published online, and they are an important resource for understanding social phenomena and trends. Accordingly, web portal services provide a 'portal news page' that classifies news articles from various news sources into sections and presents each article in a consistent structure. By analysing portal news pages, it is therefore possible to extract information about news articles automatically. In this paper, we introduce a prototype that extracts key information from news articles and makes it searchable for analysis. Specifically, we describe: 1) a crawler that collects, analyses and parses news articles; 2) an Elasticsearch server that indexes and searches news information; and 3) a front-end application that provides a search user interface. These systems are expected to provide the foundation for news analytics and forecasting services.
Keywords: crawler; search engine; Elasticsearch; news service and analysis.
DOI: 10.1504/IJCVR.2020.107241
International Journal of Computational Vision and Robotics, 2020 Vol.10 No.3, pp.202 - 212
Received: 12 Jan 2019
Accepted: 02 May 2019
Published online: 11 May 2020