A distributed architecture for large scale news and social media processing
Online publication date: Tue, 06-Apr-2021
by Iraklis Varlamis; Dimitrios Michail; Pavlos Polydoras; Panagiotis Tsantilas
International Journal of Web Engineering and Technology (IJWET), Vol. 15, No. 4, 2020
Abstract: When designing a data processing and analytics pipeline for data streams, it is important to anticipate the data load and to balance it successfully over the available resources. This is easier to achieve when small processing modules that require limited resources replace large monolithic processing software. In this work, we present the case of a social media and news analytics platform, called PaloAnalytics, which performs a series of content aggregation, information extraction (e.g., named entity recognition (NER) and sentiment tagging) and visualisation tasks on a large amount of data on a daily basis. We demonstrate the architecture of the platform, which relies on micro-modules and message-oriented middleware to deliver distributed content processing. Early results show that the proposed architecture can easily withstand the increased content load that occasionally occurs in social media (e.g., when a major event takes place) and quickly release unused resources when the content load returns to its normal flow.
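The abstract does not say how PaloAnalytics decides when to add or release processing capacity. As a hypothetical illustration of the general idea (sizing a pool of micro-module instances to the backlog waiting in a message queue, within a resource cap), here is a minimal sketch. It is not the authors' method: `ScalingPolicy`, `msgs_per_worker`, `desired_workers` and `simulate` are invented names, and the queue is modelled as a plain integer backlog rather than a real middleware client.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical backlog-driven scaling policy; names and defaults are illustrative only."""
    msgs_per_worker: int = 100  # messages one micro-module instance is assumed to drain per tick
    min_workers: int = 1        # always keep at least this many instances alive
    max_workers: int = 20       # cap imposed by the available resources

    def desired_workers(self, backlog: int) -> int:
        # Enough instances to drain the current backlog in one tick, clamped to the limits.
        needed = math.ceil(backlog / self.msgs_per_worker)
        return max(self.min_workers, min(self.max_workers, needed))


def simulate(arrivals, policy):
    """Replay per-tick message arrivals and return the worker count chosen at each tick."""
    backlog = 0
    history = []
    for incoming in arrivals:
        backlog += incoming
        workers = policy.desired_workers(backlog)
        # Each instance consumes up to msgs_per_worker messages from the queue this tick;
        # anything left over (only possible when capped at max_workers) carries to the next tick.
        backlog = max(0, backlog - workers * policy.msgs_per_worker)
        history.append(workers)
    return history


if __name__ == "__main__":
    # A quiet stream, a burst (e.g. a major event), then a return to normal flow.
    print(simulate([50, 80, 1500, 2500, 900, 60, 40], ScalingPolicy()))
    # [1, 1, 15, 20, 14, 1, 1]
```

In this toy run the pool grows to the cap during the burst, carries the 500-message overflow into the next tick, and shrinks back to a single instance once arrivals return to normal. This mirrors, in simplified form, the "scale up, then release unused resources" behaviour the abstract reports. A real deployment would also need to smooth the signal to avoid rapid oscillation, which this sketch deliberately omits.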