Title: Dynamic video summarisation using stacked encoder-decoder architecture with residual learning network
Authors: M. Dhanushree; R. Priya; P. Aruna; R. Bhavani
Addresses: Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University, Annamalai Nagar, Tamil Nadu, India (all four authors)
Abstract: In the past decade, video summarisation has emerged as one of the most challenging research fields in video understanding. Video summarisation abstracts an original video by extracting its most informative parts or key events. Generic video summarisation is particularly challenging because the key events are not tied to specific activities, so extensive spatial features are needed to identify them. To address this, a stacked encoder-decoder architecture with a residual learning network (SERNet) is proposed for generating dynamic summaries of generic videos. In the proposed model, GoogleNet features are extracted for each frame. A bi-directional gated recurrent unit encodes the video features, and a gated recurrent unit decodes them. Both the encoder and the decoder use residual learning to extract hierarchical dense spatial features, with the aim of increasing video summarisation F-scores. Experiments are conducted on the SumMe and TVSum datasets. The results show that the proposed SERNet model achieves F-scores of 55.6 and 64.23 on SumMe and TVSum, respectively. A comparison with state-of-the-art approaches indicates the robustness of the proposed model.
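The abstract reports F-scores but does not state how they are computed. The sketch below assumes the overlap-based definition commonly used in video summarisation: precision and recall are measured on the frames shared by a predicted summary and a reference summary, and the F-score is their harmonic mean. The function name `f_score` and the per-frame 0/1 input format are hypothetical choices for illustration, not taken from the paper.

```python
def f_score(pred, truth):
    """Overlap-based F-score between two per-frame binary summaries.

    pred, truth: equal-length sequences of 0/1 flags, where 1 marks a frame
    included in the summary. This input format is an assumption made for
    illustration; the paper's evaluation protocol may differ.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must cover the same number of frames")

    # Frames selected by both the predicted and the reference summary.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)

    # No shared frames (including empty summaries) gives an F-score of zero.
    if overlap == 0:
        return 0.0

    precision = overlap / n_pred
    recall = overlap / n_truth
    return 2 * precision * recall / (precision + recall)


# Toy example: 6 frames, 3 selected in each summary, 2 of them shared.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 1, 1, 0]
score = f_score(predicted, reference)
```

In this toy case precision and recall are both 2/3, so the F-score is 2/3. Scores such as those quoted in the abstract are this quantity expressed as a percentage.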
Keywords: video abstraction; dynamic video summarisation; deep learning; residual learning; skip connections; GoogleNet; long-term memory; gated recurrent unit; stacked encoder; key shot selection; kernel temporal segmentation.
DOI: 10.1504/IJIEI.2024.137702
International Journal of Intelligent Engineering Informatics, 2024 Vol.12 No.1, pp.27 - 59
Received: 02 Sep 2023
Accepted: 29 Dec 2023
Published online: 02 Apr 2024