Title: Localisation and classification of surgical instruments in laparoscopy videos using deep learning techniques
Authors: Avanti Bhandarkar; Priyanka Verma
Addresses: Department of Electronics and Telecommunication Engineering, Mukesh Patel School of Technology Management and Engineering, Mumbai, Maharashtra, India; Department of Electronics and Telecommunication Engineering, Mukesh Patel School of Technology Management and Engineering, Mumbai, Maharashtra, India
Abstract: Surgical trainees often use laparoscopic surgery videos to understand the appropriate use of instruments and to visualise the surgical workflow better, but these videos may be difficult to interpret without proper annotations. Recently, neural networks have emerged as an accurate and effective solution for instrument detection and classification in surgical video frames, which can subsequently be used to automate the annotation process. The proposed implementation uses Faster R-CNNs and bidirectional LSTMs with (and without) time-distributed layers and attempts to solve some of the problems commonly faced while developing deep learning models for surgical image and video data: severe class imbalance, inaccuracies during multi-label classification and a lack of spatiotemporal context from adjacent video frames. The bidirectional LSTM with time-distributed layers achieved an average accuracy of 80.20% and an average F1 score of 0.7176 on the M2CAI16 tool dataset, while also achieving 63.49% average accuracy and an average F1 score of 0.522 on unseen data. Jaccard distance and Hamming distance have also been used as object detection-specific metrics; the same model registered the lowest values for both distances, implying accurate localisation and identification of surgical instruments.
Keywords: deep learning; surgical instrument detection; surgical instrument classification; surgical instrument localisation; data augmentation; transfer learning; faster-RCNN; region-based convolutional neural networks; bidirectional LSTMs; long short-term memory networks; Jaccard distance; Hamming distance.
DOI: 10.1504/IJCVR.2025.142918
International Journal of Computational Vision and Robotics, 2025 Vol.15 No.1, pp.75 - 103
Received: 15 Sep 2022
Accepted: 25 Apr 2023
Published online: 02 Dec 2024
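
The abstract reports Jaccard distance and Hamming distance for multi-label instrument identification, where lower values indicate closer agreement between predicted and true labels. As a minimal sketch of how these two distances can be computed for a single frame, assuming binary tool-presence vectors (the function names, the vector length and the example values are illustrative assumptions, not the authors' implementation or data):

```python
def hamming_distance(y_true, y_pred):
    """Fraction of label positions where the two binary vectors disagree."""
    if len(y_true) != len(y_pred):
        raise ValueError("label vectors must have the same length")
    mismatches = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mismatches / len(y_true)


def jaccard_distance(y_true, y_pred):
    """1 - |intersection| / |union| over the labels marked present."""
    if len(y_true) != len(y_pred):
        raise ValueError("label vectors must have the same length")
    intersection = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    union = sum(1 for t, p in zip(y_true, y_pred) if t or p)
    if union == 0:
        # Assumption: two empty label sets are treated as identical.
        return 0.0
    return 1.0 - intersection / union


# Hypothetical frame with seven candidate tools (1 = tool present).
frame_true = [1, 0, 1, 0, 0, 0, 0]
frame_pred = [1, 0, 0, 0, 1, 0, 0]

h = hamming_distance(frame_true, frame_pred)  # 2 mismatches out of 7 labels
j = jaccard_distance(frame_true, frame_pred)  # 1 shared tool out of 3 in the union
print(f"Hamming distance: {h:.4f}")
print(f"Jaccard distance: {j:.4f}")
```

Hamming distance counts every label position equally, so frames with many absent tools tend to score low even when a present tool is missed; Jaccard distance ignores jointly absent tools and is therefore more sensitive to errors on the tools that actually appear.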