Title: Review of spectral clustering algorithms used in proteomics
Authors: Shraddha Kumar; Anuradha Purohit; Sunita Varma
Addresses: Department of Computer Engineering, Shri G.S. Institute of Technology and Science, Indore, 452003, MP, India (all three authors)
Abstract: Tandem mass spectrometry (MS/MS) generates a large number of spectra, each showing the signal intensity of detected ions as a function of mass-to-charge ratio. Spectral clustering in proteomics is a powerful but under-utilised technique. Based on the similarity of spectra, spectral clustering algorithms systematically group large numbers of spectra so that, ideally, all spectra in a given cluster belong to the same peptide. Clusters are formed from connected data points and are not required to have convex boundaries. By grouping redundant spectra, typically so that each cluster can be represented by a consensus spectrum, spectral clustering reduces the running time and computational requirements of spectral library and database searches. It enhances the peptide identification process and has recently fuelled the development of many new proteomics algorithms. The goal of this review is to provide a clear overview of the most popular spectral clustering algorithms used in proteomics. It presents a systematic analysis of these algorithms, evaluating the benefits and limitations of each approach.
Keywords: proteomics; tandem mass spectrometry; spectral clustering; consensus spectrum; scoring function; mass spectra; data points; spectral similarity; cluster purity; spectral library; normalised dot product.
International Journal of Data Science, 2023, Vol. 8, No. 1, pp. 16-38
Received: 09 Jan 2022
Received in revised form: 10 Mar 2022
Accepted: 28 Mar 2022
Published online: 09 Mar 2023
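
Illustrative sketch: the abstract describes grouping MS/MS spectra by similarity, and the keywords name the normalised dot product as a similarity measure. The Python below is a minimal sketch of that idea, not any specific algorithm covered by the review. The function names, the 1.0 m/z bin width, the 0.7 similarity threshold, and the greedy first-member comparison are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks that fall in the same m/z bin.

    bin_width is an assumed value for illustration.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def normalised_dot_product(a, b):
    """Cosine similarity between two binned spectra (dicts of bin -> intensity)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(spectra, threshold=0.7):
    """Assign each spectrum to the first cluster whose first member is similar enough.

    Returns a list of clusters, each a list of indices into `spectra`.
    The threshold and the greedy strategy are assumptions, not a published method.
    """
    binned = [bin_spectrum(peaks) for peaks in spectra]
    clusters = []
    for index, vector in enumerate(binned):
        for cluster in clusters:
            representative = binned[cluster[0]]
            if normalised_dot_product(vector, representative) >= threshold:
                cluster.append(index)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([index])
    return clusters


# Toy spectra as lists of (m/z, intensity) pairs; the values are made up.
toy_spectra = [
    [(100.1, 10.0), (200.2, 5.0)],
    [(100.3, 9.0), (200.4, 6.0)],
    [(150.0, 8.0), (300.0, 4.0)],
]
clusters = greedy_cluster(toy_spectra)
```

In this toy input the first two spectra share the same m/z bins with similar intensities and end up in one cluster, while the third shares no bins with them and forms its own. Practical tools typically also take precursor mass into account and replace the first-member comparison with a consensus spectrum per cluster.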