Publications

PCA-Based Peak Feature Selection for Classification of Spectroscopic Datasets

I. Schmitt¹, K. Sowoidnich², T. Gosswami², B. Sumpf², M. Maiwald², M. Wolff³

Published in:

J. Chemom., vol. 39, no. 11, art. e70074, doi:10.1002/cem.70074 (2025).

Abstract:

Reducing feature dimensionality in spectroscopic data is crucial for efficient analysis and classification. Using all available features for classification typically results in an unacceptably high runtime and poor accuracy. Popular feature extraction methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), and autoencoders, reduce feature dimensionality by extracting latent features that can be challenging to interpret. To enable better human interpretation of the classification model, we avoid extraction methods and instead propose applying feature selection methods. In this work, we develop an innovative PCA-based feature selection method for spectroscopic data, providing an essential subset of the original features. As an important advantage, no prior knowledge about the characteristic signals of the respective target substance is required. In this proof-of-concept study, the proposed method is initially characterized using simulated Raman and infrared absorption datasets. From the top five PCA eigenvectors of spectroscopic data, we identify a set of three top peaks each at specific wavenumbers (features). The compact set of selected features is then used for classification tasks applying a decision tree. Based on two well-defined spectroscopic datasets, our study demonstrates that our new method of PCA-based peak finding outperforms selected other approaches with regard to interpretability and accuracy. For both investigated datasets, accuracies greater than 97% are achieved. Our approach shows large potential for accurate classification combined with interpretability in further scenarios involving spectroscopic datasets.
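The selection step described in the abstract (take the top five PCA eigenvectors and keep three peaks from each) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the peak-finding procedure, so the code assumes that a "peak" is an interior local maximum of the absolute loading and ranks peaks by that magnitude. The function name `select_peak_features`, its parameters, and the synthetic two-band dataset are all hypothetical and chosen only for illustration.

```python
import numpy as np


def select_peak_features(X, n_components=5, peaks_per_component=3):
    """Return sorted column indices of the strongest PCA loading peaks.

    Assumption (not from the paper): a peak is an interior local maximum
    of the absolute loading, ranked by its magnitude.
    """
    # Center the spectra; PCA eigenvectors are the right singular vectors.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)

    selected = set()
    for loading in np.abs(vt[:n_components]):
        interior = loading[1:-1]
        # Local maximum: larger than the left neighbor, not smaller than the right.
        is_peak = (interior > loading[:-2]) & (interior >= loading[2:])
        peak_idx = np.flatnonzero(is_peak) + 1
        # Keep the strongest peaks of this eigenvector.
        order = np.argsort(loading[peak_idx])[::-1][:peaks_per_component]
        selected.update(int(i) for i in peak_idx[order])
    return sorted(selected)


# Hypothetical synthetic data: two classes, each with one Gaussian band.
rng = np.random.default_rng(0)
axis = np.arange(200)  # stands in for a wavenumber axis


def band(center):
    return np.exp(-0.5 * ((axis - center) / 3.0) ** 2)


labels = np.repeat([0, 1], 50)
X = np.where(labels[:, None] == 0, band(50), band(150))
X = X + 0.01 * rng.standard_normal(X.shape)

selected = select_peak_features(X)
print(selected)
```

The reduced matrix `X[:, selected]` keeps original spectral channels rather than latent components, which is the property the abstract emphasizes for interpretability. It could then be passed to any decision tree implementation, for example scikit-learn's `DecisionTreeClassifier`.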

¹ Chair of Database and Information Systems, BTU Cottbus-Senftenberg, Cottbus, Germany
² Ferdinand-Braun-Institut (FBH), Berlin, Germany
³ Chair of Communications Engineering, BTU Cottbus-Senftenberg, Cottbus, Germany

Keywords:

classification; decision tree; peak finding; principal component analysis; spectroscopic data

© 2025 The Author(s). Journal of Chemometrics published by John Wiley & Sons Ltd.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
