Photonic Data Science

Research group Prof. Dr. Thomas Bocklitz
Fig 3
Graphic: IPHT

Head of the group

Prof. Dr. Thomas Bocklitz, University Professor

Head, Professorship of Photonic Data Science
JenTower
Leutragraben 1
07743 Jena

Scientific Profile

We explore the entire life cycle of photonic data, from generation through analysis to archiving. Following a holistic approach, we investigate procedures for experiment and sample size planning as well as data pretreatment, and combine them with chemometric procedures, model transfer methods and artificial intelligence methods in a data pipeline. In this way, data from various photonic processes can be used for analysis, diagnostics and therapy in medicine, the life sciences, environmental sciences and pharmacy. The data pipelines are implemented as software components and tested directly in the application environment, e.g. in clinical studies. Further focal points are the fusion of heterogeneous data sources, the simulation of measurement procedures in order to optimize correction procedures, methods for interpreting analysis models, and the construction of data infrastructures for photonic measurement data that comply with the FAIR principles.
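
The pipeline idea described above (data pretreatment followed by a chemometric model) can be illustrated with a minimal Python sketch. It is not the group's software: the function names (`subtract_baseline`, `vector_normalize`, `fit_centroids`, `predict`), the constant-offset baseline, and the nearest-centroid classifier are assumptions chosen for brevity, and the toy spectra are invented for illustration.

```python
from math import sqrt


def subtract_baseline(spectrum):
    """Remove a constant offset by subtracting the minimum intensity (assumed, very simple baseline)."""
    floor = min(spectrum)
    return [value - floor for value in spectrum]


def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean length; all-zero spectra are returned unchanged."""
    norm = sqrt(sum(value * value for value in spectrum))
    return [value / norm for value in spectrum] if norm > 0 else list(spectrum)


def preprocess(spectrum):
    """Pretreatment step of the pipeline: baseline removal, then normalization."""
    return vector_normalize(subtract_baseline(spectrum))


def fit_centroids(spectra, labels):
    """Model step: mean preprocessed spectrum per class (nearest-centroid classifier)."""
    sums, counts = {}, {}
    for spectrum, label in zip(spectra, labels):
        processed = preprocess(spectrum)
        if label not in sums:
            sums[label] = [0.0] * len(processed)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], processed)]
        counts[label] += 1
    return {label: [v / counts[label] for v in sums[label]] for label in sums}


def predict(centroids, spectrum):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    processed = preprocess(spectrum)

    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(processed, centroid))

    return min(centroids, key=lambda label: distance(centroids[label]))


# Toy data (invented): class "A" peaks at channel 1, class "B" at channel 3.
train_spectra = [[1, 5, 1, 1, 1], [2, 9, 2, 2, 2], [1, 1, 1, 5, 1], [3, 3, 3, 11, 3]]
train_labels = ["A", "A", "B", "B"]
centroids = fit_centroids(train_spectra, train_labels)
print(predict(centroids, [10, 30, 10, 10, 10]))  # prints A
```

Keeping pretreatment inside `predict` means that new spectra always pass through the same steps as the training data, which is the main point of treating the steps as one pipeline.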

Research Topics

  • Machine learning for photonic image data
  • Chemometrics / machine learning for spectral data
  • Correlation of different measurement methods and data fusion

Areas of application

  • Bio-medical diagnostics using spectral measurement methods and imaging techniques
  • Extraction of higher-level information from photonic measurement data
  • Simulation- and data-driven correction of photonic data
  • Ensuring compliance with the FAIR principles for photonic data

Staff

  1. Vulchi, Ravi Teja

    PhD student, Professorship of Photonic Data Science

    JenTower
    Leutragraben 1
    07743 Jena

  2. Yogita, Yogita

    PhD student, Professorship of Photonic Data Science

    JenTower, Room 15S01
    Leutragraben 1
    07743 Jena

Publications (103)

Highlighted authors are members of the research group.

  1. Harnessing Machine Learning and Deep Learning Approaches for Laser-Induced Breakdown Spectroscopy Data Analysis: A Comprehensive Review

    Authors
    P. Dehbozorgi, L. Duponchel, V. Motto-Ros, T. Bocklitz
    Year of publication
    Published in:
    Analysis & Sensing
    Laser-induced breakdown spectroscopy (LIBS) is a rapid, accurate technique for material analysis, offering real-time, minimally destructive, and in situ detection capabilities with broad application potential. LIBS extends its applications across various fields, from geology to biomedicine. However, barriers like matrix effects, reproducibility, self-absorption, and spectral noise often restrict the proper interpretation of the spectra. This review paper examines literature from 2015 to 2025, focusing on the evolution of machine learning (ML) and deep learning (DL) techniques in LIBS analysis. It evaluates the advancement of these techniques, assessing both the qualitative and quantitative performance of LIBS analysis. These observations support the complementary roles of ML and DL methodologies. ML captures general patterns, while DL, through convolutional neural networks (CNNs), excels at identifying high-level features. This literature review reveals that no single ML or DL tool consistently provides optimal solutions for LIBS applications. The analysis pipeline needs to be tailored based on the LIBS data and the goal of the study. Designing such a framework requires the incorporation of preprocessing techniques to enhance the quality of raw signals. This step should then be followed by integrating the data into predictive models, whether ML or DL, to accomplish tasks like classification or concentration prediction.
    University Bibliography Jena:
    fsu_mods_00027407
  2. Siamese networks in Raman spectroscopy: Towards a better performance against replicate variability

    Authors
    S. Guo, T. Bocklitz
    Year of publication
    Published in:
    Talanta: the international journal of pure and applied analytical chemistry
    The power of Raman spectroscopy is largely enhanced by machine learning and chemometrics, which extract and translate the spectral features into high-level biological or clinical knowledge by constructing classical or deep learning models. The generalizability of such models, however, is often degraded due to the large variations between the training data and the data to be predicted. Model transfer showed great potential in this regard, which improved the prediction on the test data without re-building a new model from scratch. We developed a method based on a Siamese neural network (SNet) and compared it with two basis models as well as two model transfer methods, score movement (MS) and extensive multiplicative scattering correction (EMSC). The performance was systematically verified with a Raman spectral dataset measured from four bacterial species, each consisting of nine biological replicates. Its generalizability was further tested on a second Raman dataset from mice tissue samples. The Siamese network was demonstrated to outperform MS and EMSC, especially given large training datasets. The load on training data, however, is substantially lower than conventional networks and can be slightly reduced when variability between training and test data is properly incorporated into the loss function. More importantly, unlike MS and EMSC, the Siamese network does not require information about the test data for model adjustment or data space adaptation, which makes it more advantageous in practice.
    University Bibliography Jena:
    fsu_mods_00030099
  3. Blood cancer differentiation based on IR spectroscopy and chemometrics

    Authors
    L. Xie, S. Guo, T. Liu, X. Tang, R. Ji, X. Shen, Y. Xu, L. Chen, S. Wang, T. Bocklitz
    Year of publication
    Published in:
    Computer methods and programs in biomedicine
    Background and Objective: White blood cells (WBCs) and their subpopulations play critical roles in detecting blood cancers due to their distinct biological and biochemical characteristics. Infrared (IR) spectroscopy offers a rapid, label-free, and non-destructive approach to probe molecular composition, making it a promising tool for biomedical diagnostics. The objective of this proof-of-principle study is to investigate the possibility of IR spectroscopy combined with chemometrics to differentiate leukemia from lymphoma, and to assess the capability of whole WBCs and their subpopulations in distinguishing the two diseases. Methods: We based our study on 21 pediatric patients including 11 leukemia and 10 lymphoma cases, with in total 86,016 IR spectra measured from whole WBCs and the subpopulations. A data pipeline was established, including steps of spectral preprocessing, classification, and data fusion. In particular, data fusion was implemented via low-, middle-, and high-level strategies, with the aim of combining spectra from different cell types and investigating their capability of differentiating the two blood cancers. Results: The classification, both with and without data fusion, was benchmarked via patient-wise cross-validation. A balanced accuracy of 80.0% was achieved based on IR spectra of whole WBCs. Further improvement was observed when combining whole WBCs and their subpopulations, with the best performance of 90.0% from combining whole WBCs and granulocytes with the high-level data fusion strategy. The performance was observed to be consistent for both linear and nonlinear classification based on linear discriminant analysis (LDA) and support vector machine (SVM), respectively. Conclusions: The results indicate the promising potential of IR spectroscopy of blood samples to distinguish leukemia and lymphoma with the help of chemometric approaches. Further, WBC subpopulations, particularly granulocytes, were proven to contain complementary information to whole WBCs for differentiating leukemia from lymphoma. This provides critical insights for biomedical practice in blood cancer diagnostics.
    University Bibliography Jena:
    fsu_mods_00036197
  4. Denoising and Baseline Correction of Low-Scan FTIR Spectra: a Benchmark of Deep Learning Models Against Traditional Signal Processing

    Authors
    A. Mokari, S. Raghunathan, A. Shydliukh, O. Ryabchykov, C. Krafft, T. Bocklitz
    Year of publication
    Published in:
    Bioengineering
    High-quality Fourier Transform Infrared (FTIR) imaging usually needs extensive signal averaging to reduce noise and drift, which severely limits clinical speed. Deep learning can accelerate imaging by reconstructing spectra from rapid, single-scan inputs. However, separating noise and baseline drift simultaneously without ground truth is an ill-posed inverse problem. Standard black-box architectures often rely on statistical approximations that introduce spectral hallucinations or fail to generalize to unstable atmospheric conditions. To solve these issues, we propose a physics-informed cascade Unet that separates denoising and baseline correction tasks using a new, deterministic Physics Bridge. This architecture forces the network to separate random noise from chemical signals using an embedded SNIP layer to enforce spectroscopic constraints instead of learning statistical approximations. We benchmarked this approach against a standard single Unet and a traditional Savitzky–Golay smoothing followed by SNIP baseline correction workflow. We used a dataset of human hypopharyngeal carcinoma cells (FaDu). The cascade model outperformed all other methods, achieving a 51.3% reduction in RMSE compared to raw single-scan inputs, surpassing both the single Unet (40.2%) and the traditional workflow (33.7%). Peak-aware metrics show that the cascade architecture eliminates spectral hallucinations found in standard deep learning. It also preserves peak intensity with much higher fidelity than traditional smoothing. These results show that the cascade Unet is a robust solution for diagnostic-grade FTIR imaging. It enables imaging speeds 32 times faster than current methods.
    University Bibliography Jena:
    fsu_mods_00035379
  5. Systematic investigation of preprocessing pipeline for MALDI data

    Authors
    M. Adhikari, O. Ryabchykov, S. Guo, T. Bocklitz
    Year of publication
    Published in:
    Results in Chemistry
    Background: Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is a powerful tool to detect and characterize biomolecules, making it particularly useful in different fields and applications such as proteomics, clinical diagnostics, and biomarker discovery. MALDI data is commonly contaminated by artefacts originating from both chemical and electrical noise. Data preprocessing is hence important to remove these artefacts and improve the accuracy and reliability of the subsequent (quantitative and qualitative) analysis. A systematic investigation of different preprocessing steps is necessary to establish an effective preprocessing pipeline. Results: In this study, we systematically investigated the different steps including interpolation, smoothing, baseline correction, peak alignment, and peak binning, along with normalization, to establish a preprocessing pipeline for MALDI spectral data. The performance of the preprocessing steps and pipeline was benchmarked by the balanced accuracy of differentiating hepatocellular carcinoma (HCC) and healthy (normal) tissue based on MALDI spectral data of liver tissue samples. The established preprocessing pipeline improved the balanced accuracy from 61.3% to 77.6% under patient-level cross-validation, and from 92.9% to 94.7% under spectral-level cross-validation. Significance: Our findings demonstrated that the classification performance can be greatly affected by the quality of MALDI data, which can be improved by preprocessing steps. The large improvement under patient-level validation after preprocessing demonstrated a satisfying performance of the classification against patient-to-patient variability with the help of our preprocessing pipeline. This study will potentially benefit the MS community.
    University Bibliography Jena:
    fsu_mods_00034772
  6. A comparative study of robustness to noise and interpretability in U-Net-based denoising of Raman spectra

    Authors
    A. Mokari, S. Eiserloh, O. Ryabchykov, U. Neugebauer, T. Bocklitz
    Year of publication
    Published in:
    Spectrochimica Acta - Part A: Molecular and Biomolecular Spectroscopy
  7. Lightweight CycleGAN models for cross-modality image transformation and experimental quality assessment in fluorescence microscopy

    Authors
    M. Soltaninezhad, Y. Rouzbahani, J. Contreras, F. Larios, P. Jordan, O. Werz, R. Chippalkatti, D. Abankwa, C. Eggeling, T. Bocklitz
    Year of publication
    Published in:
    Biomedical Optics Express
    With the growing integration of artificial intelligence in scientific and medical applications, lightweight deep learning models have become increasingly important. These models offer substantial reductions in memory usage and computational time. Given that GPU-based model training and inference contribute significantly to carbon emissions, lightweight architectures with comparable performance to parameter-rich models present a more environmentally friendly alternative. Specifically, we build upon CycleGAN with a fixed-channel lightweight U-Net generator for modality transfer from standard confocal to super-resolution STED and deconvolved STED images, and systematically compare it against Pix2Pix and standard CycleGAN baselines. Obtaining paired datasets in medical imaging and super-resolution microscopy is often infeasible due to the need for additional experiments and the intrinsic complexity of biological sample preparation. To address this, we investigate the performance of lightweight CycleGAN models, demonstrating their ability to achieve high-fidelity modality transfer despite reduced model complexity. We introduce a fixed channel strategy within the U-Net-based generator, in contrast to the traditional channel-doubling approach. This modification significantly reduces the number of trainable parameters from 41.8 million to approximately 9 thousand, while achieving comparable or slightly improved performance. We explore the utility of GAN models as a qualitative marker for assessing experimental and labeling quality. When trained on high-quality microscopy images, the GAN implicitly learns the characteristics of optimal imaging. Deviations between GAN-generated outputs trained on high-quality data and low-quality experimental images can highlight potential issues such as photobleaching, experimental artifacts, or inaccurate labeling. In this way, the model can support qualitative assessment of experimental consistency and image fidelity in fluorescence microscopy workflows.
    University Bibliography Jena:
    fsu_mods_00034452
  8. Label-Free Differentiation of Antimicrobial Resistance Groups Using Raman Spectroscopy

    Authors
    A. Pistiki, O. Ryabchykov, A. Wagenhaus, T. Bocklitz, S. Deinhardt-Emmer, B. Löffler, P. Rösch, J. Popp
    Year of publication
    Published in:
    Analytical chemistry
    Increasing antimicrobial resistance (AMR) has developed into an enormous health burden. Here, a systematic investigation was conducted to evaluate the discriminative performance of Raman spectroscopy between different resistance classes (Susceptible, ESBL, CRE, VRE, VSE) in common clinical isolates (Escherichia coli, Klebsiella pneumoniae, Klebsiella oxytoca, Citrobacter freundii, Acinetobacter baumannii, Enterococcus faecium). Two different Raman spectroscopic methods (UVRR in bulk and 785 nm excitation directly on the Petri dish) and four different machine learning algorithms (PCA-LDA, PLS-DA, PCA-SVM, PCA-RF) were tested, aiming at the application of a decision tree using a 3-step approach consisting of species classification, differentiation of susceptible from resistant strains within the species, and differentiation of ESBL and CRE as AMR subclasses within the class of antibiotic-resistant strains. In species classification, the two Raman methods yield similar results in all applied models. When attempting the differentiation of susceptible vs resistant strains at the intraspecies level, 785 nm overall outperformed UVRR, and PCA-SVM and PLS-DA provided higher discriminative power compared to PCA-LDA and PCA-RF. For the discrimination of ESBL vs CRE isolates, UVRR was not suitable as a method, and 785 nm excitation provided correct identification of all 9 strains when using PCA-SVM and PLS-DA, confirming stability over replicate-to-replicate variations. Raman spectra from 785 nm excitation directly on the Petri dish combined with PCA-SVM and PLS-DA are suitable for diagnostic application of Raman spectroscopy in hospital settings. These results are the first step of a long journey in the development of Raman spectroscopy for microbiological documentation and extraction of AMR-related information in infectious diseases.
    University Bibliography Jena:
    fsu_mods_00034840
  9. Automatic optimization of flat-field corrections by evaluation and enhancement (EVEN) in multimodal optical microscopy

    Authors
    E. Corbetta, M. Calvarese, P. Then, H. Bae, T. Meyer-Zedler, B. Messerschmidt, O. Guntinas-Lichius, M. Schmitt, C. Eggeling, J. Popp, T. Bocklitz
    Year of publication
    Published in:
    Nature Communications
    Uneven illumination affects all images acquired by optical microscopes, especially large, multicolour and nonlinear measurements. Although removal is possible with various algorithms, evaluating raw and processed images is challenging due to the lack of established workflows for image quality assessment. This manuscript describes a machine learning-based method, EVEN (Evaluation and Enhancement), to assess and optimise corrections in optical microscopy. EVEN integrates quantitative image metrics into a Linear Discriminant Analysis model to detect and predict image quality, automatically optimising corrections. The method can be integrated into the optical microscopy pipeline to simplify further processing and analysis. Here, we show the implementation and application of EVEN in different processing scenarios, including multimodal nonlinear imaging of human and neck tissue slices and multichannel fluorescence measurements of stained cells, demonstrating its capability to automatically optimise image quality by assessing single-channel corrections.
    University Bibliography Jena:
    fsu_mods_00029890
  10. Complex-Valued Chemometrics for Analyzing Absorbance or Raman Spectra

    Authors
    T. Mayerhöfer, O. Ilchenko, A. Kutsyk, S. Piehler, A. Silge, A. Ramoji, A. Winterfeld, O. Ryabchykov, M. Kiehntopf, T. Bocklitz, J. Popp
    Year of publication
    Published in:
    Analytical chemistry
    Complex-valued chemometrics offers a promising extension of classical regression methods by exploiting both real and imaginary spectral components. Here, we show that conventional absorbance (χ⁽¹⁾) and Raman (χ⁽³⁾) spectra can be transformed into complex-valued forms by combining the measured intensities as imaginary parts with their Kramers–Kronig-derived real parts. We benchmark four regression methods, namely classical least squares (CLS), inverse least squares (ILS), principal component regression (PCR), and partial least-squares regression (PLSR), across four representative systems: the quasi-ideal benzene–toluene and benzene–cyclohexane mixtures, the nonideal acetone–chloroform mixture, and blood plasma spiked with glucose and urea. Compared to conventional chemometrics, complex-valued approaches consistently reduce prediction errors (MAE, RMSE, and R²). Implementation is computationally inexpensive, since the Kramers–Kronig transform of absorbance or Raman spectra can be obtained within seconds using FFT-based routines, even for large data sets. Software implementation is straightforward, and programs can be adapted within minutes using standard environments such as Mathematica. Surprisingly, complex-valued ILS matches or surpasses complex-valued PLSR, echoing earlier results in infrared spectroscopy, using the complex refractive index function, and suggesting a re-evaluation of regression hierarchies when complex spectra are available. These findings demonstrate that complex-valued chemometrics is broadly applicable, physically grounded, and capable of enhancing both classical and modern regression strategies in analytical spectroscopy.
    University Bibliography Jena:
    fsu_mods_00035939
  11. Current trends in machine learning for surface-enhanced Raman spectroscopy

    Authors
    R. Luo, S. Jiao, J. Nair, A. Ghosh, K. Kamiak, J. Popp, D. Cialla-May, T. Bocklitz
    Year of publication
    Status
    Review pending
    Published in:
    Analyst
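
Several of the abstracts above report balanced accuracy under patient-wise (patient-level) cross-validation. The following minimal Python sketch shows what these two terms mean in practice. It is an illustration under stated assumptions, not code from any of the listed publications: the function names `balanced_accuracy` and `leave_one_patient_out` are hypothetical, and the labels and patient identifiers are invented toy values.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally regardless of its size."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, label in enumerate(y_true) if label == cls]
        hits = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


def leave_one_patient_out(patient_ids):
    """Yield (patient, train_indices, test_indices); all spectra of one patient are held out together."""
    for patient in sorted(set(patient_ids)):
        test = [i for i, p in enumerate(patient_ids) if p == patient]
        train = [i for i, p in enumerate(patient_ids) if p != patient]
        yield patient, train, test


# Toy example (invented values): five spectra from three patients.
patient_ids = ["p1", "p1", "p2", "p2", "p3"]
splits = list(leave_one_patient_out(patient_ids))
for patient, train, test in splits:
    print(patient, "train:", train, "test:", test)

# Toy predictions (invented): recall is 2/3 for "HCC" and 1/1 for "normal".
y_true = ["HCC", "HCC", "HCC", "normal"]
y_pred = ["HCC", "HCC", "normal", "normal"]
score = balanced_accuracy(y_true, y_pred)
print(round(score, 4))  # prints 0.8333
```

Splitting by patient rather than by spectrum keeps spectra of the same patient out of both training and test sets at once, which is why patient-level results are typically lower than spectrum-level ones.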