Every article comparing FTIR, Raman, and NIR spectroscopy starts with the physics. Molecular vibrations, selection rules, Stokes shift, Beer-Lambert law. That information matters, but it is not what determines whether your point-of-care diagnostic actually ships.
What determines that is the integration work:
- How fast does the instrument acquire a spectrum?
- What format does the data come out in?
- How painful is the preprocessing pipeline?
- How much labeled training data do you need before your ML model is clinically useful?
- What does the instrument control API look like, and does the vendor actually document it?
This article compares FTIR, Raman, and NIR from the perspective of the software engineer building the clinical workflow around the spectrometer. We have integrated all three modalities into the SpectraDx platform, and the differences that matter in practice are not always the differences you read about in spectroscopy textbooks.
The three modalities in thirty seconds
FTIR (Fourier Transform Infrared) measures how a sample absorbs mid-infrared light. Different molecular bonds absorb at different frequencies, producing a spectral fingerprint. FTIR has been the workhorse of analytical chemistry for decades. In clinical diagnostics, it is used for bacterial identification (Bruker IR Biotyper), tissue pathology, and fluid analysis. The sample typically contacts an ATR (Attenuated Total Reflectance) crystal, making it a contact measurement.
Raman spectroscopy measures inelastic scattering of monochromatic light (usually a 532nm or 785nm laser). When photons interact with molecular bonds, a small fraction scatter at shifted frequencies that correspond to vibrational modes. Raman is complementary to FTIR - bonds that are strong in FTIR are often weak in Raman and vice versa. The critical advantage: Raman can measure through transparent packaging, through water, and at a distance. No sample contact required.
NIR (Near-Infrared) measures absorption in the near-infrared region (800-2500nm). NIR spectra contain overtones and combination bands of the fundamental vibrations that FTIR measures. The spectra are broader and less specific than FTIR, but NIR penetrates deeper into samples, acquires faster, and the instrumentation is cheaper. NIR dominates in pharmaceutical quality control, food and agriculture analysis, and bulk material screening.
The comparison table
This is the reference table. Bookmark it.
| Factor | FTIR | Raman | NIR |
|---|---|---|---|
| Acquisition time | 5-30 sec | 1-60 sec | 1-5 sec |
| Sample preparation | ATR crystal contact | None (non-contact possible) | Minimal to none |
| Spectral range | 400-4000 cm-1 | 200-3500 cm-1 | 800-2500 nm (4000-12500 cm-1) |
| Spectral resolution | 0.5-4 cm-1 typical | 1-10 cm-1 typical | 5-20 cm-1 typical |
| Data file size per spectrum | 50-500 KB | 50-500 KB | 10-100 KB |
| Water interference | High (strong O-H absorption) | Low | Moderate |
| Instrument cost (benchtop) | $25K-80K | $30K-150K | $15K-50K |
| Instrument cost (handheld/portable) | $15K-40K | $5K-30K | $5K-20K |
| Key vendors | Bruker, Thermo Fisher, Agilent | Horiba, Renishaw, Wasatch Photonics, Ocean Insight | Bruker, FOSS, Metrohm, Si-Ware |
| Control protocol | DDE, COM, OPC-UA | RS-232, USB HID, vendor SDK | RS-232, Modbus, vendor SDK |
| Native software | OPUS (Bruker), OMNIC (Thermo) | LabSpec (Horiba), WiRE (Renishaw), Enlighten (Wasatch) | OPUS (Bruker), NIRS Pilot (Metrohm) |
| Common data formats | OPUS (.0/.1), SPA, JCAMP-DX | SPC, CSV, JCAMP-DX | OPUS, CSV, JCAMP-DX |
| FDA-cleared clinical devices | None (Bruker IR Biotyper is CE-IVD, not FDA-cleared) | None for clinical diagnostics | None for clinical diagnostics |
| ML model complexity | Medium | High | Low-Medium |
| Preprocessing difficulty | Medium | High (fluorescence) | Low |
| Transfer learning feasibility | Good | Moderate | Challenging |
A few things jump out. NIR is the fastest and cheapest. Raman is the most flexible in terms of sampling. FTIR has the most mature clinical ecosystem. None of them have FDA-cleared clinical devices in the US (the IR Biotyper has CE-IVD marking in Europe but not FDA clearance), which means any clinical deployment requires your own regulatory pathway.
Integration complexity by modality
FTIR integration
FTIR instruments are the most straightforward to integrate, largely because the technology is mature and the vendor ecosystems are well-established.
Instrument control. Bruker FTIR instruments expose control through DDE (Dynamic Data Exchange) on Windows, accessible via the brukeropus Python library or direct COM automation. Thermo Fisher instruments use OMNIC's COM interface. Both are Windows-only, which constrains your deployment architecture - your acquisition workstation will be a Windows machine regardless of what your backend runs on.
A typical acquisition sequence:
```
# Pseudocode for Bruker FTIR acquisition via OPUS DDE
# 1. Load measurement parameters (resolution, scan count, range)
# 2. Trigger background measurement (empty ATR crystal)
# 3. Wait for operator to place sample on ATR crystal
# 4. Trigger sample measurement
# 5. OPUS performs Fourier transform internally
# 6. Export absorbance spectrum as OPUS file or CSV
```
Sample handling considerations. ATR-FTIR requires the sample to make good optical contact with the crystal. In a clinical setting, this means:
- The crystal must be cleaned between every sample (isopropanol wipe, followed by a background measurement)
- Liquid samples need to fully cover the crystal surface
- Solid samples require consistent pressure (most ATR accessories have a pressure gauge)
- The background measurement drifts over time - you need to re-background every 15-30 minutes or after any temperature change
Your software must enforce this workflow. If a clinician skips the background measurement, the resulting spectrum is useless. Build the background check into your acquisition state machine - do not allow a sample measurement without a recent, valid background.
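One way to enforce that rule is to make the sample measurement impossible to call unless a fresh, unused background exists. A minimal sketch, assuming you wrap your real instrument calls in plain callables; the class, state, and exception names are hypothetical, and the 15-minute limit is the conservative end of the 15-30 minute window above.

```python
import time
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class State(Enum):
    NEEDS_BACKGROUND = auto()  # no valid background: sample measurement is blocked
    READY = auto()             # fresh background recorded, one sample may be measured


class BackgroundRequiredError(RuntimeError):
    """Raised when a sample is measured without a fresh, unused background."""


@dataclass
class AtrAcquisitionSession:
    max_background_age_s: float = 15 * 60          # assumption: tune per instrument
    clock: Callable[[], float] = time.monotonic     # injectable so the logic is testable
    state: State = field(default=State.NEEDS_BACKGROUND, init=False)
    _background_at: Optional[float] = field(default=None, init=False)

    def record_background(self, acquire_background: Callable[[], T]) -> T:
        """Run the (hypothetical) background scan callable and mark the background valid."""
        result = acquire_background()
        self._background_at = self.clock()
        self.state = State.READY
        return result

    def invalidate(self) -> None:
        """Call after a temperature change or anything else that spoils the background."""
        self._background_at = None
        self.state = State.NEEDS_BACKGROUND

    def measure_sample(self, acquire_sample: Callable[[], T]) -> T:
        fresh = (
            self.state is State.READY
            and self._background_at is not None
            and self.clock() - self._background_at <= self.max_background_age_s
        )
        if not fresh:
            self.invalidate()
            raise BackgroundRequiredError("Clean the ATR crystal and record a new background first.")
        result = acquire_sample()
        # Each sample consumes the background: the crystal must be cleaned and re-backgrounded.
        self.invalidate()
        return result
```

Because every sample invalidates the background, the "clean, background, measure" cycle is enforced per sample, and the age check catches an operator who records a background and then walks away.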
Water interference. This is FTIR's biggest limitation for biological samples. Water has strong, broad absorption bands centered at 1640 cm-1 and 3300 cm-1 that can completely obscure analyte signals. For aqueous clinical samples, you need one of the following:
- ATR with very short path length (diamond ATR, ~2 microns effective path)
- Spectral subtraction of a water reference
- Drying protocols to remove water before measurement
Your preprocessing pipeline must handle water subtraction robustly, or your classifier will learn water features instead of pathogen features.
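The spectral subtraction option usually means scaling a pure-water reference to the sample by least squares in a region where the analyte is assumed not to absorb, then subtracting the scaled reference everywhere. A minimal sketch; the default 1800-2400 cm-1 fit window (around water's combination band near 2125 cm-1) is an assumption you should validate for your own samples, and the function name is hypothetical.

```python
import numpy as np


def subtract_water(wavenumbers, sample_abs, water_abs, fit_region=(1800.0, 2400.0)):
    """Scale a water reference to the sample and subtract it.

    The scale factor k is fitted (together with a constant offset) only inside
    fit_region, where the analyte is assumed to have negligible absorbance.
    Returns (corrected spectrum, k). The fitted offset is not subtracted.
    """
    wn = np.asarray(wavenumbers, dtype=float)
    s = np.asarray(sample_abs, dtype=float)
    w = np.asarray(water_abs, dtype=float)
    mask = (wn >= min(fit_region)) & (wn <= max(fit_region))
    if not mask.any():
        raise ValueError("fit_region does not overlap the spectral axis")
    # Model inside the region: sample ~= k * water + c
    design = np.column_stack([w[mask], np.ones(mask.sum())])
    (k, _offset), *_ = np.linalg.lstsq(design, s[mask], rcond=None)
    return s - k * w, k
```

Watch the fitted k across a batch: a value that drifts far from run to run usually means the fit window contains analyte signal or the water reference was recorded under different conditions.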
Data formats. Bruker's OPUS format is proprietary and binary. Use the brukeropus or opusFC Python libraries to read OPUS files, or configure OPUS to auto-export as JCAMP-DX or CSV. Thermo's SPA format is similarly proprietary. Do not build your data pipeline around proprietary formats - convert to an open format (JCAMP-DX, CSV, or your own schema) at the point of acquisition.
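If you write the open format yourself rather than relying on OPUS auto-export, JCAMP-DX needs very little code. A minimal sketch that emits a single-spectrum JCAMP-DX 4.24 block in the uncompressed `##XYPOINTS=(XY..XY)` form, with acquisition metadata in user-defined `##$` labels; the function name and metadata keys are illustrative, and you should confirm that your downstream readers accept this form.

```python
import numpy as np


def to_jcamp_dx(wavenumbers, absorbance, title, metadata=None):
    """Serialize one IR absorbance spectrum as minimal JCAMP-DX 4.24 text."""
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(absorbance, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("wavenumbers and absorbance must be 1-D arrays of equal length")
    lines = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        "##DATA TYPE=INFRARED SPECTRUM",
        "##XUNITS=1/CM",
        "##YUNITS=ABSORBANCE",
    ]
    # Instrument/acquisition details go in user-defined "##$" labels
    for key, value in (metadata or {}).items():
        lines.append(f"##${key.upper()}={value}")
    lines += [
        f"##FIRSTX={x[0]:.6f}",
        f"##LASTX={x[-1]:.6f}",
        "##XFACTOR=1",
        "##YFACTOR=1",
        f"##NPOINTS={x.size}",
        "##XYPOINTS=(XY..XY)",
    ]
    # One "x, y" pair per line, uncompressed
    lines += [f"{xi:.6f}, {yi:.8g}" for xi, yi in zip(x, y)]
    lines.append("##END=")
    return "\n".join(lines) + "\n"
```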
Raman integration
Raman is the most interesting modality from a clinical perspective - non-contact, works through packaging, minimal sample prep - and the most challenging from an integration perspective.
Instrument control. The Raman vendor landscape is more fragmented than FTIR. Horiba's LabSpec SDK is available but documentation is limited and licensing terms can be restrictive. Renishaw's WiRE software is powerful but deeply proprietary. The bright spot is Wasatch Photonics, whose Enlighten software is open-source (Python, available on GitHub) and whose spectrometers expose a clean USB HID interface. Ocean Insight instruments also have well-documented APIs.
If you are choosing a Raman spectrometer for a new clinical product, prioritize API accessibility. The technically superior spectrometer with a locked-down SDK will cost you more in integration time than a slightly less capable instrument with open interfaces.
The fluorescence problem. This is the defining software challenge for clinical Raman spectroscopy. When you illuminate a biological sample with a laser, fluorescence from organic molecules generates a broad, intense background signal that can be 10-1000x stronger than the Raman signal itself. If you do not remove this baseline, your ML model will classify fluorescence backgrounds, not molecular fingerprints.
Common baseline correction approaches, in order of increasing sophistication:
- Polynomial fitting - fit a 5th-7th order polynomial to non-peak regions and subtract. Simple, fast, and surprisingly effective for many samples. Fails when the fluorescence background has sharp features.
- Asymmetric least squares (ALS) - iteratively fits a smooth baseline that stays below the signal. The pybaselines Python library implements this well. Better than polynomial fitting for complex backgrounds.
- SNIP (Statistics-sensitive Non-linear Iterative Peak-clipping) - a peak-stripping algorithm from nuclear physics. Robust and parameter-light. Good default choice.
- Deep learning baseline estimation - train a neural network to predict the fluorescence background from the raw spectrum. Requires a large training set of raw/corrected spectrum pairs. Best performance but highest development cost.
Your choice of baseline correction method directly affects your classifier's performance. We have found that ALS with optimized parameters provides the best balance of robustness and computational cost for clinical Raman spectra.
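To make the ALS idea concrete, here is a from-scratch sketch in the style of Eilers and Boelens: a Whittaker smoother whose weights are re-estimated so that points above the current baseline (peaks) count for almost nothing. In production you would more likely call pybaselines; the `lam`, `p`, and iteration defaults here are illustrative starting points, not the optimized parameters mentioned above.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline.

    lam: smoothness penalty (larger = stiffer baseline)
    p:   asymmetry; points above the baseline get weight p, points below get 1 - p
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator, shape (n - 2, n)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n), format="csc")
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y
    for _ in range(n_iter):
        W = sparse.diags(w, 0, shape=(n, n), format="csc")
        z = spsolve((W + penalty).tocsc(), w * y)
        # Down-weight points above the baseline so it slides underneath the peaks
        w = np.where(y > z, p, 1.0 - p)
    return z
```

Usage: `corrected = spectrum - als_baseline(spectrum, lam=1e5, p=0.01)`. Larger `lam` gives a stiffer baseline; smaller `p` pushes it harder toward the lower envelope.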
Cosmic ray removal. High-energy cosmic rays occasionally strike the CCD detector, producing sharp, narrow spikes in the spectrum that look nothing like Raman peaks but will confuse a classifier. Detection is straightforward (any peak narrower than 2-3 pixels with intensity more than 5 standard deviations above the local baseline is a cosmic ray), but you must implement it - one uncaught cosmic ray spike in your training data will degrade model performance.
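The detection rule above translates fairly directly into code: estimate the local baseline with a wide median filter, flag local maxima more than `n_sigma` robust standard deviations above it, keep only those narrower than `max_width` pixels at half height, and interpolate across them. A minimal sketch; the defaults (31-pixel median window, 5 sigma, 3 pixels) are assumptions, and the median window must be much wider than your Raman bands.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import find_peaks


def remove_cosmic_rays(y, kernel_size=31, n_sigma=5.0, max_width=3.0):
    """Return (cleaned spectrum, indices of replaced pixels)."""
    y = np.asarray(y, dtype=float)
    residual = y - median_filter(y, size=kernel_size, mode="reflect")  # height above local baseline
    # Robust noise estimate: MAD scaled to a Gaussian standard deviation
    sigma = 1.4826 * np.median(np.abs(residual - np.median(residual)))
    if sigma == 0:
        return y.copy(), np.array([], dtype=int)
    thr = n_sigma * sigma
    # width=0 asks find_peaks to report widths (at half prominence) for every peak
    _, props = find_peaks(residual, height=thr, prominence=thr, width=0)
    cleaned = y.copy()
    flagged = []
    for left, right, width in zip(props["left_ips"], props["right_ips"], props["widths"]):
        if width >= max_width:
            continue  # too broad to be a cosmic ray: leave genuine Raman bands alone
        lo = max(int(np.floor(left)), 0)
        hi = min(int(np.ceil(right)), y.size - 1)
        idx = np.arange(lo + 1, hi)
        # Linear interpolation between the pixels bracketing the spike
        cleaned[idx] = np.interp(idx, [lo, hi], [y[lo], y[hi]])
        flagged.extend(idx.tolist())
    return cleaned, np.array(sorted(set(flagged)), dtype=int)
```

Log the flagged indices alongside the spectrum; a sudden rise in spike counts is also a useful detector health signal.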
Laser power management. Raman instruments use lasers ranging from 5mW to 500mW. Higher power means stronger signal but also risk of sample damage (burning biological tissue) and increased fluorescence. Your acquisition software should allow power optimization per sample type, and you should log the laser power with every spectrum for reproducibility.
Exposure time optimization. Raman signal intensity scales linearly with exposure time. Longer exposures give better signal-to-noise but slow throughput. For point-of-care applications, you want the shortest exposure that gives adequate SNR for your classifier. This is a parameter you will tune during validation - typically 1-10 seconds for 785nm excitation on biological samples, longer for weaker scatterers.
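The linear scaling has a consequence worth building into your tuning tools. When detection is shot-noise limited, noise grows as the square root of the collected counts, so SNR grows only as the square root of exposure: doubling SNR costs four times the exposure. A minimal sketch for extrapolating from a single pilot measurement (the function name is hypothetical):

```python
def exposure_for_target_snr(pilot_exposure_s: float, pilot_snr: float, target_snr: float) -> float:
    """Estimate the exposure needed to reach target_snr from one pilot measurement.

    Assumes shot-noise-limited detection: signal ~ t and noise ~ sqrt(t), so SNR ~ sqrt(t).
    Fixed per-readout noise makes this estimate conservative when lengthening exposure;
    confirm the result on real samples.
    """
    if pilot_exposure_s <= 0 or pilot_snr <= 0 or target_snr <= 0:
        raise ValueError("exposure and SNR values must be positive")
    return pilot_exposure_s * (target_snr / pilot_snr) ** 2
```

For example, a 2 s pilot exposure giving SNR 15 implies roughly 8 s to reach SNR 30, which is often the difference between an acceptable and an unacceptable point-of-care turnaround.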
NIR integration
NIR is the easiest modality to integrate from a software perspective, which is why it dominates in process analytical technology (PAT) and at-line quality control.
Instrument control. NIR instruments frequently use standard serial protocols (RS-232, Modbus RTU) or simple USB serial interfaces. Many handheld NIR devices (Texas Instruments DLP NIRscan, Si-Ware NeoSpectra) provide straightforward SDK access with well-documented command sets. The Si-Ware NeoSpectra Micro is particularly integration-friendly - it is a single-chip MEMS-based FT-NIR spectrometer with a clean Bluetooth or USB API.
Preprocessing simplicity. NIR spectra generally require less preprocessing than FTIR or Raman:
- No fluorescence baseline to remove (unlike Raman)
- Water interference is less severe than in FTIR (the NIR water bands are broader and more manageable)
- Standard preprocessing: SNV (Standard Normal Variate) normalization, Savitzky-Golay smoothing, first or second derivative transformation
- Most NIR chemometrics can be done with scikit-learn and a standard preprocessing pipeline
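A standard NIR pipeline of this kind fits in a few lines of scikit-learn. A minimal sketch, assuming spectra are rows of a 2-D array on a common wavelength axis; the 15-point window and quadratic polynomial for the Savitzky-Golay derivative are illustrative defaults, not recommendations.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def snv(X):
    """Standard Normal Variate: center and scale each spectrum (row) independently."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def sg_derivative(X, window_length=15, polyorder=2, deriv=1):
    """Savitzky-Golay smoothing combined with a first (or second) derivative along each row."""
    return savgol_filter(np.asarray(X, dtype=float), window_length=window_length,
                         polyorder=polyorder, deriv=deriv, axis=1)


# Stateless steps, so the same pipeline can sit in front of any scikit-learn model
nir_preprocessing = make_pipeline(
    FunctionTransformer(snv),
    FunctionTransformer(sg_derivative),
)
```

SNV removes per-spectrum additive offsets and multiplicative scaling (typical of scatter and path-length differences), and the derivative suppresses remaining baseline slope, which is why this pair is so common in NIR work.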
The specificity trade-off. NIR spectra contain overtones and combination bands - they are inherently broader and less specific than mid-IR or Raman spectra. This means:
- You need more training samples to achieve the same classification accuracy
- Distinguishing between closely related analytes (e.g., different bacterial species) is harder
- NIR excels at binary or few-class problems (positive/negative, compliant/non-compliant) rather than fine-grained multi-class identification
Temperature sensitivity. NIR spectra are more affected by sample temperature than FTIR or Raman. Water's NIR absorption shifts with temperature, and since biological samples are largely water, temperature variations between samples can introduce systematic error. Your acquisition protocol should either control temperature (tempering station) or record it and include it as a model input. At minimum, log the ambient temperature with every spectrum.
Instrument-to-instrument transfer. This is NIR's biggest headache for scaled deployments. Two NIR spectrometers of the same model will produce slightly different spectra for the same sample due to manufacturing tolerances in the optical components. A model trained on instrument A will perform worse on instrument B without transfer calibration. Approaches include:
- Piecewise Direct Standardization (PDS) using a set of transfer standards
- Domain adaptation in your ML model
- Standardized reference materials measured on each instrument
FTIR instruments generally have better inter-instrument reproducibility. Raman instruments fall somewhere in between, depending on the excitation laser wavelength stability.
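For PDS, each point of the master instrument's spectrum is regressed on a small window of neighboring points from the slave instrument, using transfer standards measured on both. A minimal sketch, assuming both instruments' spectra are already on a common axis; it uses plain least squares per window for brevity, whereas the classic formulation uses PCR or PLS inside each window, which you need when a window has more points than you have standards. Function names are hypothetical.

```python
import numpy as np


def pds_fit(master, slave, half_window=2):
    """Fit a PDS transform from slave to master space.

    master, slave: (n_standards, n_points) spectra of the SAME transfer standards
    half_window:   master point j is regressed on slave points j - half_window .. j + half_window
    """
    master = np.asarray(master, dtype=float)
    slave = np.asarray(slave, dtype=float)
    n_std, n_pts = master.shape
    F = np.zeros((n_pts, n_pts))   # banded transfer matrix
    offset = np.zeros(n_pts)       # per-point additive correction
    for j in range(n_pts):
        lo, hi = max(0, j - half_window), min(n_pts, j + half_window + 1)
        design = np.column_stack([slave[:, lo:hi], np.ones(n_std)])
        coef, *_ = np.linalg.lstsq(design, master[:, j], rcond=None)
        F[lo:hi, j] = coef[:-1]
        offset[j] = coef[-1]
    return F, offset


def pds_apply(slave_spectra, F, offset):
    """Map spectra measured on the slave instrument into the master instrument's space."""
    return np.asarray(slave_spectra, dtype=float) @ F + offset
```

Expect the first and last few points to transfer poorly, because their windows are truncated; many teams simply trim them.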
ML model considerations
The choice of modality has direct implications for your machine learning pipeline.
FTIR: the established path
FTIR has the most mature spectral library ecosystem. Bruker's IR Biotyper reference library contains thousands of validated microbial spectra. Academic databases (RRUFF for minerals, SDBS for organic compounds) provide additional training data.
For classification tasks, traditional chemometric methods often work well:
- PCA-LDA (Principal Component Analysis followed by Linear Discriminant Analysis) - fast, interpretable, and effective for well-separated classes
- SVM (Support Vector Machine) with RBF kernel - strong performance on FTIR spectra with modest training sets (50-100 spectra per class)
- Random Forest - robust to preprocessing variations, good for multi-class problems
Deep learning (1D-CNNs, attention-based models) improves performance on large datasets but is often unnecessary for FTIR classification problems where the spectral features are well-defined.
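The three traditional methods are cheap to compare side by side with cross-validation. A minimal sketch, assuming `X` is an (n_spectra, n_points) array of preprocessed spectra and `y` the class labels; the hyperparameters (10 PCA components, C=10, 300 trees) are placeholders, not tuned values. If you record several spectra per specimen, replace `StratifiedKFold` with `StratifiedGroupKFold` keyed on specimen ID so replicates of one specimen never land in both training and test folds.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def candidate_models(n_components=10, random_state=0):
    """PCA-LDA, RBF-kernel SVM and Random Forest with placeholder hyperparameters."""
    return {
        "pca_lda": make_pipeline(PCA(n_components=n_components), LinearDiscriminantAnalysis()),
        "svm_rbf": SVC(kernel="rbf", C=10.0, gamma="scale"),
        "random_forest": RandomForestClassifier(n_estimators=300, random_state=random_state),
    }


def compare_models(X, y, n_splits=5, random_state=0):
    """Mean cross-validated accuracy for each candidate model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in candidate_models(random_state=random_state).items()
    }
```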
Standard preprocessing pipeline for FTIR classification:
```
# FTIR preprocessing pipeline
# 1. Baseline correction (rubberband or concave hull)
# 2. ATR correction (if comparing to transmission libraries)
# 3. Spectral range selection (typically 800-1800 cm-1
#    for biological samples - the fingerprint region)
# 4. Normalization (vector normalization or min-max)
# 5. Optional: second derivative (Savitzky-Golay, 9-point window)
#    - enhances peak resolution but amplifies noise
```
Raman: the hard problem
Raman classification is harder than FTIR classification for three reasons:
- Fluorescence variability. The fluorescence background varies between samples, between measurement positions on the same sample, and even between repeated measurements at the same position (photobleaching). Your baseline correction must be robust to this variability, or your model learns background shapes rather than Raman features.
- Lower signal-to-noise. Raman scattering is inherently weak. Clinical samples measured at safe laser powers often have SNR of 10-50, compared to 100-1000 for FTIR. This means your model must be more noise-tolerant.
- SERS complexity. Surface-Enhanced Raman Spectroscopy (SERS) uses metal nanostructures to amplify the Raman signal by factors of 10^6 or more. SERS makes clinical detection possible for trace analytes, but the enhancement is highly dependent on the nanostructure geometry, the analyte's orientation on the surface, and the local electromagnetic field distribution. SERS spectra from the same analyte can vary significantly between substrates, batches of substrates, and measurement spots. Your model must account for this variability.
For Raman classification, we have found that 1D convolutional neural networks consistently outperform traditional chemometric methods, particularly when fluorescence backgrounds are variable. The CNN learns to ignore the slowly varying baseline and focus on sharp Raman peaks. Architecture:
```
Input: preprocessed spectrum (e.g., 1024 wavenumber points)
-> Conv1D(32 filters, kernel_size=7) -> ReLU -> BatchNorm
-> Conv1D(64 filters, kernel_size=5) -> ReLU -> BatchNorm -> MaxPool
-> Conv1D(128 filters, kernel_size=3) -> ReLU -> BatchNorm -> MaxPool
-> GlobalAveragePooling
-> Dense(64) -> Dropout(0.3)
-> Dense(num_classes) -> Softmax
```
Training typically requires 200-500 spectra per class for robust Raman classification, compared to 50-100 for FTIR.
NIR: the chemometrics heritage
NIR spectroscopy has the longest history of multivariate calibration, and the established methods remain effective:
- PLS (Partial Least Squares) regression - the default for quantitative NIR analysis. If you are predicting a concentration (glucose, hemoglobin, drug content), start with PLS.
- PLS-DA (PLS Discriminant Analysis) - the classification equivalent. Works well for binary and few-class problems.
- PCA for exploratory analysis and outlier detection.
NIR models tend to require larger training sets (500+ samples) because the spectral features are broader and the between-class differences are smaller. The upside is that NIR preprocessing is well-standardized and the models train quickly.
The critical challenge for NIR in clinical point-of-care is the instrument transfer problem discussed above. A model validated on one instrument must work on all deployed instruments. Budget significant time for transfer calibration during your deployment planning.
Instrument control APIs - a practical comparison
Here is what you are actually dealing with when you write the acquisition software.
Bruker FTIR (OPUS DDE)
- Protocol: Windows DDE (Dynamic Data Exchange)
- Python access: brukeropus library or win32 COM
- Strengths: Mature, well-documented, reliable
- Weaknesses: Windows-only, DDE is a legacy protocol, OPUS must be running as a GUI application
- Typical integration time: 2-4 weeks
Horiba Raman (LabSpec SDK)
- Protocol: Vendor SDK (C++/.NET)
- Python access: ctypes wrapper or .NET interop
- Strengths: Full instrument control, hardware-level access
- Weaknesses: Limited documentation, licensing restrictions, SDK updates may lag instrument firmware
- Typical integration time: 4-8 weeks
Wasatch Photonics Raman (Enlighten)
- Protocol: USB HID + open-source Python driver
- Python access: wasatch.py library (GitHub, open source)
- Strengths: Fully open, well-documented, active development
- Weaknesses: Fewer instrument configurations than Horiba/Renishaw
- Typical integration time: 1-2 weeks
Si-Ware NeoSpectra NIR
- Protocol: USB/Bluetooth + vendor SDK
- Python access: Vendor Python SDK
- Strengths: Compact (single-chip MEMS), clean API, low cost
- Weaknesses: Lower resolution than benchtop NIR, limited spectral range
- Typical integration time: 1-2 weeks
Integration time estimates assume a competent developer with spectroscopy domain knowledge. If this is your team's first spectrometer integration, multiply them by 2-4x.
Which modality should you choose?
This is the question every spectroscopy startup asks, and the answer depends on your specific application. But here are the decision rules we have arrived at after integrating all three.
Choose FTIR if:
- You need molecular fingerprinting - detailed identification of specific compounds or organisms
- Your application is in a lab or clinic setting (not field-portable)
- You want the most mature software and reference library ecosystem
- Your samples can tolerate contact measurement on an ATR crystal
- You are building for bacterial identification, tissue pathology, or fluid analysis
Choose Raman if:
- You need non-contact or through-packaging measurement
- Handheld portability matters (field diagnostics, point-of-care in low-resource settings)
- Your application involves aqueous samples where FTIR water interference is a problem
- You are pursuing SERS-based detection for trace analytes (drugs of abuse, biomarkers)
- You have the engineering resources to handle fluorescence preprocessing
Choose NIR if:
- Acquisition speed is critical (high-throughput screening, triage)
- You are doing bulk material classification or quantitative analysis
- Cost per instrument is a primary constraint
- Your classification problem is binary or few-class (not fine-grained identification)
- You are scaling to many deployed instruments (NIR hardware is cheapest to replicate)
Or: build modality-agnostic.
This is the approach we took with SpectraDx. The clinical workflow - patient identification, specimen tracking, result review, HL7 output, billing integration - is identical regardless of whether the underlying measurement is FTIR, Raman, or NIR. The spectrometer is a sensor. The software platform should abstract over the sensor.
In practice, this means:
- A common spectrum data model that normalizes across formats (OPUS, SPC, CSV, JCAMP-DX)
- A plugin architecture for instrument control (each spectrometer type implements an acquisition interface)
- Modality-specific preprocessing pipelines that feed into a common feature space
- ML models that are trained per modality but deployed through a common inference API
- Result reporting that is identical regardless of source modality
This architecture lets a hospital deploy FTIR for microbiology, Raman for rapid drug screening, and NIR for fluid triage - all through a single platform, a single clinical workflow, and a single LIS integration.
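The first two bullets above can be sketched in a few dozen lines: one data model every modality is normalized into, and a registry of drivers that all expose the same acquisition method. This is an illustrative sketch, not the SpectraDx implementation; `Spectrum`, `AcquisitionPlugin`, `register`, and the simulated driver are hypothetical names. The axis conversion uses the same relation as the comparison table (wavenumber in cm-1 = 10^7 / wavelength in nm).

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, Protocol

import numpy as np


@dataclass
class Spectrum:
    """Common data model every modality is normalized into (field names are illustrative)."""
    modality: str                 # "FTIR", "Raman" or "NIR"
    x: np.ndarray                 # spectral axis
    x_unit: str                   # "1/cm" (wavenumber or Raman shift) or "nm"
    y: np.ndarray                 # intensity
    y_unit: str                   # e.g. "absorbance", "counts"
    instrument_id: str
    metadata: Dict[str, object] = field(default_factory=dict)  # laser power, exposure, temperature...


class AcquisitionPlugin(Protocol):
    """Interface each instrument driver implements (hypothetical)."""
    def acquire(self) -> Spectrum: ...


PLUGINS: Dict[str, Callable[[], AcquisitionPlugin]] = {}


def register(instrument_type: str):
    """Class decorator that adds a driver to the plugin registry."""
    def decorator(cls):
        PLUGINS[instrument_type] = cls
        return cls
    return decorator


def to_wavenumber_axis(s: Spectrum) -> Spectrum:
    """Convert a spectrum recorded in nm to an ascending cm-1 axis."""
    if s.x_unit == "1/cm":
        return s
    if s.x_unit != "nm":
        raise ValueError(f"unsupported x unit: {s.x_unit!r}")
    wn = 1e7 / np.asarray(s.x, dtype=float)  # nm -> cm-1
    order = np.argsort(wn)                   # keep the axis ascending
    return replace(s, x=wn[order], y=np.asarray(s.y)[order], x_unit="1/cm")


@register("simulated-nir")
class SimulatedNIR:
    """Stand-in driver for exercising the workflow without hardware."""
    def acquire(self) -> Spectrum:
        nm = np.linspace(900.0, 1700.0, 5)
        return Spectrum("NIR", nm, "nm", np.zeros_like(nm), "absorbance",
                        "sim-nir-01", {"temperature_c": 25.0})
```

Usage: `spectrum = PLUGINS["simulated-nir"]().acquire()`. A real driver for each instrument would implement the same `acquire()` method, so everything downstream of acquisition never needs to know which spectrometer produced the data.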
Total cost of deployment
Beyond the instrument itself, here is what a realistic deployment budget looks like for each modality. These are the costs that do not appear on the spectrometer vendor's quote.
| Cost category | FTIR | Raman | NIR |
|---|---|---|---|
| Instrument (benchtop) | $25K-80K | $30K-150K | $15K-50K |
| Instrument (handheld) | $15K-40K | $5K-30K | $5K-20K |
| Software integration | 2-4 months eng. time | 4-8 months eng. time | 1-3 months eng. time |
| Training data collection | 500-2000 spectra | 1000-5000 spectra | 2000-10000 spectra |
| Consumables (annual) | $2K-5K (ATR crystals, cleaning) | $1K-3K (laser replacement, cal. standards) | $500-2K (reference standards) |
| Calibration/validation | Quarterly, 1-2 days | Monthly (laser wavelength), 1 day | Quarterly + transfer cal., 2-3 days |
| Regulatory (FDA 510(k) or De Novo) | $300K-1M+ | $300K-1M+ | $300K-1M+ |
The regulatory cost dominates everything else. It is the same regardless of modality, and it is the reason most spectroscopy-based diagnostic startups fail - not because the science does not work, but because the regulatory pathway takes longer and costs more than they planned for.
Where to go from here
If you are building spectroscopy-based diagnostics and need to support one or more of these modalities, the integration work is substantial but well-defined. The key is to abstract the modality-specific details behind clean interfaces so that the clinical workflow, regulatory documentation, and LIS integration do not have to change when you add a new spectrometer.
For instrument-specific integration guidance, see our deep dive on Bruker OPUS integration. For information on connecting Raman instruments to clinical workflows, see Raman Spectroscopy in Clinical Diagnostics. For the HL7 output side, see HL7v2 for Spectroscopy Instruments.
SpectraDx supports FTIR, Raman, and NIR under one platform - instrument control, preprocessing, ML classification, clinical workflow, and EHR integration. See our solutions overview for the full platform architecture. If you are evaluating modalities for a clinical application and want to talk through the trade-offs, reach out.

