SpectraDx
Architecture

Spectral Data Formats: JCAMP-DX, SPC, and ANDI Compared

Spectral data formats compared: JCAMP-DX, SPC, ANDI/netCDF, and vendor-specific binaries. Parsing code, support matrices, and a format decision framework.


Every spectrometer manufacturer has invented its own file format. Bruker stores FTIR spectra in proprietary binary .0 files. Thermo Fisher inherited the SPC format with its acquisition of Galactic Industries. Horiba stores Raman data in proprietary LabSpec binaries. Renishaw packs spectral maps into WDF binaries. Each format captures the same fundamental thing - an array of x-values paired with an array of y-values, plus metadata - but the formats are mutually incompatible.

This creates a concrete engineering problem. If you are building software that processes spectra from multiple instruments - a clinical workflow platform, a multi-site data pipeline, a machine learning training system - you need to read all of them. And if you need to exchange data between systems, you need a common interchange format.

This article is meant to be a reference you can bookmark. It covers the major spectral data formats:

  • Interchange standards, formal and de facto: JCAMP-DX, SPC, ANDI/netCDF
  • Vendor-specific proprietary formats: Bruker OPUS, Renishaw WDF, Horiba LabSpec
  • Plain CSV as the universal fallback

For each, you get the file structure, parsing code in Python, gotchas, and when to use it. At the end, a decision framework for choosing the right format for your use case.

The Format Landscape

Before diving into specifics, here is the taxonomy:

| Format | Extension | Type | Vendor | Standardized? | Multi-spectrum? |
|---|---|---|---|---|---|
| JCAMP-DX | .dx, .jdx | Text | Neutral | Yes (IUPAC) | Yes (NTUPLES, LINK) |
| SPC | .spc | Binary | Thermo/Galactic | De facto | Yes (subfiles) |
| ANDI/netCDF | .cdf | Binary | Neutral | Yes (ASTM) | Yes (variables) |
| Bruker OPUS | .0, .1, .2 | Binary | Bruker | No | Yes (data blocks) |
| Renishaw WDF | .wdf | Binary | Renishaw | No | Yes (spectral maps) |
| Horiba LabSpec | .l5s, .l6s, .ngs | Binary | Horiba | No | Yes |
| CSV/ASCII | .csv, .txt | Text | Neutral | No | By convention |

The standardized formats were designed for interchange. The proprietary formats were designed for the vendor's own software. You will encounter all of them.

JCAMP-DX: The IUPAC Standard

JCAMP-DX (Joint Committee on Atomic and Molecular Physical Data - Data Exchange) is the IUPAC standard for exchanging chemical spectral data. It was introduced in 1988 and has been revised several times since; the most recent version is 6.0.

File Structure

JCAMP-DX is a plain text format. Every file consists of labeled data records (LDRs), each starting with ## and a label name:

##TITLE= Polystyrene ATR-FTIR Spectrum
##JCAMP-DX= 5.01
##DATA TYPE= INFRARED SPECTRUM
##ORIGIN= Bruker OPUS 9.3
##OWNER= SpectraDx Lab
##XUNITS= 1/CM
##YUNITS= ABSORBANCE
##FIRSTX= 399.2308
##LASTX= 3999.6401
##DELTAX= 0.9643
##NPOINTS= 3734
##XFACTOR= 1.000000
##YFACTOR= 0.000001
##XYDATA= (X++(Y..Y))
399.2  1823456 1834567 1845678 1856789 1867890
404.1  1878901 1889012 1900123 1911234 1922345
...
##END=

The header records define the spectral metadata: data type (infrared, Raman, NIR, NMR, UV-Vis, mass spec), axis units, spectral range, and number of points. The data section contains the actual spectral values.
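To make the header concrete, here is a minimal sketch that collects the LDRs into a dictionary and rebuilds the implied X axis. It assumes the JCAMP-DX rule that labels are compared after removing spaces, hyphens, slashes, and underscores and ignoring case. The function names are hypothetical (they are not part of the jcamp library), and the sketch ignores multi-line values and multi-block files.

```python
import re


def normalize_label(label: str) -> str:
    """Canonical LDR label: drop spaces, hyphens, slashes, underscores; uppercase."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_ldr_header(text: str) -> dict:
    """Collect ##LABEL= value records up to the first data table label."""
    header = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("##"):
            continue  # data lines are handled by a separate decoder
        label, _, value = line[2:].partition("=")
        key = normalize_label(label)
        header[key] = value.split("$$", 1)[0].strip()  # drop inline $$ comments
        if key in ("XYDATA", "XYPOINTS", "PEAKTABLE", "END"):
            break
    return header


def implied_x(header: dict) -> list:
    """X values implied by FIRSTX, LASTX, and NPOINTS for an (X++(Y..Y)) table."""
    first = float(header["FIRSTX"])
    last = float(header["LASTX"])
    n = int(header["NPOINTS"])
    if n < 2:
        return [first] * n
    step = (last - first) / (n - 1)  # avoids accumulating the rounding in DELTAX
    return [first + i * step for i in range(n)]
```

Deriving the step from FIRSTX, LASTX, and NPOINTS rather than trusting the rounded DELTAX keeps the last computed point from drifting away from LASTX.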

Data Encoding Schemes

JCAMP-DX was designed when disk space was expensive, so it includes several compression schemes for the ##XYDATA block:

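AFFN (ASCII Free Format Numeric) - the simplest encoding: plain decimal numbers separated by spaces or commas. Easy to parse, but files are large. AFFN values can fill an ##XYDATA table or, as here, an ##XYPOINTS list of explicit X,Y pairs:

##XYPOINTS= (XY..XY)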
399.23  0.182345
400.19  0.183456
401.16  0.184567

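PAC (Packed) - the sign of each value (+ or -) doubles as its separator, so the blanks between values can be dropped. As in every (X++(Y..Y)) table, Y values are stored as integers (multiply by YFACTOR to recover the real value), and only the first X of each line is written; the others are implied by FIRSTX + n × DELTAX:

##XYDATA= (X++(Y..Y))
399.2+1823456+1834567+1845678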

SQZ (Squeezed) - the leading digit of each value is replaced by a character that also carries its sign: @ for 0, A-I for +1 to +9, and a-i for -1 to -9. The separator disappears, saving one character per value:

##XYDATA= (X++(Y..Y))
399.2 A823456A834567A845678

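DIF (Difference) - after the first value on a line (written in SQZ form), each value is stored as the difference from the previous one, with its leading digit and sign encoded as % for 0, J-R for +1 to +9, and j-r for -1 to -9. Spectral data is usually smooth, so the differences are small numbers and compress well. In DIF form, the last Y value of each line is repeated as the first Y value of the next line as a check:

##XYDATA= (X++(Y..Y))
399.2 A823456J1111J1111J1111J1111

DIFDUP (Difference + Duplicate) - DIF plus run-length encoding: a DUP character (S-Z for counts 1 to 8, s for 9) after a value means that value occurs that many times in total. The four identical differences above collapse to J1111V. This is usually the most compact encoding and is widely used in practice:

##XYDATA= (X++(Y..Y))
399.2 A823456J1111V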

NTUPLES - An extension for multi-dimensional data (e.g., a spectrum with both real and imaginary components, or a series of spectra at different time points):

##NTUPLES= INFRARED SPECTRUM
##VAR_NAME= FREQUENCY, ABSORBANCE_REAL, ABSORBANCE_IMAG
##SYMBOL= X, Y, R
##VAR_TYPE= INDEPENDENT, DEPENDENT, DEPENDENT
##VAR_DIM= 3734, 3734, 3734
##UNITS= 1/CM, ABSORBANCE, ABSORBANCE
##PAGE= N=1
##DATA TABLE= (X++(Y..Y)), XYDATA
399.2  1823456 1834567 1845678
...
##END NTUPLES=

Parsing JCAMP-DX in Python

The jcamp library handles all encoding schemes:

import jcamp
import numpy as np
 
# Read a JCAMP-DX file
data = jcamp.jcamp_readfile('spectrum.dx')
 
# Access spectral data
wavenumbers = data['x']    # numpy array
absorbance = data['y']     # numpy array
 
# Access metadata
title = data.get('title', '')
data_type = data.get('data type', '')
origin = data.get('origin', '')
xunits = data.get('xunits', '')
yunits = data.get('yunits', '')
 
print(f"Title: {title}")
print(f"Type: {data_type}")
print(f"Range: {wavenumbers[0]:.1f} - {wavenumbers[-1]:.1f} {xunits}")
print(f"Points: {len(wavenumbers)}")

For files with NTUPLES (multi-page spectra):

# jcamp handles NTUPLES transparently
# but for multi-block files, you may need to iterate
data = jcamp.jcamp_readfile('multi_spectrum.jdx')
 
# If the file contains linked blocks:
if 'children' in data:
    for i, child in enumerate(data['children']):
        x = child['x']
        y = child['y']
        print(f"Block {i}: {len(x)} points")

Install with pip install jcamp.

Gotchas

  • Encoding detection. The library must auto-detect which compression scheme is used. Older or non-standard JCAMP files sometimes mix schemes or use non-standard labels. If jcamp.jcamp_readfile() fails, open the file yourself and pass the file object to jcamp.jcamp_read(), which lets you control how the text is decoded.
  • Character encoding. JCAMP-DX files are nominally ASCII, but you will encounter Latin-1 and UTF-8 in metadata fields. Read with errors='replace' as a fallback.
  • Precision loss. PAC, SQZ, DIF, and DIFDUP all store Y values as integers, so every value is quantized to a multiple of YFACTOR. Check that YFACTOR is small enough for your application.
  • Version differences. JCAMP-DX 4.24 (1988) only supports single spectra. Version 5.0 (1993) added NTUPLES for NMR and multi-dimensional data. Version 5.01 (1999) added GLP compliance fields and Y2K date fixes. Active development halted around 2006 - an XML replacement was proposed but never materialized. Most files in the wild are version 4.24 or 5.01.
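For the character-encoding gotcha, one pragmatic approach is to read raw bytes and decode them yourself, trying UTF-8 first and falling back to Latin-1, which maps every byte and therefore never fails. The helper names are hypothetical; the version lookup simply reads the ##JCAMP-DX= record, which tells you which of the versions listed above you are dealing with.

```python
import re


def read_jcamp_text(raw: bytes) -> str:
    """Decode JCAMP-DX bytes: try UTF-8 first, then fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never fails,
        # though metadata in other encodings may show the wrong characters
        return raw.decode("latin-1")


def jcamp_version(text: str) -> str:
    """Return the ##JCAMP-DX= version string, or '' if the record is missing."""
    match = re.search(r"^\s*##JCAMP-?DX\s*=\s*([^\s$]+)", text,
                      re.MULTILINE | re.IGNORECASE)
    return match.group(1) if match else ""
```

jcamp.jcamp_read() takes a file object, so wrapping the decoded text in io.StringIO should let you reuse the library's parser; it is worth confirming that against the jcamp version you have installed.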

SPC: The De Facto Binary Standard

The SPC format originated at Galactic Industries, the company behind the GRAMS software. After Thermo acquired Galactic, SPC became Thermo's standard format. Because GRAMS and Thermo software are so widespread, SPC became the de facto binary interchange format - most spectroscopy software can read it.

File Structure

SPC is a fixed-header binary format. The main header occupies the first 512 bytes and defines the file type, data dimensions, and encoding:

Offset  Size   Field           Description
------  ----   -----           -----------
0       1      ftflgs          File type flags (bit field)
1       1      fversn          Version: 0x4B (new), 0x4C (new, big-endian), 0x4D (old)
2       1      fexper          Experiment type (IR, Raman, UV, etc.)
3       1      fexp            Y exponent (2^fexp scaling; 0x80 = float32 Y)
4       4      fnpts           Number of points per spectrum
8       8      ffirst          First X value (double)
16      8      flast           Last X value (double)
24      4      fnsub           Number of subfiles (spectra in file)
...
512+    varies  Subfile data    Actual spectral data

The ftflgs byte encodes critical information as individual bits:

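| Bit | Meaning when set |
|---|---|
| 0 | Y values are 16-bit integers (otherwise 32-bit, integer or float depending on fexp) |
| 1 | Experiment extension exists |
| 2 | Multi-file (multiple subfiles) |
| 3 | Z values in random order |
| 4 | Z values not even (non-uniform time/parameter spacing) |
| 5 | Custom axis labels in file |
| 6 | Each subfile has its own X array (used together with bit 7) |
| 7 | X values are not evenly spaced (an explicit X array is stored) |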

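When bit 7 is clear (X evenly spaced), the X values are calculated from ffirst, flast, and fnpts. When bit 7 is set, an explicit X array of fnpts float32 values follows the main header and is shared by all subfiles. When bits 6 and 7 are both set, each subfile carries its own X array instead, which roughly doubles the data size but supports a different X axis for every spectrum.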
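The main header can be read with the standard struct module. This is a hedged illustration, not the spc-spectra implementation: it handles only new-format little-endian files (fversn 0x4B), reads the eight fields from the table above, and also reads flogoff, which the table does not list; its position (a 4-byte log-block offset at byte 248) is taken from the Galactic specification and should be double-checked. The flag names follow the specification; the functions are hypothetical.

```python
import struct
from dataclasses import dataclass

# ftflgs bit masks, named as in Galactic's SPC specification
TSPREC = 0x01  # Y stored as 16-bit integers
TMULTI = 0x04  # more than one subfile
TXYXYS = 0x40  # every subfile carries its own X array
TXVALS = 0x80  # explicit X values instead of even spacing

# First 28 bytes: ftflgs, fversn, fexper, fexp (signed, so 0x80 reads as -128),
# fnpts, ffirst, flast, fnsub
MAIN_HEAD = struct.Struct("<BBBbIddI")


@dataclass
class SpcHeader:
    ftflgs: int
    fversn: int
    fexper: int
    fexp: int
    fnpts: int
    ffirst: float
    flast: float
    fnsub: int
    flogoff: int


def parse_spc_header(buf: bytes) -> SpcHeader:
    """Parse selected fields from a new-format SPC main header."""
    if len(buf) < 512:
        raise ValueError("need the full 512-byte main header")
    if buf[1] != 0x4B:
        raise ValueError(f"fversn 0x{buf[1]:02X}: only 0x4B (new, little-endian) is handled")
    fields = MAIN_HEAD.unpack_from(buf, 0)
    (flogoff,) = struct.unpack_from("<I", buf, 248)  # log block offset, 0 if none
    return SpcHeader(*fields, flogoff)


def even_x_axis(h: SpcHeader) -> list:
    """Rebuild evenly spaced X values from ffirst, flast, and fnpts."""
    if h.ftflgs & TXVALS:
        raise ValueError("X values are stored explicitly; read the X array instead")
    if h.fnpts < 2:
        return [h.ffirst] * h.fnpts
    step = (h.flast - h.ffirst) / (h.fnpts - 1)
    return [h.ffirst + i * step for i in range(h.fnpts)]
```

Rejecting other fversn values outright is deliberate: silently misreading an old-format or big-endian file is worse than failing loudly.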

Subfile Structure

Every spectrum is stored as a subfile, even in single-spectrum files. Each subfile begins with a 32-byte subheader:

Offset  Size   Field           Description
------  ----   -----           -----------
0       1      subflgs         Subfile flags
1       1      subexp          Y exponent for this subfile
2       2      subindx         Subfile index
4       4      subtime         Z value (e.g., time in kinetics)
8       4      subnext         Z value at the end of this subfile
12      4      subnois         Noise level
16      4      subnpts         Number of points (used when each subfile has its own X)
20      4      subscan         Number of co-added scans
24      4      subwlevel       W axis value
28      4      subresv         Reserved
32      varies  Y data         Spectral data (float32, int32, or int16)
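Decoding a subfile comes down to the 32-byte subheader and the Y exponent rule: an exponent of 0x80 means IEEE float32 values; otherwise the integers are scaled by 2^(exp-32), or by 2^(exp-16) when ftflgs bit 0 marks 16-bit data. This minimal sketch assumes a new-format, little-endian file and that you already know where the subfile starts; the names are hypothetical.

```python
import struct

# subflgs, subexp, subindx, subtime, subnext, subnois, subnpts, subscan,
# subwlevel, subresv -> 32 bytes in total
SUBHEAD = struct.Struct("<BbHfffIIf4s")


def parse_subheader(buf: bytes, offset: int) -> dict:
    """Unpack a subheader; the Y data starts SUBHEAD.size bytes later."""
    names = ("subflgs", "subexp", "subindx", "subtime", "subnext",
             "subnois", "subnpts", "subscan", "subwlevel", "subresv")
    return dict(zip(names, SUBHEAD.unpack_from(buf, offset)))


def decode_y(payload: bytes, npts: int, exp: int, sixteen_bit: bool) -> list:
    """Turn raw Y data into floats using the SPC exponent rule."""
    if exp == -128:  # 0x80 read as a signed byte: IEEE float32 values
        return list(struct.unpack_from(f"<{npts}f", payload))
    if sixteen_bit:  # ftflgs bit 0 (TSPREC)
        ints = struct.unpack_from(f"<{npts}h", payload)
        scale = 2.0 ** (exp - 16)
    else:
        ints = struct.unpack_from(f"<{npts}i", payload)
        scale = 2.0 ** (exp - 32)
    return [v * scale for v in ints]
```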

Parsing SPC in Python

The spc-spectra package reads SPC files:

import spc_spectra as spc
import numpy as np
 
# Read an SPC file
f = spc.File('spectrum.spc')
 
# Single-spectrum file
x = f.x       # wavenumber/wavelength array
y = f.sub[0].y  # intensity/absorbance array
 
print(f"Points: {len(x)}")
print(f"Range: {x[0]:.1f} - {x[-1]:.1f}")
print(f"Experiment type: {f.exp_type}")
 
# Multi-spectrum file (e.g., kinetics series)
for i, sub in enumerate(f.sub):
    print(f"Spectrum {i}: {len(sub.y)} points, "
          f"z-value: {sub.subtime}")
 
# Access log block (text metadata)
if getattr(f, 'log_dict', None):
    for key, value in f.log_dict.items():
        print(f"  {key}: {value}")

Install with pip install spc-spectra.

The Log Block

SPC files include an optional log block - a free-form text section after the spectral data. Instrument software uses this to store metadata that does not fit the fixed header: sample name, operator, date, instrument serial number, and acquisition parameters. The log block has its own 64-byte header specifying its offset and size.

# Access the raw log text
if hasattr(f, 'log_content'):
    print(f.log_content)
    # Typical output:
    # DATE=05/08/2026
    # TIME=14:25:00
    # OPERATOR=J. Smith
    # RESOLUTION=4 cm-1
    # SCANS=32
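If you need the log text without a library, the 64-byte log header gives the sizes and the text offset. This sketch assumes the Galactic layout of five 4-byte fields (logsizd, logsizm, logtxto, logbins, logdsks) followed by 44 spare bytes, and that logtxto counts from the start of the log block; treat both as assumptions to verify against real files. flogoff comes from the main header, and the function name is hypothetical.

```python
import struct

LOGSTC = struct.Struct("<IIIII44s")  # 64-byte log block header


def read_log_dict(buf: bytes, flogoff: int) -> dict:
    """Parse KEY=VALUE lines from the ASCII part of an SPC log block."""
    if flogoff == 0:
        return {}  # flogoff of 0 means the file has no log block
    logsizd, _logsizm, logtxto, _logbins, _logdsks, _spare = LOGSTC.unpack_from(buf, flogoff)
    # Assumption: logtxto is relative to the start of the log block
    text_bytes = buf[flogoff + logtxto:flogoff + logsizd].split(b"\x00", 1)[0]
    entries = {}
    for line in text_bytes.decode("latin-1").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without a KEY=VALUE shape
            entries[key.strip()] = value.strip()
    return entries
```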

Gotchas

  • Old vs. new format. Files with fversn = 0x4D use the old Galactic format (pre-1996). Files with fversn = 0x4B use the new format. The spc-spectra library handles both, but some older files may have quirks.
  • Y scaling. If fexp (or a subfile's subexp) is 0x80, Y values are 32-bit floats. Otherwise they are integers, scaled by 2^(exp-32) for 32-bit data or 2^(exp-16) for 16-bit data (ftflgs bit 0).
  • Byte order. New-format files with fversn = 0x4B are little-endian; the rare 0x4C variant is big-endian. This matters only if you write your own parser.
  • Thermo-specific extensions. Newer Thermo Fisher instruments may write SPC files with proprietary extensions in the log block. These are generally safe to ignore.

ANDI/netCDF: The Self-Describing Format

ANDI (Analytical Data Interchange) uses the netCDF (Network Common Data Form) container format. Originally standardized by ASTM as E1947 for chromatographic data, it has been adapted for spectroscopic data as well.

netCDF is a self-describing binary format - the file includes its own schema (dimensions, variables, attributes) alongside the data. This makes it robust against format evolution: a reader can discover what the file contains without knowing the exact version.

Structure

A netCDF spectral file typically contains:

  • Dimensions: point_number (number of data points), scan_number (for multi-spectrum), string_length (for text attributes)
  • Variables: ordinate_values (Y data), abscissa_values (X data if not uniform), detector_name, scan_acquisition_time
  • Global attributes: dataset_origin, experiment_type, operator_name, detector_unit, ordinate_unit

Parsing ANDI/netCDF in Python

Use scipy.io.netcdf_file for classic ANDI files or the netCDF4 package for netCDF-4 files:

from scipy.io import netcdf_file
import numpy as np
 
# Read an ANDI/netCDF spectral file
with netcdf_file('spectrum.cdf', 'r') as f:
    # List all variables
    print("Variables:", list(f.variables.keys()))
 
    # Access spectral data
    y = f.variables['ordinate_values'][:].copy()
 
    # X values may be stored explicitly or calculated
    if 'abscissa_values' in f.variables:
        x = f.variables['abscissa_values'][:].copy()
    else:
        # Otherwise rebuild X from axis attributes. first_x/last_x are
        # illustrative names: inspect the file, since vendors differ
        first_x = f.variables['ordinate_values'].first_x
        last_x = f.variables['ordinate_values'].last_x
        x = np.linspace(first_x, last_x, len(y))
 
    # Access metadata
    print(f"Origin: {getattr(f, 'dataset_origin', 'Unknown')}")
    print(f"Type: {getattr(f, 'experiment_type', 'Unknown')}")
    print(f"Points: {len(x)}")
    print(f"Range: {x[0]:.1f} - {x[-1]:.1f}")

For netCDF-4 format files (newer instruments):

import netCDF4
 
ds = netCDF4.Dataset('spectrum.cdf', 'r')
 
# List dimensions and variables
print("Dimensions:", dict(ds.dimensions))
print("Variables:", list(ds.variables.keys()))
 
# Access data
y = ds.variables['ordinate_values'][:]
x = ds.variables['abscissa_values'][:]
 
ds.close()

Install netCDF4 with pip install netCDF4. scipy.io.netcdf_file ships with SciPy and reads classic netCDF (v3) files - sufficient for most ANDI files.

Gotchas

  • Variable naming. ANDI files from different vendors use different variable names for the same data. Agilent uses ordinate_values; some older instruments use intensity_values or signal. Inspect the file before hardcoding variable names.
  • Chromatographic vs. spectroscopic. The ASTM E1947 standard was designed for chromatography. Spectroscopy ANDI files adapt the schema, but the fit is not perfect. Multi-spectrum files (e.g., hyphenated GC-IR data) use the scan_number dimension.
  • netCDF versions. Classic netCDF (v3) and netCDF-4 (based on HDF5) have different internal structures. scipy.io.netcdf_file reads only the classic formats; netCDF4 reads both.
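Because scipy.io.netcdf_file cannot open netCDF-4 files, it helps to check a file's signature before choosing a reader, and to look up the Y variable from a list of candidate names instead of hardcoding one. A hedged sketch: the magic bytes are those of the netCDF classic (CDF\x01), 64-bit offset (CDF\x02), and CDF-5 (CDF\x05) formats and of HDF5; the candidate names come from the variable-naming gotcha above; the helper names are hypothetical.

```python
CLASSIC_MAGIC = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "CDF-5",
}
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"
Y_CANDIDATES = ("ordinate_values", "intensity_values", "signal")


def netcdf_flavor(path: str) -> str:
    """Identify the netCDF flavour from the file's first 8 bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head[:4] in CLASSIC_MAGIC:
        return CLASSIC_MAGIC[head[:4]]
    if head == HDF5_MAGIC:  # files written by the netCDF library start here
        return "netCDF-4"
    return "unknown"


def pick_reader(flavor: str) -> str:
    """scipy.io.netcdf_file covers classic and 64-bit offset; netCDF4 the rest."""
    if flavor in ("classic", "64-bit offset"):
        return "scipy"
    if flavor in ("CDF-5", "netCDF-4"):
        return "netCDF4"
    raise ValueError(f"not a recognised netCDF file: {flavor}")


def pick_variable(variables, candidates=Y_CANDIDATES) -> str:
    """Return the first candidate name present in a file's variables mapping."""
    for name in candidates:
        if name in variables:
            return name
    raise KeyError(f"none of {candidates} found; available: {sorted(variables)}")
```

With either reader, f.variables behaves like a dictionary, so f.variables[pick_variable(f.variables)] works for both.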

Bruker OPUS: The Proprietary Binary

Bruker's OPUS format is the native file format for all Bruker FTIR instruments. Files have numeric extensions (.0, .1, .2, incrementing with each measurement) and are proprietary binary with no published specification.

We cover OPUS in detail in our Bruker OPUS interfaces guide and Python FTIR automation tutorial. Here is the format summary relevant to data interchange.

Structure

An OPUS file is a container of typed data blocks, each with a header specifying the block type, offset, and size. Known block types include:

| Block Type | Content | Typical Key |
|---|---|---|
| Absorbance spectrum | Processed absorbance data | a |
| Transmittance spectrum | Processed transmittance | t |
| Reflectance spectrum | Processed reflectance | r |
| Sample interferogram | Raw detector signal | igsm |
| Reference interferogram | Background detector signal | igrf |
| Sample phase spectrum | Phase correction data | phsm |
| Reference phase spectrum | Background phase | phrf |
| Instrument parameters | Acquisition settings | params |
| Optic parameters | Optical configuration | optic |
| Sample parameters | Sample identification | sample |

Each data block is paired with a data-status parameter block that records the first X value (FXV), last X value (LXV), number of points (NPT), and a Y scaling factor (CSF). The data itself is stored as 32-bit floats.
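Once a library (or your own code) has located a data block and its parameters, turning them into arrays is simple. A minimal sketch, assuming little-endian float32 data, an evenly spaced axis, and that FXV (first X value), LXV (last X value), NPT (number of points), and CSF (scaling factor), the parameter names the Python libraries commonly expose, have already been read. The function names are hypothetical; locating the blocks in the first place is the reverse-engineered part the libraries do for you.

```python
import struct


def opus_axis(fxv: float, lxv: float, npt: int) -> list:
    """Evenly spaced X values from FXV (first), LXV (last), and NPT (count)."""
    if npt < 2:
        return [fxv] * npt
    step = (lxv - fxv) / (npt - 1)  # negative when the axis runs high to low
    return [fxv + i * step for i in range(npt)]


def opus_y(buf: bytes, offset: int, npt: int, csf: float = 1.0) -> list:
    """Read NPT little-endian float32 values at `offset` and apply CSF."""
    # Read exactly NPT values, in case the block holds trailing extras
    raw = struct.unpack_from(f"<{npt}f", buf, offset)
    return [v * csf for v in raw]
```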

Parsing in Python

Three libraries handle OPUS files, each with different trade-offs:

brukeropus - the most complete:

from brukeropus import read_opus
 
opus = read_opus('sample.0')
 
# List available data blocks
print(opus.data_keys)  # e.g., ['a', 't', 'igsm', 'igrf']
 
# Access absorbance spectrum
x = opus.a.x    # numpy array of wavenumbers
y = opus.a.y    # numpy array of absorbance values
 
# Access all parameters
print(opus.params)
# {'INS': 'Alpha II', 'SRC': 'Internal', 'RES': '4', ...}

brukeropusreader - lighter weight, read-only:

from brukeropusreader import read_file
 
data = read_file('sample.0')
absorbance = data["AB"]   # numpy array

opusFC - a compiled extension, well suited to batch processing:

import opusFC
 
blocks = opusFC.listContents('sample.0')
for b in blocks:
    print(b)  # each entry is a tuple describing one data block
 
data = opusFC.getOpusData('sample.0', blocks[0])
x, y = data.x, data.y

Install with pip install brukeropus, pip install brukeropusreader, or pip install opusFC respectively.

Gotchas

  • No specification. The format is reverse-engineered. Each Python library handles a slightly different subset of OPUS versions and block types. If one library fails on a particular file, try another.
  • File locking. OPUS locks files during acquisition. If you try to read a .0 file while OPUS is still writing it, you get corrupted data or a permission error. See the retry logic in our Python FTIR tutorial.
  • Numeric extensions. The extension is not a file type indicator - it is a sequence counter. .0 is the first measurement, .1 is the second. The actual format is identical regardless of extension.
  • Export alternative. If you control the instrument workflow, exporting to JCAMP-DX (.dx) from OPUS is sometimes simpler than parsing the native format. OPUS supports export to JCAMP-DX, SPC, CSV, and several other formats.
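For the file-locking gotcha, one defensive pattern is to wait until the file's size and modification time stop changing and a read succeeds. This is a hypothetical helper, not the retry logic from the Python FTIR tutorial; the polling interval, number of checks, and timeout are illustrative defaults to tune for your acquisition times.

```python
import os
import time


def read_when_stable(path: str, checks: int = 3, interval: float = 0.5,
                     timeout: float = 30.0) -> bytes:
    """Return the file's bytes once its size and mtime hold steady and it opens."""
    deadline = time.monotonic() + timeout
    last, steady = None, 0
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
            signature = (st.st_size, st.st_mtime_ns)
            if signature == last and st.st_size > 0:
                steady += 1
            else:
                last, steady = signature, 0
            if steady >= checks:
                with open(path, "rb") as fh:  # PermissionError if still locked
                    return fh.read()
        except (FileNotFoundError, PermissionError):
            last, steady = None, 0  # not written yet, or the writer holds a lock
        time.sleep(interval)
    raise TimeoutError(f"{path} was not stable within {timeout} s")
```

The returned bytes can then go to whichever parser you use, so the parser never sees a half-written file.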

Renishaw WDF: Raman Spectral Maps

Renishaw's WiRE software stores Raman spectra in the WDF (WiRE Data Format) binary format. WDF files can contain single spectra, line scans, or full 2D spectral maps - a grid of spatial positions, each with a complete Raman spectrum.

Structure

A WDF file is a sequence of tagged blocks, starting with the WDF1 main header block. Each block begins with a 16-byte header: a 4-byte block type identifier, a 4-byte unique ID, and an 8-byte block size. Major block types:

| Block Type ID | Content |
|---|---|
| WDF1 | Main header (version, measurement type, point counts) |
| DATA | Spectral intensity data (float32 array) |
| XLST | X-axis list (wavenumber/wavelength values) |
| YLST | Y-axis list (secondary axis values) |
| ORGN | Origin list (spatial coordinates for map data) |
| WMAP | Map metadata (dimensions, step sizes) |
| WHTL | White-light image (if captured) |
| TEXT | Text metadata (sample description, notes) |

For spectral maps, the DATA block contains all spectra concatenated: if you have a 10×10 map with 1024 points per spectrum, the DATA block contains 100 × 1024 = 102,400 float32 values.
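The tagged-block layout can be walked with struct by following each block's size field. This sketch assumes each 16-byte block header holds a 4-character ID, a 4-byte UID, and an 8-byte size that includes the header itself, all little-endian; treat that layout as an assumption to verify against real files. The function name is hypothetical.

```python
import struct

BLOCK_HEADER = struct.Struct("<4sIQ")  # block ID, UID, block size (incl. header)


def list_wdf_blocks(buf: bytes) -> list:
    """Return (block_id, uid, offset, size) for each block in file order."""
    blocks, offset = [], 0
    while offset + BLOCK_HEADER.size <= len(buf):
        raw_id, uid, size = BLOCK_HEADER.unpack_from(buf, offset)
        if size < BLOCK_HEADER.size:
            raise ValueError(f"implausible block size {size} at offset {offset}")
        blocks.append((raw_id.decode("ascii", "replace"), uid, offset, size))
        offset += size  # jump straight to the next block header
    return blocks
```

The DATA payload then starts 16 bytes past the DATA block's offset. For very large maps, the same walk can be done with file seeks instead of reading the whole file into memory.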

Parsing in Python

The renishawWiRE library parses WDF files:

from renishawWiRE import WDFReader
import numpy as np
 
# Read a WDF file
reader = WDFReader('raman_map.wdf')
 
# Basic metadata
print(f"Title: {reader.title}")
print(f"Measurement type: {reader.measurement_type}")
print(f"Number of spectra: {reader.count}")
print(f"Points per spectrum: {reader.point_per_spectrum}")
 
# Wavenumber axis (same for all spectra in the map)
wavenumbers = reader.xdata    # numpy array, shape (npoints,)
 
# All spectral data
spectra = reader.spectra      # numpy array, shape (nspectra, npoints)
 
# For a single spectrum file:
single_spectrum = spectra[0]
 
# For a map: access spatial coordinates
if reader.measurement_type == 3:  # Mapping (1 = single, 2 = series)
    x_coords = reader.xpos    # X positions
    y_coords = reader.ypos    # Y positions
    map_shape = reader.map_shape  # (ny, nx)
 
    # Reshape spectra into a spatial grid
    spectra_map = spectra.reshape(*map_shape, -1)
    # spectra_map[row, col, :] gives the spectrum at that position
 
# Access the white-light image if available
if reader.img is not None:
    img = reader.img          # PIL Image object

Install with pip install renishawWiRE.

Gotchas

  • Map orientation. Spatial coordinates in WDF files follow the instrument stage convention, which may differ from image convention (row 0 at top vs. bottom). Verify orientation against the white-light image.
  • Large files. A 100×100 spectral map with 1024 points per spectrum produces a 40 MB DATA block. Load lazily if memory is a concern.
  • WiRE version differences. Older WiRE versions (< 5.0) write slightly different block structures. The renishawWiRE library handles most versions but may fail on very old files.
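Because every spectrum in the DATA block has the same length, spectrum i starts at a fixed byte offset, so a single spectrum can be read through a memory map without loading the whole block. A minimal stdlib sketch with a hypothetical name: data_offset (where the float32 payload starts, i.e., the DATA block's offset plus its 16-byte header) and npoints (points per spectrum) are inputs you must obtain first. numpy.memmap is the natural alternative if you want array slicing.

```python
import mmap
import struct


def read_one_spectrum(path: str, data_offset: int, npoints: int, index: int) -> tuple:
    """Return spectrum `index` from a float32 DATA payload via a memory map."""
    if index < 0:
        raise IndexError("index must be non-negative")
    stride = npoints * 4  # bytes per spectrum
    start = data_offset + index * stride
    with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        if start + stride > len(mm):
            raise IndexError(f"spectrum {index} lies beyond the end of the file")
        # The OS only has to page in these bytes, not the whole map
        return struct.unpack_from(f"<{npoints}f", mm, start)
```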

Horiba LabSpec: Mixed Binary

Horiba's LabSpec software (for LabRAM and XploRA Raman systems) uses proprietary binary formats with extensions .l5s (LabSpec 5), .l6s (LabSpec 6), and .ngs. These are binary containers, not XML - though LabSpec can export to XML, which is the most accessible format for third-party parsing.

What Is Known

The LabSpec format stores:

  • Spectral data (float32 or float64 arrays)
  • Wavenumber/wavelength axis
  • Acquisition parameters (laser wavelength, exposure time, grating position)
  • Spatial coordinates (for mapping data)
  • Embedded white-light images

Parsing Options

There is no established open-source Python library for native LabSpec files. Practical approaches:

  1. Export from LabSpec software. LabSpec can export to JCAMP-DX, SPC, ASCII/CSV, and several other formats. If you control the instrument workflow, configure LabSpec to auto-export to JCAMP-DX or CSV after each acquisition.
  2. Use Horiba's SDK. Horiba provides an SDK for programmatic access to LabSpec data, but it is Windows-only and requires a LabSpec license.
  3. Parse the binary directly. The format has been partially reverse-engineered. The file begins with a version-dependent header, followed by data blocks that can be located by searching for known magic bytes. This approach is fragile and not recommended unless you have a large corpus of LabSpec files and no other option.

def read_labspec_spectrum(filepath: str) -> tuple:
    """
    Placeholder for a native LabSpec reader.

    Block layouts are version-dependent and only partially
    reverse-engineered (a common pattern is a float64 array
    preceded by a 4-byte count), so this stub deliberately
    refuses and points to the export route instead.
    """
    raise NotImplementedError(
        f"Native LabSpec parsing is fragile ({filepath}). "
        "Export to JCAMP-DX from LabSpec instead: "
        "File > Export > JCAMP-DX"
    )

Recommendation: Do not invest in native LabSpec parsing. Export to JCAMP-DX or SPC from the LabSpec software and parse that instead. The export preserves the spectral data and the core metadata, although some instrument-specific fields may not survive.

CSV and ASCII: The Universal Fallback

Every spectroscopy instrument can export to CSV or tab-delimited text. It is the lowest common denominator - universally readable, universally lossy.

Common Layouts

Two-column (X, Y):

# Wavenumber (cm-1), Absorbance
3999.64, 0.0234
3998.68, 0.0231
3997.71, 0.0228
...

Multi-column (X, Y1, Y2, ...) for multiple spectra:

Wavenumber, Sample_001, Sample_002, Sample_003
3999.64, 0.0234, 0.0198, 0.0267
3998.68, 0.0231, 0.0195, 0.0264

Header variations. Some instruments write metadata as comment lines (prefixed with #, %, or ;). Some write column headers. Some write nothing - just raw numbers.

Parsing CSV in Python

import numpy as np
import pandas as pd
 
# Simple two-column CSV
data = np.loadtxt('spectrum.csv', delimiter=',', skiprows=1)
wavenumbers = data[:, 0]
absorbance = data[:, 1]
 
# Multi-spectrum CSV with headers
df = pd.read_csv('spectra.csv')
wavenumbers = df.iloc[:, 0].to_numpy()
spectra = df.iloc[:, 1:].to_numpy()  # shape: (npoints, nspectra)
 
# Handle comment lines
data = np.loadtxt('spectrum.txt', comments='#', delimiter='\t')

When CSV Works

CSV works when:

  • You are exporting a single spectrum or a small batch for quick analysis
  • You need to open data in Excel or Google Sheets
  • You are exchanging data with a collaborator who does not have spectroscopy software
  • You are feeding data into a generic ML pipeline that consumes tabular data

When CSV Fails

CSV loses metadata. You get the spectral data but not the instrument model, acquisition parameters, resolution, number of scans, sample name, operator, or timestamp. For clinical and regulatory use cases where data provenance matters, CSV is insufficient. Use JCAMP-DX or SPC.

CSV also has no standard schema. Two-column vs. multi-column, comma vs. tab, header vs. no header, comment prefix - every instrument exports slightly different CSV. Your parser needs to be defensive.
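A defensive reader can absorb most of that variation: drop comment lines, guess the delimiter from a data line, treat a leading non-numeric row as column names, and insist on a consistent column count. This is a stdlib sketch with hypothetical names; it assumes the first column is X and does not handle decimal commas.

```python
COMMENT_PREFIXES = ("#", "%", ";")


def guess_delimiter(line: str):
    """Prefer tab, then comma, then semicolon; None means split on whitespace."""
    for delim in ("\t", ",", ";"):
        if delim in line:
            return delim
    return None


def parse_spectrum_text(text: str):
    """Return (header, x, ys) from two-column or multi-column text exports."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith(COMMENT_PREFIXES)]
    if not lines:
        raise ValueError("no data lines found")
    delim = guess_delimiter(lines[-1])  # the last line is data, not a header
    header, rows = None, []
    for ln in lines:
        fields = [f.strip() for f in ln.split(delim)]
        try:
            rows.append([float(f) for f in fields if f])
        except ValueError:
            if rows:
                raise ValueError(f"non-numeric line inside the data: {ln!r}") from None
            header = fields  # column names that precede the numbers
    if not rows:
        raise ValueError("no numeric rows found")
    width = len(rows[0])
    if width < 2 or any(len(r) != width for r in rows):
        raise ValueError("expected an X column plus a consistent number of Y columns")
    x = [r[0] for r in rows]
    ys = [[r[i] for r in rows] for i in range(1, width)]
    return header, x, ys
```

It returns the column names (or None), the X values, and one list of Y values per spectrum column, so two-column and multi-column files come out in the same shape.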

Vendor Support Matrix

Which instruments output which formats, and what interchange formats they support:

| Vendor | Instrument | Native Format | Export: JCAMP-DX | Export: SPC | Export: CSV | Export: netCDF |
|---|---|---|---|---|---|---|
| Bruker | Alpha II, Vertex, Tensor | OPUS (.0) | Yes | Yes | Yes | No |
| Thermo Fisher | Nicolet iS50, iS20 | SPC (.spc) | Yes | Native | Yes | Yes |
| Thermo Fisher | DXR3 (Raman) | SPC (.spc) | Yes | Native | Yes | Yes |
| Horiba | LabRAM, XploRA | LabSpec (.l5s/.l6s) | Yes* | Yes | Yes | HDF5 (v6.3+) |
| Renishaw | inVia, Virsa | WDF (.wdf) | Yes | Yes | Yes | No |
| Agilent | Cary 630 FTIR | Agilent (.a2r) | Yes | No | Yes | No |
| PerkinElmer | Spectrum Two, Frontier | SP (.sp) | Yes | Yes | Yes | No |
| Shimadzu | IRSpirit, IRTracer | Shimadzu (.spc*) | Yes | Variant | Yes | No |

*Shimadzu uses a modified SPC format that is not fully compatible with Thermo SPC readers.

Key observation: JCAMP-DX is the only standardized format that every vendor in this table supports for export. CSV is universal too, but it carries no metadata. If you need a single interchange format, JCAMP-DX is the safe choice.

Decision Framework: Which Format Should You Use?

For Data Interchange Between Systems

Use JCAMP-DX. It is the only IUPAC-standardized format, universally supported for export, and text-based (human-inspectable, version-controllable, diffable). The compression encoding makes files smaller than you might expect.

Use SPC as a secondary interchange format if your ecosystem is Thermo-heavy. Avoid netCDF unless you are working in a chromatography-adjacent pipeline that already uses it.

For Archival and Regulatory Compliance

Use the vendor's native format plus JCAMP-DX. Regulatory frameworks (FDA, EU IVDR) typically expect you to retain the raw data in its original instrument format for reproducibility. Store the native .0, .spc, or .wdf file as the primary record and a JCAMP-DX export as the human-readable companion.

For Machine Learning Pipelines

Use NumPy arrays (.npy) or HDF5 (.h5) internally. Parse the source format once, extract the spectral data, preprocess it (baseline correction, normalization), and save the result as a NumPy array or HDF5 dataset. ML frameworks consume arrays, not spectral files.

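import numpy as np
from brukeropus import read_opus

# Assumes `dataset` is an iterable of (filepath, label) pairs you define,
# e.g. [('sample_001.0', 'positive'), ('sample_002.0', 'negative')]

# Parse once, save as numpy
wavenumbers = None
spectra, labels = [], []
for filepath, label in dataset:
    opus = read_opus(filepath)
    x = opus.a.x
    if wavenumbers is None:
        wavenumbers = x
    elif x.shape != wavenumbers.shape or not np.allclose(x, wavenumbers):
        # Stacking only makes sense on a shared axis; resample first otherwise
        raise ValueError(f"{filepath}: wavenumber axis differs from the first file")
    spectra.append(opus.a.y)
    labels.append(label)

np.save('wavenumbers.npy', wavenumbers)
np.save('spectra.npy', np.array(spectra))
np.save('labels.npy', np.array(labels))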
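Stacking spectra into one array only works when every file shares the same wavenumber axis. When they do not (different instruments or resolutions), one option is to interpolate each spectrum onto a common grid before stacking, then normalize. The sketch below uses linear interpolation with np.interp, which requires an ascending X array, and standard normal variate (SNV) scaling as one example of normalization; both are choices to revisit for your data, and the function names are hypothetical.

```python
import numpy as np


def resample(x, y, grid):
    """Linearly interpolate y(x) onto an ascending grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if x[0] > x[-1]:  # many FTIR files run from high to low wavenumber
        x, y = x[::-1], y[::-1]
    if grid[0] < x[0] or grid[-1] > x[-1]:
        # np.interp would silently repeat the edge values outside the range
        raise ValueError("grid extends beyond the measured range")
    return np.interp(grid, x, y)


def snv(spectrum):
    """Standard normal variate: subtract the mean, divide by the standard deviation."""
    s = np.asarray(spectrum, dtype=float)
    return (s - s.mean()) / s.std()
```

For example, np.vstack([snv(resample(x, y, grid)) for x, y in pairs]) builds a training matrix, where grid is your chosen common axis and pairs is your list of (x, y) arrays.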

For Real-Time Instrument Integration

Read the native format directly. Parsing overhead matters when you are processing spectra in a clinical workflow with sub-second latency requirements. The native format avoids the export step and gives you access to all metadata (interferograms, quality metrics, instrument status) that export formats may omit.

Use brukeropus for Bruker, spc-spectra for Thermo, and renishawWiRE for Renishaw. See our Python FTIR automation tutorial for the complete real-time acquisition pipeline.

For Multi-Vendor Platforms

Build a parser adapter per vendor, normalize to a common internal representation. This is what SpectraDx does. Each instrument adapter reads the native format and produces a standardized internal spectrum object (wavenumber array, intensity array, metadata dict). Downstream processing - classification, result delivery, archival - operates on the normalized representation, not the vendor-specific format.

from dataclasses import dataclass
import numpy as np
 
@dataclass
class NormalizedSpectrum:
    wavenumbers: np.ndarray
    intensities: np.ndarray
    metadata: dict
    source_format: str
    source_path: str
 
def parse_any_format(filepath: str) -> NormalizedSpectrum:
    """Route to the correct parser based on file extension."""
    ext = filepath.rsplit('.', 1)[-1].lower()
 
    if ext.isdigit():  # Bruker OPUS (.0, .1, .2)
        from brukeropus import read_opus
        opus = read_opus(filepath)
        return NormalizedSpectrum(
            wavenumbers=opus.a.x,
            intensities=opus.a.y,
            metadata=dict(opus.params),
            source_format='opus',
            source_path=filepath,
        )
 
    elif ext == 'spc':
        import spc_spectra as spc
        f = spc.File(filepath)
        return NormalizedSpectrum(
            wavenumbers=f.x,
            intensities=f.sub[0].y,
            metadata=getattr(f, 'log_dict', None) or {},
            source_format='spc',
            source_path=filepath,
        )
 
    elif ext in ('dx', 'jdx'):
        import jcamp
        data = jcamp.jcamp_readfile(filepath)
        return NormalizedSpectrum(
            wavenumbers=data['x'],
            intensities=data['y'],
            metadata={k: v for k, v in data.items()
                      if k not in ('x', 'y')},
            source_format='jcamp-dx',
            source_path=filepath,
        )
 
    elif ext == 'wdf':
        from renishawWiRE import WDFReader
        reader = WDFReader(filepath)
        return NormalizedSpectrum(
            wavenumbers=reader.xdata,
            intensities=reader.spectra[0],
            metadata={'title': reader.title},
            source_format='wdf',
            source_path=filepath,
        )
 
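    elif ext in ('csv', 'txt'):
        # Comma for .csv; any whitespace (tabs or spaces) for .txt
        delimiter = ',' if ext == 'csv' else None
        data = np.loadtxt(filepath, delimiter=delimiter, skiprows=1)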
        return NormalizedSpectrum(
            wavenumbers=data[:, 0],
            intensities=data[:, 1],
            metadata={},
            source_format='csv',
            source_path=filepath,
        )
 
    else:
        raise ValueError(f"Unsupported format: .{ext}")

This adapter pattern is the foundation of instrument-agnostic spectroscopy software and is central to the SpectraDx platform approach. For more on how this fits into a clinical deployment architecture, see Building Clinical Workflow Software for Spectroscopy-Based Diagnostics. For details on moving parsed spectral data into classification models, see Building AI Pipelines for Spectral Classification.
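Extensions are not always trustworthy (OPUS uses counters, and a .spc file from one vendor may not match another's layout), so a content check is a useful fallback when extension routing fails. A heuristic sketch with a hypothetical name: it recognizes JCAMP-DX text, the WDF1 block ID, the netCDF classic and HDF5 signatures, and, as the weakest test, an SPC fversn byte. OPUS is left to the numeric-extension rule because its signature is not covered here.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(head: bytes) -> str:
    """Guess the format from a file's first bytes; a heuristic, not a validator."""
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"##"):
        return "jcamp-dx"
    if head.startswith(b"WDF1"):
        return "wdf"
    if head[:3] == b"CDF" and head[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "netcdf"
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"  # netCDF-4 or another HDF5-based export
    if len(head) >= 2 and head[1] in (0x4B, 0x4C, 0x4D):
        return "spc"  # weakest test: only the fversn byte is checked
    return "unknown"
```

Reading the first 16 bytes of a file in binary mode is enough to route on the result; keep the SPC check last so text files are never mistaken for it.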


SpectraDx builds clinical workflow software for spectroscopy-based diagnostics.

The layer between the spectrometer and the clinician. Instrument control, patient workflow, ML classification, HL7/FHIR output, and billing — in one platform.
