You have built a spectroscopy-based diagnostic test. An FTIR system that identifies pathogens from clinical swabs. A Raman device that classifies tissue samples. An NIR instrument that screens blood products. The science works. The ML model classifies with high sensitivity and specificity. Now you need to deploy it in a clinical setting.
Here is the question that will determine your regulatory timeline, your development budget, and your go-to-market strategy:
Is your software a medical device?
The answer is not obvious. It depends on a specific architectural decision - where the clinical interpretation happens. Get this wrong and you either spend 18 months and $200K+ on a regulatory path you did not need, or you deploy without clearance and face an FDA enforcement action. Both outcomes end badly.
This article breaks down the FDA's Software as a Medical Device (SaMD) framework as it applies to spectroscopy-based diagnostics. It is not a substitute for regulatory counsel, but it will give you the technical foundation to have an informed conversation with your regulatory consultant - and potentially save you from an expensive misunderstanding.
The line that matters
The FDA's definition of SaMD comes from the IMDRF (International Medical Device Regulators Forum): software intended to be used for one or more medical purposes that performs those purposes without being part of a hardware medical device.
For spectroscopy diagnostics, this definition creates a bright line based on one question: who or what makes the clinical interpretation?
Scenario A: The instrument provides the result
Your spectrometer - a Bruker Alpha II running OPUS, say - acquires a spectrum and runs it against a spectral library. OPUS outputs a library search result: "Streptococcus pyogenes, Hit Quality Index 947." Your software takes that result and displays it to the clinician as "Strep A: Positive."
In this scenario, your software is not making the clinical decision. The instrument's software (OPUS) performed the classification. Your software is displaying, formatting, and transmitting a result that originated from the instrument. Your software is functioning as a clinical workflow tool - patient identification, result display, HL7 message generation, billing - but not as a diagnostic interpreter.
This is analogous to how a hospital's Electronic Health Record displays lab results. The EHR does not interpret the result. It receives a structured message from the analyzer and presents it. The EHR is not a medical device.
Under this architecture, your workflow software likely falls outside the SaMD definition. The regulatory burden is dramatically lower. You still need to comply with applicable standards (21 CFR Part 11 for electronic records if you are in a regulated environment, HIPAA for patient data), but you are not on the FDA's medical device pathway.
Scenario B: Your software provides the result
Same spectrometer. Same spectrum acquisition. But instead of using OPUS's built-in library search, the raw spectrum is exported from the instrument and fed into your software's ML pipeline. Your custom Python code runs preprocessing (baseline correction, normalization). Your trained convolutional neural network classifies the spectrum. Your software determines: "Strep A: Positive."
In this scenario, your software is making the clinical decision. The instrument provided raw data - an infrared absorption spectrum. Your software interpreted that data and produced a clinically actionable result. This is SaMD.
The distinction is architectural, not functional. The end user sees the same thing either way - a positive or negative result on a screen. But the regulatory implications are completely different.
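The two architectures can be contrasted in a few lines of code. The sketch below is illustrative only: `InstrumentResult`, `DISPLAY_LABELS`, `display_instrument_result`, and `classify_spectrum` are hypothetical names, the 0.5 threshold is an arbitrary placeholder, and nothing here is a regulatory determination.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class InstrumentResult:
    """A result already produced by the instrument's own software (hypothetical shape)."""
    organism: str
    hit_quality_index: int


# Hypothetical display labels: this mapping only renames what the instrument reported.
DISPLAY_LABELS = {"Streptococcus pyogenes": "Strep A: Positive"}


def display_instrument_result(result: InstrumentResult) -> str:
    """Scenario A: format and pass through a classification the instrument made."""
    return DISPLAY_LABELS.get(result.organism, f"{result.organism}: Reported")


def classify_spectrum(
    spectrum: Sequence[float],
    model: Callable[[Sequence[float]], float],
    threshold: float = 0.5,  # placeholder value, an assumption for illustration
) -> str:
    """Scenario B: our own code turns raw spectral data into a clinical call."""
    probability = model(spectrum)
    return "Strep A: Positive" if probability >= threshold else "Strep A: Negative"
```

Both functions can return the same string. What differs is where the interpretation happened: in the first, before the data reached your code; in the second, inside it.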
The practical consequence
Scenario A: You can deploy in months. Your primary regulatory concerns are data privacy, electronic records compliance, and whatever the instrument vendor's requirements are for their cleared device.
Scenario B: You are looking at 12-24 months of regulatory work before you can legally market the software for clinical diagnostic use. This includes a quality management system, design controls, software lifecycle documentation, clinical evidence, cybersecurity documentation, and an FDA submission (likely De Novo - more on this below).
The cost difference between these two paths is not marginal. It is the difference between a $50K software development project and a $250K-500K regulatory program.
The SaMD classification framework
If your software does fall into SaMD territory (Scenario B), the next question is: what class of medical device is it? The classification determines the specific regulatory requirements and the type of FDA submission you need.
The IMDRF framework - which the FDA has adopted - assigns SaMD to one of four risk categories (I through IV) based on two dimensions:
Dimension 1: State of the healthcare situation
What is the clinical context in which the software's output is used?
| Level | Definition | Spectroscopy example |
|---|---|---|
| Critical | Situation is life-threatening or could cause irreversible harm | Sepsis pathogen identification from blood culture |
| Serious | Situation could cause significant but non-life-threatening harm | Bacterial infection identification from a throat swab |
| Non-serious | Situation is not serious; incorrect result causes minor inconvenience | Environmental monitoring, non-clinical screening |
Dimension 2: Significance of the information provided
What does the software's output do in the clinical workflow?
| Level | Definition | Spectroscopy example |
|---|---|---|
| Treat or diagnose | Software output directly drives a treatment decision or provides a diagnosis | "Patient has Strep A - prescribe antibiotics" |
| Drive clinical management | Software output informs clinical management but is not the sole basis for diagnosis | "Spectral profile suggests bacterial infection - recommend culture confirmation" |
| Inform clinical management | Software output provides supplementary information | "Sample spectral quality is adequate for analysis" |
These two dimensions create a matrix:
| State of healthcare situation | Inform clinical management | Drive clinical management | Treat or diagnose |
|---|---|---|---|
| Non-serious | I | I | II |
| Serious | I | II | III |
| Critical | II | III | IV |
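As a quick reference, the IMDRF categorization can be written as a lookup table. This is a minimal sketch: the key strings and the `imdrf_category` function name are our own shorthand, not anything defined by IMDRF, and the values follow the published IMDRF table, which uses categories I through IV.

```python
# (state of healthcare situation, significance of information) -> IMDRF category
IMDRF_CATEGORY = {
    ("non-serious", "inform"): "I",
    ("non-serious", "drive"): "I",
    ("non-serious", "treat_or_diagnose"): "II",
    ("serious", "inform"): "I",
    ("serious", "drive"): "II",
    ("serious", "treat_or_diagnose"): "III",
    ("critical", "inform"): "II",
    ("critical", "drive"): "III",
    ("critical", "treat_or_diagnose"): "IV",
}


def imdrf_category(situation: str, significance: str) -> str:
    """Return the IMDRF SaMD category for one cell of the matrix."""
    key = (situation.strip().lower(), significance.strip().lower())
    if key not in IMDRF_CATEGORY:
        raise ValueError(f"unknown combination: {key}")
    return IMDRF_CATEGORY[key]
```

For example, `imdrf_category("serious", "treat_or_diagnose")` returns `"III"`, the cell where a throat-swab pathogen identification would typically land.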
Most spectroscopy-based diagnostic tests - pathogen identification, tissue classification, antimicrobial susceptibility - fall into the "Serious" healthcare situation with "Treat or Diagnose" significance. That puts them in Category III under the IMDRF framework. Note that IMDRF categories (I through IV) rank risk; they are not the same thing as FDA device classes (I through III).
However, the FDA has discretion in how it applies this framework. In practice, many in vitro diagnostic (IVD) software products that sit in a high IMDRF category have been regulated as Class II devices through the De Novo pathway. The FDA evaluates the specific intended use, the clinical evidence, and the risk profile of each product individually.
The important thing is to know where you land on this matrix before you start building, because it determines everything that follows.
What SaMD classification triggers
Once your software is classified as SaMD, a set of regulatory requirements activate. These are not optional. They are legally required before you can market the software for clinical use in the United States.
IEC 62304: Software lifecycle processes
IEC 62304 is the international standard for medical device software lifecycle processes. The FDA recognizes it as a consensus standard, which means a declaration of conformity to IEC 62304 can be used to support the software documentation in your premarket submission.
IEC 62304 defines three software safety classes:
Class A - No injury is possible. The software cannot contribute to a hazardous situation, or if it can, the hardware provides adequate protection. Minimal documentation requirements.
Class B - Non-serious injury is possible. The software can contribute to a hazardous situation that could result in non-serious injury. Moderate documentation requirements.
Class C - Death or serious injury is possible. The software can contribute to a hazardous situation that could result in death or serious injury. Full documentation requirements.
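The three class definitions above reduce to a short decision rule. The sketch below encodes them as described in this article; the function name, the parameter names, and the harm labels are hypothetical, and an actual classification has to come out of your ISO 14971 risk analysis, not a helper function.

```python
def iec62304_safety_class(
    can_contribute_to_hazard: bool,
    worst_credible_harm: str,
    external_protection_adequate: bool = False,
) -> str:
    """Map the worst credible harm to an IEC 62304 software safety class.

    worst_credible_harm is one of "non-serious", "serious", "death"
    (labels chosen for this sketch).
    """
    # Class A: software cannot contribute to a hazardous situation,
    # or protection outside the software is adequate.
    if not can_contribute_to_hazard or external_protection_adequate:
        return "A"
    if worst_credible_harm == "non-serious":
        return "B"
    if worst_credible_harm in ("serious", "death"):
        return "C"
    raise ValueError(f"unknown harm level: {worst_credible_harm!r}")
```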
For most spectroscopy diagnostic software, you are looking at Class B or Class C. Class C (the most stringent) requires documented deliverables at every stage:
- Software development plan
- Requirements specification (every requirement testable and traceable)
- Architecture document and detailed design
- Unit tests, integration tests, and system tests
- Complete risk analysis per ISO 14971
Every software unit must be tested against its detailed design with documented coverage. Every identified hazard - including software-specific hazards like data corruption, incorrect calculations, and race conditions - must have documented risk controls.
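"Testable and traceable" is something you can check mechanically. Here is a minimal sketch of such a check, assuming requirements and tests carry IDs like `REQ-001` and `TEST-010` (a hypothetical naming scheme) and that each test declares which requirements it verifies.

```python
from typing import Dict, List, Set


def untraced_requirements(
    requirements: Set[str],
    test_to_requirements: Dict[str, List[str]],
) -> List[str]:
    """Requirements that no test claims to verify."""
    covered = {req for reqs in test_to_requirements.values() for req in reqs}
    return sorted(requirements - covered)


def orphan_tests(
    requirements: Set[str],
    test_to_requirements: Dict[str, List[str]],
) -> List[str]:
    """Tests that reference at least one requirement ID that does not exist."""
    return sorted(
        test_id
        for test_id, reqs in test_to_requirements.items()
        if any(req not in requirements for req in reqs)
    )
```

Running a check like this in CI is one way to keep the traceability matrix from drifting out of date between audits.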
This is a substantial body of work. For a team that has not done it before, IEC 62304 Class C documentation for a moderately complex software system takes 3-6 months of dedicated effort from someone who knows what they are doing.
21 CFR Part 820: Quality System Regulation
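The FDA's Quality System Regulation (QSR) - amended by the FDA's 2024 final rule into the Quality Management System Regulation (QMSR), which incorporates ISO 13485:2016 by reference - requires a documented quality management system (QMS) covering: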
- Design controls - structured design process with inputs, outputs, reviews, verification, validation, and transfer
- Design History File (DHF) - compiling all design records
- Document control - for all quality system documents
- Corrective and Preventive Action (CAPA) process
For a startup, standing up a QMS from scratch takes 2-3 months just for the SOPs, work instructions, and templates - before you write any design control documentation for your specific product.
Cybersecurity documentation
The FDA requires pre-market cybersecurity documentation for all medical devices that connect to a network or accept data from external sources. For SaMD, the required documentation includes:
- Software Bill of Materials (SBOM) - listing every component and its version. Your code, every third-party library, every framework. NumPy 1.26.4 for spectral preprocessing, all 47 npm packages in your React frontend - every dependency is documented.
- Threat model identifying attack surfaces and mitigations
- Security architecture documentation covering authentication, encryption, and data integrity
- Vulnerability management plan with patching timelines
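For the Python side of a stack, a first inventory for the SBOM can be pulled from the running environment. The sketch below uses the standard library's `importlib.metadata`; the function names and the output shape are our own, and a real submission would normally use a recognized SBOM format such as CycloneDX or SPDX rather than this ad hoc JSON.

```python
import json
from importlib import metadata
from typing import Dict, List


def installed_components() -> List[Dict[str, str]]:
    """List name and version for every distribution visible to this interpreter."""
    by_name: Dict[str, Dict[str, str]] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with missing metadata
            by_name[name.lower()] = {"name": name, "version": version}
    # Sort by lowercased name so the output is stable between runs.
    return [by_name[key] for key in sorted(by_name)]


def components_as_json() -> str:
    """Serialize the inventory for review or archiving."""
    return json.dumps(installed_components(), indent=2)
```

This only covers Python packages installed in the current environment. Frontend dependencies, system libraries, and container base images need their own inventory.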
The FDA has become increasingly serious about cybersecurity. Submissions without adequate cybersecurity documentation receive Refuse to Accept (RTA) letters. Do not treat this as an afterthought.
Clinical evidence
You must demonstrate that the software performs as intended in its clinical context. For spectroscopy diagnostics, this means clinical validation studies showing:
- Sensitivity and specificity against a reference method (culture, PCR, etc.)
- Performance across the intended patient population
- Performance across the intended specimen types
- Performance across the intended instrument configurations
- Robustness to expected sources of variation (operator, environment, sample preparation)
The clinical evidence requirements scale with risk classification. A Class II device through the De Novo pathway requires substantial clinical evidence but not necessarily a full clinical trial.
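Sensitivity and specificity are simple ratios, but reviewers will want confidence intervals alongside the point estimates. A minimal sketch, using the Wilson score interval as one common choice (your statistical analysis plan may specify a different method); the function names are our own:

```python
import math
from typing import Tuple


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of reference-positive samples the test called positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of reference-negative samples the test called negative."""
    return true_neg / (true_neg + false_pos)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 for roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

With hypothetical counts of 95 true positives and 5 false negatives, the point estimate is 0.95 but the interval runs from roughly 0.89 to 0.98, which is why sample size drives so much of the clinical study cost.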
The gray areas
The clean Scenario A / Scenario B distinction works well for the clear cases. Real-world architectures are messier. Here are the gray areas that come up in spectroscopy diagnostics.
Preprocessing before instrument classification
Your software receives a raw spectrum from the instrument. Before passing it to the instrument's built-in classifier (OPUS library search), your software performs baseline correction and normalization - preprocessing that changes the spectral data.
Does this preprocessing make your software SaMD?
The argument for yes: your software is altering the data that the classifier operates on. If preprocessing introduces an artifact, it could change the classification result. The argument for no: preprocessing is a routine analytical step, not a clinical interpretation. The classification decision is still made by the instrument's validated classifier.
The FDA has not issued definitive guidance on this specific scenario. In practice, the answer depends on how significantly the preprocessing alters the data and whether the instrument's classifier was validated with similar preprocessing. Have this conversation with your regulatory consultant early.
Our recommendation: if you are using the instrument's built-in classifier, feed it the data in the format it expects. Do not insert custom preprocessing between the instrument and its own classifier unless there is a strong analytical reason to do so, and if you do, document the rationale and consider whether it changes your regulatory posture.
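To make "preprocessing that changes the spectral data" concrete, here is a deliberately simple sketch of the two steps named above: a two-point linear baseline subtraction and vector normalization. The function names are our own, and production pipelines often use more elaborate baseline methods (polynomial or rubber-band fits, for example).

```python
import math
from typing import List, Sequence


def subtract_linear_baseline(absorbance: Sequence[float]) -> List[float]:
    """Subtract the straight line joining the first and last points."""
    n = len(absorbance)
    if n < 2:
        raise ValueError("need at least two points")
    first, last = absorbance[0], absorbance[-1]
    return [
        value - (first + (last - first) * i / (n - 1))
        for i, value in enumerate(absorbance)
    ]


def vector_normalize(values: Sequence[float]) -> List[float]:
    """Scale the spectrum to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / norm for v in values]
```

Every value the classifier sees after these steps differs from what the instrument measured. That is the core of the "yes" argument.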
Quality control software
Your software performs automated quality control checks on spectra - verifying that the spectrum meets minimum signal-to-noise requirements, that the ATR crystal was clean, that the sample was properly seated. If a spectrum fails QC, your software blocks it from classification and prompts the operator to repeat the measurement.
Is this SaMD? Probably not. Quality control checks that gate whether a sample proceeds to classification are generally considered part of the analytical process, not clinical interpretation. But if your QC algorithm is making decisions that effectively determine the clinical outcome (e.g., classifying borderline spectra as "inadequate" when they might have been classified correctly), the line gets blurry.
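A QC gate of this kind can be very small. The sketch below checks signal-to-noise only; `MIN_SNR`, the choice of noise region, and the function names are all hypothetical, and real acceptance limits would come from your analytical validation.

```python
import statistics
from typing import Sequence

MIN_SNR = 100.0  # hypothetical acceptance limit, for illustration only


def signal_to_noise(spectrum: Sequence[float], noise_region: slice) -> float:
    """Peak height above the noise-region mean, divided by the noise standard deviation."""
    noise = spectrum[noise_region]
    if len(noise) < 2:
        raise ValueError("noise region needs at least two points")
    noise_sd = statistics.stdev(noise)
    if noise_sd == 0:
        return float("inf")
    return (max(spectrum) - statistics.mean(noise)) / noise_sd


def passes_qc(spectrum: Sequence[float], noise_region: slice, min_snr: float = MIN_SNR) -> bool:
    """Gate: only spectra at or above the SNR limit proceed to classification."""
    return signal_to_noise(spectrum, noise_region) >= min_snr
```

Note that the gate returns pass or fail on data quality and says nothing about the patient. The closer your QC logic gets to predicting the classification itself, the weaker that argument becomes.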
Decision support vs. autonomous diagnosis
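The 21st Century Cures Act exempted certain Clinical Decision Support (CDS) software from device regulation, but only under specific conditions. Among them: the software must not acquire, process, or analyze a signal from an in vitro diagnostic device or other signal acquisition system, and the clinician must be able to independently review the basis for the recommendation. Software that classifies a spectrum is analyzing exactly that kind of signal. And for a spectral classification, "independently reviewing the basis" would mean the clinician looks at the spectrum and makes their own judgment - unrealistic for clinicians who are not spectroscopists.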
In practice, most spectroscopy diagnostic software will not qualify for the CDS exemption. Plan accordingly.
Multi-model architectures
Your system runs two models sequentially: a screening model (high sensitivity, moderate specificity) and a confirmation model (moderate sensitivity, high specificity). Only samples that pass the screening model are run through the confirmation model. The final clinical result is based on the confirmation model's output.
Each model is SaMD independently. The combined system is a more complex SaMD. The interaction between models - how screening results influence confirmation testing - adds regulatory complexity because you must validate the combined system performance, not just each model in isolation.
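The reason combined performance has to be validated separately shows up in the arithmetic. When a positive result requires both models to call positive, and if the two models' errors are conditionally independent given the true status (a strong assumption that rarely holds exactly for models trained on the same spectra), the combined figures are as follows. The function name and the example numbers are hypothetical.

```python
from typing import Tuple


def serial_performance(
    screen_sens: float,
    screen_spec: float,
    confirm_sens: float,
    confirm_spec: float,
) -> Tuple[float, float]:
    """Combined (sensitivity, specificity) when a positive needs both models positive.

    Assumes the two models err independently given the true disease status.
    """
    # A true positive must be detected by both models.
    combined_sens = screen_sens * confirm_sens
    # A false positive must slip past both models.
    combined_spec = 1 - (1 - screen_spec) * (1 - confirm_spec)
    return combined_sens, combined_spec
```

With a screening model at 0.98 sensitivity and a confirmation model at 0.85, the combined sensitivity under this assumption is about 0.83, lower than either model alone. Correlated errors will move the real number, which is exactly why the combined system needs its own validation data.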
De Novo classification
Most spectroscopy-based diagnostic tests will not have a predicate device - there is no previously cleared device that does the same thing. This means the 510(k) pathway (which requires demonstrating substantial equivalence to a predicate) is not available.
The applicable pathway is De Novo classification. De Novo is for novel, low-to-moderate risk devices that do not have a predicate. It is the FDA's mechanism for creating new device classifications.
How De Novo works
- Pre-submission meeting. You meet with the FDA (Q-Sub process) to discuss your device and testing plans. Not required, but strongly recommended - feedback at this stage saves months of work.
- Submission preparation. The De Novo request includes:
  - Device description and intended use
  - Classification rationale
  - Performance testing data
  - Software documentation (IEC 62304, cybersecurity)
  - Labeling and proposed special controls
- FDA review. Multiple rounds of questions and responses. Expect this to take longer than you plan.
- Decision. If granted, the FDA creates a new device classification. Your device is authorized to market. Future devices with the same intended use can reference yours as a predicate for a faster 510(k) submission.
Timeline and cost
Realistic timelines for a spectroscopy diagnostic De Novo:
| Phase | Duration | Notes |
|---|---|---|
| Pre-submission preparation and meeting | 2-3 months | Includes Q-Sub request, FDA meeting |
| Clinical validation study | 3-6 months | Depends on sample collection, number of sites |
| Submission preparation | 3-4 months | Documentation, testing, compilation |
| FDA review | 6-12 months | Includes Q&A cycles |
| Total | 12-24 months | From decision to submit to market authorization; assumes some phases overlap |
Cost breakdown for a small company:
| Item | Estimated cost |
|---|---|
| Regulatory consultant | $80-150K (depends on scope and duration) |
| Clinical validation study | $50-200K (depends on number of sites and samples) |
| QMS establishment | $20-40K (consultant + internal effort) |
| FDA user fee (De Novo) | ~$70K (FY2026, small business rate available) |
| IEC 62304 documentation | $30-60K (internal effort + consultant review) |
| Cybersecurity documentation | $15-30K |
| Total | $265K-550K |
These numbers are real. They come from actual De Novo submissions for diagnostic software. If someone tells you they can get your spectroscopy diagnostic software through the FDA for $50K, they are either cutting corners or do not understand the scope.
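The total row is the sum of the low and high ends of each line item, which is easy to verify and to rerun with your own estimates. The figures below are copied from the table (in thousands of US dollars); the variable and function names are our own.

```python
from typing import Dict, Tuple

# (low, high) estimates in thousands of USD, taken from the cost table.
COST_ITEMS_K: Dict[str, Tuple[int, int]] = {
    "Regulatory consultant": (80, 150),
    "Clinical validation study": (50, 200),
    "QMS establishment": (20, 40),
    "FDA user fee (De Novo)": (70, 70),
    "IEC 62304 documentation": (30, 60),
    "Cybersecurity documentation": (15, 30),
}


def total_range(items: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high
```

`total_range(COST_ITEMS_K)` returns `(265, 550)`, matching the table. The clinical validation study contributes the widest spread, so that is the estimate worth pinning down first.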
The practical path for spectroscopy startups
Given the cost and timeline difference between Scenario A and Scenario B, here is the pragmatic approach.
Phase 1: Get to market with instrument-based classification
Use the spectrometer's built-in classification capabilities. For Bruker, this means OPUS's spectral library search. For Raman instruments, this means the vendor's built-in library matching. Your workflow software wraps the instrument's classification output in a clinician-friendly interface, generates HL7 messages, handles billing, and maintains audit trails.
Under this architecture, your workflow software is not SaMD. You can deploy in clinical settings (subject to your instrument's regulatory status, facility requirements, and laboratory-developed test (LDT) regulations). Time to market: months, not years.
This is not a compromise - the instrument's built-in classifiers are well-validated and understood by the regulatory community. OPUS library search has been used in regulated environments for decades. You are building on a proven foundation.
Phase 2: Develop your own ML model in parallel
While Phase 1 is deployed and generating revenue, develop your custom ML model. Train it. Validate it analytically. Compare its performance to the instrument's built-in classifier. Build the IEC 62304 documentation as you develop.
Phase 3: Regulatory submission for your ML model
When the model is ready and the clinical evidence supports it, submit for FDA authorization. You now have a deployed product generating revenue while you work through the regulatory process. You are not burning runway while waiting for clearance.
Phase 4: Deploy the ML model
Once authorized, update deployed sites to use your custom ML model instead of the instrument's built-in classifier. The workflow software, HL7 integration, billing, and compliance infrastructure are already deployed and proven. You are swapping out one classification engine for another.
This phased approach is not about avoiding regulation. It is about sequencing your regulatory investment so you have a deployed, revenue-generating product before you take on the cost and timeline of an FDA submission.
What this means for your architecture
The SaMD classification decision should be made before you write a line of code, because it drives architectural decisions throughout the system.
If you are SaMD (Scenario B):
- Every software component must be traceable. Requirements to design to implementation to test. Traceability matrices are mandatory under IEC 62304 and 21 CFR Part 820.
- Third-party software must be evaluated. Every library you import - NumPy, SciPy, scikit-learn, React, FastAPI - must be assessed for risk and documented in your SOUP (Software of Unknown Provenance) list. Using a library with a known unpatched vulnerability is a finding.
- Change control is formal. You cannot push a hotfix to production without a documented change request, risk assessment, regression testing, and approval. This is not bureaucratic overhead - it is legally required.
- Version control is auditable. Your Git history is part of your design history file. Force pushes and rebases rewrite history, and squash merges discard intermediate commits, all of which create problems during audits. Maintain a complete, auditable commit history.
- Your ML model is a controlled component. The model, the training data, the training process, the validation data, and the validation results are all design control deliverables. Retraining the model is a design change that requires the full change control process.
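One practical way to treat the model as a controlled component is to fingerprint every artifact in a release and compare candidates against the approved baseline. This is a minimal sketch using SHA-256 from the standard library; the artifact names and function names are hypothetical, and a hash comparison only detects that something changed. It does not replace the change control process itself.

```python
import hashlib
from typing import Dict, List, Optional


def sha256_hex(data: bytes) -> str:
    """Hex digest identifying one artifact's exact contents."""
    return hashlib.sha256(data).hexdigest()


def release_manifest(artifacts: Dict[str, bytes]) -> Dict[str, str]:
    """Map each artifact name (model weights, training data, ...) to its digest."""
    return {name: sha256_hex(blob) for name, blob in sorted(artifacts.items())}


def changed_artifacts(approved: Dict[str, str], candidate: Dict[str, str]) -> List[str]:
    """Names that were added, removed, or modified relative to the approved release."""
    names = set(approved) | set(candidate)
    missing: Optional[str] = None
    return sorted(
        name for name in names
        if approved.get(name, missing) != candidate.get(name, missing)
    )
```

If `changed_artifacts` returns anything for a build headed to production, that build needs a change request before it ships.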
If you are not SaMD (Scenario A), you still need:
- 21 CFR Part 11 compliance (audit trails, electronic signatures)
- HIPAA compliance (patient MRNs are PHI)
- Solid engineering practices
The difference is the formality and documentation burden. SaMD adds a process layer that approximately doubles the development effort.
Building compliance in from day one
At SpectraDx, regulatory compliance is not a phase we bolt on before submission. It is built into the platform architecture from the first commit.
The audit trail is not an afterthought - it is a core database table that every write operation touches. Electronic signatures are not a feature request - they are part of the result release workflow. Access controls are not nice-to-have - they are enforced at the API layer.
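The general pattern, an audit row written in the same transaction as the data it describes, can be sketched with SQLite. To be clear, this is an illustration of the idea and not the SpectraDx schema: the table layout, column names, and function names are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a results table and an audit table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE results (
            id INTEGER PRIMARY KEY,
            sample_id TEXT NOT NULL,
            value TEXT NOT NULL
        );
        CREATE TABLE audit_trail (
            id INTEGER PRIMARY KEY,
            at_utc TEXT NOT NULL,
            user_id TEXT NOT NULL,
            action TEXT NOT NULL,
            record_id INTEGER NOT NULL,
            new_value TEXT NOT NULL
        );
        """
    )
    return conn


def record_result(conn: sqlite3.Connection, user_id: str, sample_id: str, value: str) -> int:
    """Insert a result and its audit row together; both commit or both roll back."""
    with conn:  # the connection context manager commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO results (sample_id, value) VALUES (?, ?)",
            (sample_id, value),
        )
        record_id = cursor.lastrowid
        conn.execute(
            "INSERT INTO audit_trail (at_utc, user_id, action, record_id, new_value) "
            "VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), user_id, "INSERT", record_id, value),
        )
    return record_id
```

Because both inserts share one transaction, there is no code path that writes a result without recording who wrote it and when.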
For customers on the SaMD path, the SpectraDx platform provides the compliant software infrastructure - 21 CFR Part 11 audit trails, electronic signatures, operator qualification tracking, data integrity controls - so their regulatory team can focus on the clinical evidence and the FDA submission rather than building software compliance from scratch.
For customers who can stay on the non-SaMD path (using instrument-based classification), the same compliance infrastructure demonstrates to their clinical sites and laboratory accreditation bodies that the software meets the standard of care for clinical data handling. For a deeper look at the overall system architecture, see our clinical workflow architecture article.
If you are building a spectroscopy-based diagnostic and are not sure where you fall on the SaMD line, we should talk. The architectural decisions you make now determine your regulatory path for years to come. Getting them right from the start is dramatically cheaper than getting them wrong.

