Scientific instrument facilities accumulate invisible knowledge. We make knowledge explicit by turning it into typed, traceable, and enforceable assets — inspired by the architecture of ALICE O2 at CERN.
Scientific instrument facilities produce two assets simultaneously: scientific results, which are visible and celebrated, and the knowledge required to interpret and reproduce those results, which is largely invisible.
When implicit knowledge becomes the bottleneck to control, scale, and survival — explicit semantics become the recovery path.
The cost is paid not by the team that deferred the formalisation, but by the team that inherits the system. This is the structure of semantic debt.
Large research institutions — CERN, national laboratories, major synchrotron facilities — have solved this inside long-horizon, well-resourced projects. Enterprise scientific instrumentation has not.
A semantic model that does not validate, execute, trace, test, or govern anything will not become authoritative. Our architecture binds semantics to execution.
Every value in the pipeline is wrapped in a BaseDataType subclass. Processors declare exact input and output types. The framework enforces contracts at every node boundary.
Every pipeline step processes a Payload with two coordinated flows: a data channel (typed domain objects) and a context channel (structured metadata, calibration validity, provenance).
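The typed-value and dual-channel pattern can be sketched in plain Python. This is an illustrative sketch only: the class names (`DetectorFrame`, `CalibratedFrame`, `ApplyCalibration`) and method signatures below are assumptions for exposition, not the actual Semantiva API.

```python
from dataclasses import dataclass, field
from typing import Any

# Illustrative sketch: names and signatures are assumptions,
# not the real Semantiva API.

class BaseDataType:
    """Root of the typed-value hierarchy: every pipeline value is wrapped."""
    def __init__(self, data: Any):
        self._data = data

    @property
    def data(self) -> Any:
        return self._data

class DetectorFrame(BaseDataType):
    """A named scientific object, not anonymous bytes."""

class CalibratedFrame(BaseDataType):
    """Output of the calibration step."""

@dataclass
class Payload:
    """Two coordinated flows: typed data plus structured context."""
    data: BaseDataType
    context: dict = field(default_factory=dict)

class ApplyCalibration:
    """A typed transformation with declared input/output types."""
    input_type = DetectorFrame
    output_type = CalibratedFrame

    def process(self, payload: Payload) -> Payload:
        # Contract enforcement at the node boundary.
        if not isinstance(payload.data, self.input_type):
            raise TypeError(
                f"expected {self.input_type.__name__}, "
                f"got {type(payload.data).__name__}"
            )
        # Parameter resolved from the context channel, not hardcoded.
        gain = payload.context.get("calibration_gain", 1.0)
        calibrated = CalibratedFrame([v * gain for v in payload.data.data])
        # Provenance written back into the context channel.
        payload.context["calibration_applied"] = True
        return Payload(calibrated, payload.context)

p = ApplyCalibration().process(
    Payload(DetectorFrame([1.0, 2.0]), {"calibration_gain": 2.0})
)
```

A mismatched input — say, feeding a raw `BaseDataType` where a `DetectorFrame` is declared — fails loudly at the boundary instead of propagating silently downstream.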
Optional tracing layer. Every node execution emits a Semantic Execution Record (SER) — which processor ran, context delta, timing, dependencies, assertions. Every result has a producer identity.
Pipelines compile into deterministic graphs with stable semantic/configuration IDs. The SVA contract catalogue enforces structural rules. `semantiva dev lint` audits components before deployment.
ALICE O2 demonstrates the value of practical, distributed semantics in a demanding scientific computing environment. O2 embeds semantic meaning in workflow declarations, typed data descriptions, detector origins, input/output specifications, calibration validity, and detector-specific reconstruction formats — not in a monolithic formal ontology. The transferable lesson: put semantics where coordination already happens. Semantiva applies the same discipline as a reusable framework substrate for typed scientific workflows.
Semantiva makes semantic modelling actionable inside scientific workflows — from ad hoc scripts toward typed, inspectable, reproducible computation.
| Primitive | Semantic role | Why it matters for instrument software |
|---|---|---|
| BaseDataType | Semantic noun | Names what kind of scientific object is flowing — DetectorFrame, CalibrationMap, ReconstructionResult — not anonymous bytes. Processors declare exact input/output types; mismatches are caught at configuration time. |
| DataOperation | Semantic verb | Defines typed transformations rather than arbitrary function calls. Apply calibration, reconstruct image, correct for geometry — each is a typed, contract-validated step with declared preconditions and postconditions. |
| DataProbe | Observer (read-only) | Observes the data channel without mutating it. Probe results are injected into the context channel via context_key — making quality scores, statistics, and validation flags available to downstream processors. |
| ContextType | Operational metadata channel | Carries calibration validity, acquisition mode, instrument state, experiment ID, and provenance alongside data. Parameters resolve from context at runtime — enabling dynamic behaviour without hardcoded configuration. |
| Pipeline (YAML) | Executable semantic graph | Declarative YAML compiles into a deterministic graph with a stable pipeline_id. The pipeline is the configuration artefact of record — reproducible by construction, not by luck. |
| SER · Trace | Reproducibility anchor | Semantic Execution Records capture which processor ran, context delta, timing, upstream dependencies, and assertion results. Every result has a traceable producer identity. Regulatory evidence is produced by design. |
| Contract catalogue | Semantic interface registry | SVA codes enforce structural rules across processors, nodes, and pipelines. `semantiva dev lint` audits components against the catalogue — shifting validation left, before deployment. |
| Run-Space | Parametric sweep engine | Define a family of runs — parameter sweeps, combinatorial configurations — in a single YAML spec. Each run carries its own run_id and shares a run_space_spec_id. Critical for instrument characterisation and calibration validation campaigns. |
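Putting the table's primitives together, a declarative pipeline might look like the following. This is a hedged sketch: the keys and node names are assumptions chosen to mirror the table, not a verbatim Semantiva configuration schema.

```yaml
# Illustrative sketch: keys and node names are assumptions,
# not the verbatim Semantiva schema.
pipeline:
  nodes:
    - processor: DetectorFrameSource      # DataSource: emits DetectorFrame
    - processor: ApplyCalibration         # DataOperation: typed transform
      parameters:
        calibration_map: "from_context:calibration_map"
    - processor: ReconstructImage         # DataOperation
    - processor: QualityProbe             # DataProbe: read-only observer
      context_key: quality_metrics        # result injected into context
    - processor: ArchiveResult            # DataSink: archives with provenance
```

Because the YAML compiles into a deterministic graph with a stable `pipeline_id`, the file itself is the configuration artefact of record.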
From detector acquisition through calibration, reconstruction, quality assurance, and archival — each step typed, traced, and reproducible.
A DataSource node emits DetectorFrameDataType into the data channel. Acquisition metadata (mode, exposure, geometry) is written to the context channel by a companion ContextProcessor.
A DataOperation applies the calibration map. Calibration validity is resolved from the context channel — the framework enforces that a calibration valid for mode A cannot be silently applied in mode B.
A DataOperation produces the reconstruction. A DataProbe observes the result and writes quality metrics (SNR, residuals, uncertainty) into the context channel — available to downstream routing and archival decisions.
A ContextProcessor emits a SER (Semantic Execution Record) capturing: which processor ran, what context keys were read/created, timing, upstream dependencies. The amber trace anchor marks every node that produces a provenance event.
An IO · DataSink archives the result with its full provenance context. The output path is resolved from context via a ContextProcessor template — no hardcoded paths, no silent overwrites.
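The provenance record emitted at each step can be sketched as a plain structure. Every field name below is an assumption for illustration, not the actual SER trace schema; the `pipeline_id` value is a placeholder.

```python
# Illustrative sketch of a Semantic Execution Record (SER);
# field names are assumptions, not the actual trace schema.
ser = {
    "processor": "ApplyCalibration",      # which processor ran
    "pipeline_id": "pl-0001",             # stable graph identity (placeholder)
    "context_delta": {                    # context keys read / created
        "read": ["calibration_map", "acquisition_mode"],
        "created": ["calibration_applied"],
    },
    "timing_ms": 12.4,
    "upstream": ["DetectorFrameSource"],  # dependency edges
    "assertions": {"input_type_check": "pass"},
}

def producer_of(result_key: str, records: list) -> str:
    """Every result has a traceable producer identity:
    walk the trace to find which processor created a context key."""
    for rec in records:
        if result_key in rec["context_delta"]["created"]:
            return rec["processor"]
    raise KeyError(result_key)

print(producer_of("calibration_applied", [ser]))  # ApplyCalibration
```

This is what "regulatory evidence produced by design" means in practice: the question "who produced this value, and under what conditions?" becomes a lookup over the trace, not an after-the-fact reconstruction.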
These are not hypothetical risks. They are predictable events in the lifecycle of any facility that has been operating for more than five years.
Major changes touch too many subsystems and require too many senior sign-offs. No single person understands the full system.
Interfaces remain syntactically stable while their meaning changes across variants, instrument classes, or software versions.
Rich metadata exists but is not trusted enough to drive automation, AI, or regulatory evidence — because no one can vouch for its provenance.
Service knowledge does not feed back into R&D. Customer bugs can be reproduced only by a small number of senior people.
Quality evidence must be assembled after the fact, by hand. Verification depends on expert intuition rather than machine-readable contracts.
A competitor with a cleaner semantic architecture can validate, simulate, configure, service, and automate faster — at lower maintenance cost.
Our team is built on a specific and rare combination of three competences. Each exists in the market separately. We are where they converge.
Knowing which distinctions are physically real and which are incidental. A model that misses a fundamental distinction provides false confidence. One that encodes an incidental distinction over-constrains the system.
Encoding semantic distinctions where the framework can enforce them — at transport boundaries and component interfaces — not in documentation or convention. Keeping the structure minimal enough to remain maintainable.
Capturing knowledge in a form that survives personnel changes. Recognising which tacit conventions are load-bearing, which contested questions need explicit representation, which distinctions must extend as the domain evolves.
"We turn implicit instrument knowledge into durable enterprise assets, using architectural patterns proven at the largest scientific facilities in the world."
This positions the value at the maintenance horizon, not the delivery milestone — in terms that enterprise scientific facilities can directly map to operational risk.
The question is not whether to preserve this knowledge; it is whether to preserve it by design or by accident. Preservation by design means building the knowledge into the structure of the system: typed, conditional, versioned, traceable, and enforceable by the framework itself.