Semantic Modelling · Scientific Instruments · Scientific Computing

From implicit expertise
to executable semantics

Scientific instrument facilities accumulate invisible knowledge. We make that knowledge explicit by turning it into typed, traceable, and enforceable assets — inspired by the architecture of ALICE O2 at CERN.

[Pipeline diagram: five node types connected by data channel, context channel, and trace anchor links: Data Source (AcquisitionNode; DetectorData → context), Data Operation (CalibrationApply; validity → context), Data Probe (ReconstructionProbe; result → "output"), Context Processor (ProvenanceRecord; SER → trace anchor), IO · Data Sink (ResultArchive; path → context).]

Semantic debt: invisible while accruing, expensive when paid

Scientific instrument facilities produce two assets simultaneously: scientific results, which are visible and celebrated, and the knowledge required to interpret and reproduce those results, which is largely invisible.

A senior expert leaves and the conventions they maintained are no longer enforced
A regulator asks for the provenance of a result produced three years ago
An instrument is reconfigured and previously valid calibrations silently become wrong
Two facilities try to combine results and discover their conventions are incompatible
A new team inherits a system whose implicit assumptions they cannot recover
An automation programme stalls because tacit decisions cannot be formalised
The semantic model is a burden only when implicit knowledge is still affordable.

When implicit knowledge becomes the bottleneck to control, scale, and survival — explicit semantics become the recovery path.


The cost is paid not by the team that deferred the formalisation, but by the team that inherits the system. This is the structure of semantic debt.


Academic groups — CERN, large national laboratories, major synchrotron facilities — have solved this inside long-horizon, well-resourced projects. Enterprise scientific instrumentation has not.

Executable semantics, not documentary ones

A semantic model that does not validate, execute, trace, test, or govern anything will not become authoritative. Our architecture binds semantics to execution.

Typed data products

Every value in the pipeline is wrapped in a BaseDataType subclass. Processors declare exact input and output types. The framework enforces contracts at every node boundary.
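The typing discipline can be sketched in plain Python. The subclasses and processor below are illustrative stand-ins, not the actual Semantiva API; only BaseDataType is named in the text.

```python
# Illustrative sketch only: BaseDataType is named in the text, but the
# subclasses, the processor, and the pedestal correction are hypothetical.
class BaseDataType:
    """Common wrapper for every value flowing through the pipeline."""
    def __init__(self, value):
        self.value = value

class DetectorFrame(BaseDataType):
    pass

class CalibratedFrame(BaseDataType):
    pass

class CalibrationApply:
    # Processors declare exact input and output types...
    input_type = DetectorFrame
    output_type = CalibratedFrame

    def __call__(self, frame):
        # ...and the contract is enforced at the node boundary.
        if not isinstance(frame, self.input_type):
            raise TypeError(f"expected {self.input_type.__name__}, "
                            f"got {type(frame).__name__}")
        # Hypothetical pedestal subtraction standing in for a real calibration.
        return self.output_type([v - 1.0 for v in frame.value])

calibrated = CalibrationApply()(DetectorFrame([100.0, 200.0]))
```

Passing a `CalibratedFrame` (or raw bytes) to the same processor raises a `TypeError` immediately, which is the point: mismatches surface at the boundary, not downstream.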

DataOperation · DataProbe

Dual-channel execution

Every pipeline step processes a Payload with two coordinated flows: a data channel (typed domain objects) and a context channel (structured metadata, calibration validity, provenance).
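A minimal sketch of the dual-channel idea. Payload appears in the text; the field names and the step helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Sketch of the two coordinated flows; these are not the actual
# Semantiva Payload fields.
@dataclass
class Payload:
    data: Any                                               # data channel: typed domain object
    context: Dict[str, Any] = field(default_factory=dict)   # context channel: metadata

def run_step(payload: Payload, op: Callable, delta: Dict[str, Any]) -> Payload:
    """One pipeline step: transform the data channel, merge a context delta."""
    return Payload(op(payload.data), {**payload.context, **delta})

p = Payload(data=[10, 20], context={"mode": "continuous"})
p = run_step(p, lambda xs: [x + 1 for x in xs], {"calibration.valid": True})
```

The key property: every step sees and can extend the context (mode, validity, provenance) without those concerns leaking into the data types themselves.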

data channel · context channel

Semantic Execution Records

Optional tracing layer. Every node execution emits a SER — which processor ran, context delta, timing, dependencies, assertions. Every result has a producer identity.

SER · trace anchor · run_id

Contract-validated pipelines

Pipelines compile into deterministic graphs with stable semantic/configuration IDs. The SVA contract catalogue enforces structural rules. semantiva dev lint audits components before deployment.

graph_id · pipeline_id · SVA
ALICE O2 · CERN

ALICE O2 demonstrates the value of practical, distributed semantics in a demanding scientific computing environment. O2 embeds semantic meaning in workflow declarations, typed data descriptions, detector origins, input/output specifications, calibration validity, and detector-specific reconstruction formats — not in a monolithic formal ontology. The transferable lesson: put semantics where coordination already happens. Semantiva applies the same discipline as a reusable framework substrate for typed scientific workflows.

Semantic roles mapped to execution

Semantiva makes semantic modelling actionable inside scientific workflows — from ad hoc scripts toward typed, inspectable, reproducible computation.

Primitive | Semantic role | Why it matters for instrument software
BaseDataType | Semantic noun | Names what kind of scientific object is flowing — DetectorFrame, CalibrationMap, ReconstructionResult — not anonymous bytes. Processors declare exact input/output types; mismatches are caught at configuration time.
DataOperation | Semantic verb | Defines typed transformations rather than arbitrary function calls. Apply calibration, reconstruct image, correct for geometry — each is a typed, contract-validated step with declared preconditions and postconditions.
DataProbe | Observer (read-only) | Observes the data channel without mutating it. Probe results are injected into the context channel via context_key — making quality scores, statistics, and validation flags available to downstream processors.
ContextType | Operational metadata channel | Carries calibration validity, acquisition mode, instrument state, experiment ID, and provenance alongside data. Parameters resolve from context at runtime — enabling dynamic behaviour without hardcoded configuration.
Pipeline (YAML) | Executable semantic graph | Declarative YAML compiles into a deterministic graph with a stable pipeline_id. The pipeline is the configuration artefact of record — reproducible by construction, not by luck.
SER · Trace | Reproducibility anchor | Semantic Execution Records capture which processor ran, context delta, timing, upstream dependencies, and assertion results. Every result has a traceable producer identity. Regulatory evidence is produced by design.
Contract catalogue | Semantic interface registry | SVA codes enforce structural rules across processors, nodes, and pipelines. semantiva dev lint audits components against the catalogue — shifting validation left, before deployment.
Run-Space | Parametric sweep engine | Define a family of runs — parameter sweeps, combinatorial configurations — in a single YAML spec. Each run carries its own run_id and shares a run_space_spec_id. Critical for instrument characterisation and calibration validation campaigns.
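As a concrete illustration of the Run-Space idea, a sweep specification might look like the following. The top-level keys are assumptions made for illustration, not the documented schema; only run_id and run_space_spec_id come from the text.

```yaml
# Illustrative Run-Space sketch; key names are hypothetical.
run_space:
  pipeline: calibration_validation.yaml
  sweep:
    exposure_ms: [100, 200, 400]        # parameter sweep
    mode: ["continuous", "triggered"]   # combinatorial configuration
# Expands to six runs; each carries its own run_id and
# shares a single run_space_spec_id.
```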

A scientific instrument pipeline, step by step

From detector acquisition through calibration, reconstruction, quality assurance, and archival — each step typed, traced, and reproducible.

1 · Acquire — typed data enters the pipeline

A DataSource node emits DetectorFrameDataType into the data channel. Acquisition metadata (mode, exposure, geometry) is written to the context channel by a companion ContextProcessor.

processor: DetectorFrameSource
parameters:
  mode: "continuous"
  exposure_ms: 200

2 · Calibrate — validity enforced by the framework

A DataOperation applies the calibration map. Calibration validity is resolved from the context channel — the framework enforces that a calibration valid for mode A cannot be silently applied in mode B.

processor: CalibrationApplyOperation
# calibration_map resolved from context
context_key: "calibration.validity"

3 · Reconstruct & Probe — quality scores into context

A DataOperation produces the reconstruction. A DataProbe observes the result and writes quality metrics (SNR, residuals, uncertainty) into the context channel — available to downstream routing and archival decisions.
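In the same configuration style as the other steps, the probe declaration might look like this. The processor name matches the diagram; the context key is an illustrative assumption.

```yaml
processor: ReconstructionProbe
# probe result injected into the context channel
context_key: "qa.reconstruction"
```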

4 · Trace — provenance by construction

A ContextProcessor emits a SER (Semantic Execution Record) capturing: which processor ran, what context keys were read/created, timing, upstream dependencies. The amber trace anchor marks every node that produces a provenance event.

# SER emitted automatically
identity.run_id: "run-7f3a91b2"
identity.pipeline_id: "pl-c4d8f2a1"
status: succeeded
wall_ms: 3.291

5 · Archive — reproducible by construction

An IO · DataSink archives the result with its full provenance context. The output path is resolved from context via a ContextProcessor template — no hardcoded paths, no silent overwrites.

# ContextProcessor resolves the output path from context
processor: template:"results/{run_id}/{mode}.h5":path
# DataSink archives the result with its provenance context
processor: ReconstructionH5Saver
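The five steps above can be assembled into a single declaration. This is an illustrative sketch, not the documented schema: the pipeline/nodes structure, the ReconstructionOperation name, and the qa context key are assumptions; the remaining processor entries come from the step snippets.

```yaml
# Illustrative end-to-end sketch; top-level structure is hypothetical.
pipeline:
  nodes:
    - processor: DetectorFrameSource
      parameters:
        mode: "continuous"
        exposure_ms: 200
    - processor: CalibrationApplyOperation
      context_key: "calibration.validity"   # resolved from context
    - processor: ReconstructionOperation    # hypothetical name
    - processor: ReconstructionProbe
      context_key: "qa.reconstruction"      # hypothetical key
    - processor: template:"results/{run_id}/{mode}.h5":path
    - processor: ReconstructionH5Saver
```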

Symptoms that implicit semantics has exceeded its carrying capacity

These are not hypothetical risks. They are predictable events in the lifecycle of any facility that has been operating for more than five years.

Features require too many expert reviews

Major changes touch too many subsystems and require too many senior sign-offs. No single person understands the full system.

Interfaces stable in syntax, drifting in meaning

Interfaces remain syntactically stable while their meaning changes across variants, instrument classes, or software versions.

Metadata present but not trusted

Rich metadata exists but is not trusted enough to drive automation, AI, or regulatory evidence — because no one can vouch for its provenance.

Field failures require hero debugging

Service knowledge does not feed back into R&D. Customer bugs can be reproduced only by a small number of senior people.

Regulatory evidence reconstructed manually

Quality evidence must be assembled after the fact, by hand. Verification depends on expert intuition rather than machine-readable contracts.

New entrants with lower semantic cost

A competitor with a cleaner semantic architecture can validate, simulate, configure, service, and automate faster — at lower maintenance cost.

Three competences. One rare combination.

Our team is built on a specific and rare combination of three competences. Each exists in the market separately. We are where they converge.

Domain physics depth

Knowing which distinctions are physically real and which are incidental. A model that misses a fundamental distinction provides false confidence. One that encodes an incidental distinction over-constrains the system.

Software architecture judgment

Encoding semantic distinctions where the framework can enforce them — at transport boundaries and component interfaces — not in documentation or convention. Keeping the structure minimal enough to remain maintainable.

Knowledge engineering instinct

Capturing knowledge in a form that survives personnel changes. Recognising which tacit conventions are load-bearing, which contested questions need explicit representation, which distinctions must extend as the domain evolves.

Market position

"We turn implicit instrument knowledge into durable enterprise assets, using architectural patterns proven at the largest scientific facilities in the world."

This positions the value at the maintenance horizon, not the delivery milestone — in terms that enterprise scientific facilities can directly map to operational risk.

What we are not → What we are
A software dev shop delivering features → A knowledge infrastructure team delivering durability
A consultancy that writes recommendations → A team that builds enforceable structure
Dependent on founding individual's hours → Built on a framework that scales independently

The decisive question is not whether to preserve scientific knowledge

It is whether to preserve it by design or by accident. Preservation by design means building the knowledge into the structure of the system: typed, conditional, versioned, traceable, and enforceable by the framework itself.