Scientific instrument facilities accumulate invisible knowledge. We make knowledge explicit by turning it into typed, traceable, and enforceable assets — inspired by the architecture of ALICE O2 at CERN.
Scientific instrument facilities produce two assets simultaneously: scientific results, which are visible and celebrated, and the knowledge required to interpret and reproduce those results, which is largely invisible.
When implicit knowledge becomes the bottleneck to control, scale, and survival — explicit semantics become the recovery path.
The cost is paid not by the team that deferred the formalisation, but by the team that inherits the system. This is the structure of semantic debt.
Large research institutions — CERN, national laboratories, major synchrotron facilities — have solved this inside long-horizon, well-resourced projects. Enterprise scientific instrumentation has not.
A semantic model that does not validate, execute, trace, test, or govern anything will not become authoritative. Our architecture binds semantics to execution.
Every value in the pipeline is wrapped in a BaseDataType subclass. Processors declare exact input and output types. The framework enforces contracts at every node boundary.
Every pipeline step processes a Payload with two coordinated flows: a data channel (typed domain objects) and a context channel (structured metadata, calibration validity, provenance).
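The typed-value and dual-channel pattern can be sketched in plain Python. This is an illustrative sketch only: the class names (`DetectorFrame`, `CalibratedFrame`, `ApplyCalibration`) and method signatures below are assumptions for exposition, not the actual Semantiva API.

```python
from dataclasses import dataclass, field
from typing import Any

# Illustrative sketch: names and signatures are assumptions,
# not the real Semantiva API.

class BaseDataType:
    """Root of the typed-value hierarchy: every pipeline value is wrapped."""
    def __init__(self, data: Any):
        self._data = data

    @property
    def data(self) -> Any:
        return self._data

class DetectorFrame(BaseDataType):
    """A named scientific object, not anonymous bytes."""

class CalibratedFrame(BaseDataType):
    """Output of the calibration step."""

@dataclass
class Payload:
    """Two coordinated flows: typed data plus structured context."""
    data: BaseDataType
    context: dict = field(default_factory=dict)

class ApplyCalibration:
    """A typed transformation with declared input/output types."""
    input_type = DetectorFrame
    output_type = CalibratedFrame

    def process(self, payload: Payload) -> Payload:
        # Contract enforcement at the node boundary.
        if not isinstance(payload.data, self.input_type):
            raise TypeError(
                f"expected {self.input_type.__name__}, "
                f"got {type(payload.data).__name__}"
            )
        # Parameter resolved from the context channel, not hardcoded.
        gain = payload.context.get("calibration_gain", 1.0)
        calibrated = CalibratedFrame([v * gain for v in payload.data.data])
        # Provenance written back into the context channel.
        payload.context["calibration_applied"] = True
        return Payload(calibrated, payload.context)

p = ApplyCalibration().process(
    Payload(DetectorFrame([1.0, 2.0]), {"calibration_gain": 2.0})
)
```

A mismatched input — say, feeding a raw `BaseDataType` where a `DetectorFrame` is declared — fails loudly at the boundary instead of propagating silently downstream.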
Optional tracing layer. Every node execution emits a Semantic Execution Record (SER) — which processor ran, context delta, timing, dependencies, assertions. Every result has a producer identity.
Pipelines compile into deterministic graphs with stable semantic/configuration IDs. The SVA contract catalogue enforces structural rules. `semantiva dev lint` audits components before deployment.
ALICE O2 demonstrates the value of practical, distributed semantics in a demanding scientific computing environment. O2 embeds semantic meaning in workflow declarations, typed data descriptions, detector origins, input/output specifications, calibration validity, and detector-specific reconstruction formats — not in a monolithic formal ontology. The transferable lesson: put semantics where coordination already happens. Semantiva applies the same discipline as a reusable framework substrate for typed scientific workflows.
Semantiva makes semantic modelling actionable inside scientific workflows — from ad hoc scripts toward typed, inspectable, reproducible computation.
| Primitive | Semantic role | Why it matters for instrument software |
|---|---|---|
| BaseDataType | Semantic noun | Names what kind of scientific object is flowing — DetectorFrame, CalibrationMap, ReconstructionResult — not anonymous bytes. Processors declare exact input/output types; mismatches are caught at configuration time. |
| DataOperation | Semantic verb | Defines typed transformations rather than arbitrary function calls. Apply calibration, reconstruct image, correct for geometry — each is a typed, contract-validated step with declared preconditions and postconditions. |
| DataProbe | Observer (read-only) | Observes the data channel without mutating it. Probe results are injected into the context channel via context_key — making quality scores, statistics, and validation flags available to downstream processors. |
| ContextType | Operational metadata channel | Carries calibration validity, acquisition mode, instrument state, experiment ID, and provenance alongside data. Parameters resolve from context at runtime — enabling dynamic behaviour without hardcoded configuration. |
| Pipeline (YAML) | Executable semantic graph | Declarative YAML compiles into a deterministic graph with a stable pipeline_id. The pipeline is the configuration artefact of record — reproducible by construction, not by luck. |
| SER · Trace | Reproducibility anchor | Semantic Execution Records capture which processor ran, context delta, timing, upstream dependencies, and assertion results. Every result has a traceable producer identity. Regulatory evidence is produced by design. |
| Contract catalogue | Semantic interface registry | SVA codes enforce structural rules across processors, nodes, and pipelines. `semantiva dev lint` audits components against the catalogue — shifting validation left, before deployment. |
| Run-Space | Parametric sweep engine | Define a family of runs — parameter sweeps, combinatorial configurations — in a single YAML spec. Each run carries its own run_id and shares a run_space_spec_id. Critical for instrument characterisation and calibration validation campaigns. |
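Putting the table's primitives together, a declarative pipeline might look like the following. This is a hedged sketch: the keys and node names are assumptions chosen to mirror the table, not a verbatim Semantiva configuration schema.

```yaml
# Illustrative sketch: keys and node names are assumptions,
# not the verbatim Semantiva schema.
pipeline:
  nodes:
    - processor: DetectorFrameSource      # DataSource: emits DetectorFrame
    - processor: ApplyCalibration         # DataOperation: typed transform
      parameters:
        calibration_map: "from_context:calibration_map"
    - processor: ReconstructImage         # DataOperation
    - processor: QualityProbe             # DataProbe: read-only observer
      context_key: quality_metrics        # result injected into context
    - processor: ArchiveResult            # DataSink: archives with provenance
```

Because the YAML compiles into a deterministic graph with a stable `pipeline_id`, the file itself is the configuration artefact of record.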
From detector acquisition through calibration, reconstruction, quality assurance, and archival — each step typed, traced, and reproducible.
A DataSource node emits DetectorFrameDataType into the data channel. Acquisition metadata (mode, exposure, geometry) is written to the context channel by a companion ContextProcessor.
A DataOperation applies the calibration map. Calibration validity is resolved from the context channel — the framework enforces that a calibration valid for mode A cannot be silently applied in mode B.
A DataOperation produces the reconstruction. A DataProbe observes the result and writes quality metrics (SNR, residuals, uncertainty) into the context channel — available to downstream routing and archival decisions.
A ContextProcessor emits a SER (Semantic Execution Record) capturing: which processor ran, what context keys were read/created, timing, upstream dependencies. The amber trace anchor marks every node that produces a provenance event.
An IO · DataSink archives the result with its full provenance context. The output path is resolved from context via a ContextProcessor template — no hardcoded paths, no silent overwrites.
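The provenance record emitted at each step can be sketched as a plain structure. Every field name below is an assumption for illustration, not the actual SER trace schema; the `pipeline_id` value is a placeholder.

```python
# Illustrative sketch of a Semantic Execution Record (SER);
# field names are assumptions, not the actual trace schema.
ser = {
    "processor": "ApplyCalibration",      # which processor ran
    "pipeline_id": "pl-0001",             # stable graph identity (placeholder)
    "context_delta": {                    # context keys read / created
        "read": ["calibration_map", "acquisition_mode"],
        "created": ["calibration_applied"],
    },
    "timing_ms": 12.4,
    "upstream": ["DetectorFrameSource"],  # dependency edges
    "assertions": {"input_type_check": "pass"},
}

def producer_of(result_key: str, records: list) -> str:
    """Every result has a traceable producer identity:
    walk the trace to find which processor created a context key."""
    for rec in records:
        if result_key in rec["context_delta"]["created"]:
            return rec["processor"]
    raise KeyError(result_key)

print(producer_of("calibration_applied", [ser]))  # ApplyCalibration
```

This is what "regulatory evidence produced by design" means in practice: the question "who produced this value, and under what conditions?" becomes a lookup over the trace, not an after-the-fact reconstruction.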
These are not hypothetical risks. They are predictable events in the lifecycle of any facility that has been operating for more than five years.
Major changes touch too many subsystems and require too many senior sign-offs. No single person understands the full system.
Interfaces remain syntactically stable while their meaning changes across variants, instrument classes, or software versions.
Rich metadata exists but is not trusted enough to drive automation, AI, or regulatory evidence — because no one can vouch for its provenance.
Service knowledge does not feed back into R&D. Customer bugs can be reproduced only by a small number of senior people.
Quality evidence must be assembled after the fact, by hand. Verification depends on expert intuition rather than machine-readable contracts.
A competitor with a cleaner semantic architecture can validate, simulate, configure, service, and automate faster — at lower maintenance cost.
Our team is built on a specific and rare combination of three competences. Each exists in the market separately. We are where they converge.
Knowing which distinctions are physically real and which are incidental. A model that misses a fundamental distinction provides false confidence. One that encodes an incidental distinction over-constrains the system.
Encoding semantic distinctions where the framework can enforce them — at transport boundaries and component interfaces — not in documentation or convention. Keeping the structure minimal enough to remain maintainable.
Capturing knowledge in a form that survives personnel changes. Recognising which tacit conventions are load-bearing, which contested questions need explicit representation, which distinctions must extend as the domain evolves.
"We turn implicit instrument knowledge into durable enterprise assets, using architectural patterns proven at the largest scientific facilities in the world."
This positions the value at the maintenance horizon, not the delivery milestone — in terms that enterprise scientific facilities can directly map to operational risk.
The question is not whether to preserve this knowledge; it is whether to preserve it by design or by accident. Preservation by design means building the knowledge into the structure of the system: typed, conditional, versioned, traceable, and enforceable by the framework itself.