Murk Architecture
This document explains Murk’s architecture for developers who want to understand how the engine works internally. For a practical introduction to building simulations, see CONCEPTS.md.
Table of Contents
- Design Goals
- Crate Structure
- Three-Interface Model
- Arena-Based Generational Allocation
- Runtime Modes
- Threading Model
- Spatial Model
- Field Model
- Propagator Pipeline
- Observation Pipeline
- Command Model
- Error Handling and Recovery
- Determinism
- Language Bindings
Design Goals
Murk is a world simulation engine for reinforcement learning and real-time applications. The architecture optimises for:
- Deterministic replay — identical inputs produce identical outputs across runs on the same platform.
- Zero-GC memory management — arena allocation with predictable lifetimes, no garbage collection pauses.
- ML-native observation extraction — pre-compiled observation plans that produce fixed-shape tensors directly, not intermediate representations.
- Two runtime modes from one codebase — synchronous lockstep for training, asynchronous real-time for live interaction.
Three principles guide every subsystem:
- Egress Always Returns — observation extraction always returns, even during tick failures or shutdown. Responses may indicate staleness or degraded coverage via metadata, but the caller always receives data.
- Tick-Expressible Time — the engine expresses all internal time references that affect state transitions in tick counts, never wall clocks. This prevents replay divergence.
- Asymmetric Mode Dampening — the engine handles staleness and overload differently in each runtime mode, because Lockstep and RealtimeAsync have fundamentally different dynamics.
Crate Structure
murk/
├── murk Top-level facade (add this one dependency)
├── murk-core Leaf crate: IDs, field defs, commands, core traits
├── murk-arena Arena-based generational allocation
├── murk-space Spatial backends and region planning
├── murk-propagator Propagator trait, pipeline validation, StepContext
├── murk-propagators Reference propagators (diffusion, movement, reward)
├── murk-obs Observation spec, compilation, tensor extraction
├── murk-engine Simulation engine: LockstepWorld, RealtimeAsyncWorld
├── murk-replay Deterministic replay recording and verification
├── murk-ffi C ABI bindings with handle tables
├── murk-python Python/PyO3 bindings with Gymnasium adapters
├── murk-bench Benchmark profiles and utilities
└── murk-test-utils Shared test fixtures
Dependency flow (arrows point from dependee to dependent):
murk-core ──┬── murk-arena ──────┬── murk-engine ──┬── murk-ffi
            ├── murk-space ──────┤                 └── murk-python
            ├── murk-propagator ─┤
            └── murk-obs ────────┘
                murk-replay ─────┘
Safety boundary: only murk-arena and murk-ffi are permitted to contain
unsafe code. Every other crate uses #![forbid(unsafe_code)].
Three-Interface Model
All interaction with a Murk world flows through three interfaces:
[Producers]                                         [Consumers]
   |                                                     ^
   v                                                     |
Ingress ──(bounded queue)──> TickEngine ──(publish)──> Egress
                                  \                      |
                                   └──(ring buffer)──────┘
- Ingress accepts commands (intents to change world state). It implements backpressure via a bounded queue, TTL-based expiry, and deterministic drop policies.
- TickEngine is the sole authoritative mutator. It drains the ingress queue, executes the propagator pipeline, and publishes an immutable snapshot at each tick boundary.
- Egress reads published snapshots to produce observations. It never mutates world state. In RealtimeAsync mode, egress workers run on a thread pool for concurrent observation extraction.
This separation enforces the key invariant: only TickEngine holds
&mut WorldState. Everything else operates on immutable snapshots.
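The ownership split can be illustrated with a minimal sketch. Every name here (`WorldState`, `TickEngine`, `Snapshot`, `submit`) is a hypothetical stand-in rather than Murk's real API, and the snapshot is published by cloning into an `Arc`, which is a simplification of the arena-backed publication described in the next section:

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::Arc;

// Hypothetical stand-ins for illustration; not Murk's real types.
#[derive(Clone, Debug, PartialEq)]
pub struct Snapshot {
    pub tick: u64,
    pub heat: Vec<f32>,
}

pub struct WorldState {
    tick: u64,
    heat: Vec<f32>,
}

pub struct TickEngine {
    state: WorldState,        // the only mutable copy of the world
    ingress: Receiver<f32>,   // read end of the bounded command queue
    published: Arc<Snapshot>, // latest immutable snapshot
}

impl TickEngine {
    pub fn new(cells: usize, capacity: usize) -> (Self, SyncSender<f32>) {
        let (tx, rx) = sync_channel(capacity);
        let state = WorldState { tick: 0, heat: vec![0.0; cells] };
        let published = Arc::new(Snapshot { tick: 0, heat: state.heat.clone() });
        (TickEngine { state, ingress: rx, published }, tx)
    }

    /// Drain pending commands, mutate state, then publish a new snapshot.
    pub fn tick(&mut self) {
        while let Ok(delta) = self.ingress.try_recv() {
            for cell in &mut self.state.heat {
                *cell += delta;
            }
        }
        self.state.tick += 1;
        self.published = Arc::new(Snapshot {
            tick: self.state.tick,
            heat: self.state.heat.clone(),
        });
    }

    /// Egress side: hand out a shared, immutable snapshot.
    pub fn latest(&self) -> Arc<Snapshot> {
        Arc::clone(&self.published)
    }
}

/// Ingress side: non-blocking submit; `false` signals backpressure
/// (queue full) or a disconnected engine.
pub fn submit(tx: &SyncSender<f32>, delta: f32) -> bool {
    tx.try_send(delta).is_ok()
}
```

In this sketch a consumer that still holds an older `Arc<Snapshot>` keeps reading consistent data while the engine publishes newer ticks.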
Arena-Based Generational Allocation
This is Murk’s most load-bearing design decision. It replaces traditional copy-on-write with a generational arena scheme:
- Each field is stored as a contiguous [f32] allocation in a generational arena.
- At tick start, propagators write to fresh allocations in the new generation; no copies are required.
- Unmodified fields share their allocation across generations (zero-cost structural sharing).
- Snapshot publication swaps a ~1KB descriptor of field handles. Cost: <2us.
- Old generations remain readable until all snapshot references are released.
| Property | Traditional CoW | Arena-Generational |
|---|---|---|
| Copy cost | Fault-driven, unpredictable | Zero (allocate fresh) |
| Snapshot publish | Clone or fork | Descriptor swap, <2us |
| Rollback | Undo log or checkpoint | Free (abandon generation) |
| Memory predictability | Fault-driven | Bump allocation |
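A minimal sketch of the idea, assuming one bump-allocated buffer per generation and a descriptor that is simply a list of handles. The types `Arena`, `FieldHandle`, and `Descriptor` and the function `next_descriptor` are hypothetical and much simpler than the real murk-arena crate:

```rust
// Hypothetical sketch; names and layout are assumptions, not Murk's real API.

/// Generation-scoped handle: an offset and length inside one generation.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FieldHandle {
    generation: usize,
    offset: usize,
    len: usize,
}

/// One bump-allocated buffer per generation.
#[derive(Default)]
pub struct Arena {
    generations: Vec<Vec<f32>>,
}

impl Arena {
    /// Start a new generation and return its index.
    pub fn begin_generation(&mut self) -> usize {
        self.generations.push(Vec::new());
        self.generations.len() - 1
    }

    /// Bump-allocate `len` zeroed floats in the newest generation.
    /// `begin_generation` must have been called at least once.
    pub fn alloc(&mut self, len: usize) -> FieldHandle {
        let generation = self.generations.len() - 1;
        let buf = &mut self.generations[generation];
        let offset = buf.len();
        buf.resize(offset + len, 0.0);
        FieldHandle { generation, offset, len }
    }

    pub fn resolve(&self, h: FieldHandle) -> &[f32] {
        &self.generations[h.generation][h.offset..h.offset + h.len]
    }

    pub fn resolve_mut(&mut self, h: FieldHandle) -> &mut [f32] {
        &mut self.generations[h.generation][h.offset..h.offset + h.len]
    }
}

/// A snapshot descriptor is a small table of handles, one per field.
pub type Descriptor = Vec<FieldHandle>;

/// Build the next descriptor: written fields get fresh allocations,
/// untouched fields reuse the previous generation's handle.
pub fn next_descriptor(arena: &mut Arena, prev: &Descriptor, written: &[usize]) -> Descriptor {
    arena.begin_generation();
    prev.iter()
        .enumerate()
        .map(|(i, &h)| if written.contains(&i) { arena.alloc(h.len) } else { h })
        .collect()
}
```

Under these assumptions, rolling back a tick amounts to discarding the new descriptor: the previous descriptor still resolves to untouched data.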
Rust type-level enforcement
- ReadArena (published snapshots): Send + Sync, safe for concurrent reads.
- WriteArena (staging, exclusive to TickEngine): &mut access, no aliasing possible.
- Snapshot descriptors contain FieldHandle values (generation-scoped integers), not raw pointers. ReadArena::resolve(handle) provides &[f32] access.
- Field access requires &FieldArena, so the borrow checker enforces arena liveness.
Lockstep arena recycling
In Lockstep mode, two arena buffers alternate roles each tick
(ping-pong). The caller’s &mut self borrow on step_sync() guarantees
no outstanding snapshot borrows. Memory usage is bounded at 2x the
per-generation field footprint regardless of episode length.
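The ping-pong scheme and the role of the `&mut self` borrow can be sketched as follows. `PingPongWorld` is a hypothetical toy, not LockstepWorld, and its "propagator" just adds one to every cell:

```rust
// Hypothetical sketch; `PingPongWorld` is not Murk's LockstepWorld.
pub struct PingPongWorld {
    buffers: [Vec<f32>; 2],
    front: usize, // index of the published (readable) buffer
}

impl PingPongWorld {
    pub fn new(cells: usize) -> Self {
        PingPongWorld { buffers: [vec![0.0; cells], vec![0.0; cells]], front: 0 }
    }

    /// Borrow the published buffer. The slice borrows `self`, so it must
    /// be dropped before `step_sync` can be called again.
    pub fn snapshot(&self) -> &[f32] {
        &self.buffers[self.front]
    }

    /// Write the next tick into the back buffer, then swap roles.
    pub fn step_sync(&mut self) -> &[f32] {
        let (a, b) = self.buffers.split_at_mut(1);
        let (front, back) = if self.front == 0 {
            (&a[0], &mut b[0])
        } else {
            (&b[0], &mut a[0])
        };
        // Toy update rule: next = previous + 1.
        for (dst, src) in back.iter_mut().zip(front.iter()) {
            *dst = *src + 1.0;
        }
        self.front = 1 - self.front;
        &self.buffers[self.front]
    }
}
```

Holding the slice returned by `snapshot()` across a call to `step_sync()` is rejected at compile time, which is what makes it safe to overwrite the back buffer. Memory stays at two buffers no matter how many steps run.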
RealtimeAsync reclamation
In RealtimeAsync mode, epoch-based reclamation manages arena lifetimes. Each egress worker pins an epoch while reading a snapshot. The TickEngine reclaims old generations only when no worker holds a reference. Stalled workers are detected and torn down to prevent unbounded memory growth.
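The reclamation rule can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: the `Reclaimer` type, its method names, and the lag-based stall test are all hypothetical, and a real implementation would use atomics rather than a map:

```rust
use std::collections::BTreeMap;

// Hypothetical single-threaded model of epoch pinning. A real
// implementation would use atomics; none of these names are Murk's API.
pub struct Reclaimer {
    pinned: BTreeMap<usize, u64>, // worker id -> generation it is reading
    live: Vec<u64>,               // unreclaimed generations, oldest first
}

impl Reclaimer {
    pub fn new() -> Self {
        Reclaimer { pinned: BTreeMap::new(), live: Vec::new() }
    }

    pub fn publish(&mut self, generation: u64) {
        self.live.push(generation);
    }

    pub fn pin(&mut self, worker: usize, generation: u64) {
        self.pinned.insert(worker, generation);
    }

    pub fn unpin(&mut self, worker: usize) {
        self.pinned.remove(&worker);
    }

    /// Reclaim every generation older than the oldest pinned one.
    /// The newest generation is always kept.
    pub fn reclaim(&mut self) -> Vec<u64> {
        let newest = match self.live.last() {
            Some(&g) => g,
            None => return Vec::new(),
        };
        let oldest_pinned = self.pinned.values().copied().min().unwrap_or(newest);
        let floor = oldest_pinned.min(newest);
        let (dead, kept): (Vec<u64>, Vec<u64>) =
            self.live.iter().copied().partition(|&g| g < floor);
        self.live = kept;
        dead
    }

    /// Workers whose pin lags the newest generation by more than `max_lag`.
    pub fn stalled(&self, max_lag: u64) -> Vec<usize> {
        let newest = self.live.last().copied().unwrap_or(0);
        self.pinned
            .iter()
            .filter(|(_, generation)| newest.saturating_sub(**generation) > max_lag)
            .map(|(worker, _)| *worker)
            .collect()
    }
}
```

A worker that never unpins holds back every generation from its pin onward, which is why the sketch exposes `stalled` as a separate check.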
Runtime Modes
Murk provides two runtime modes from the same codebase. There is no runtime mode-switching — you choose at construction time.
LockstepWorld
A callable struct with &mut self methods. The caller’s thread executes
the full pipeline: command processing, propagators, snapshot publication,
and observation extraction.
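// Construct the world; the runtime mode is fixed at construction time.
let mut world = LockstepWorld::new(config)?;
// Run one full tick on the caller's thread.
let result = world.step_sync(commands)?;
// Read field 0 from the snapshot this step published.
let heat = result.snapshot.read(FieldId(0)).unwrap();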
- Synchronous, deterministic, throughput-maximised.
- The borrow checker enforces that snapshots are released before the next step.
- No background threads, no synchronisation overhead.
- Primary use case: RL training loops, deterministic replay.
RealtimeAsyncWorld
An autonomous tick thread running at a configurable rate (e.g., 60 Hz).
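// Construct the world with default async settings.
let async_config = AsyncConfig::default();
let mut world = RealtimeAsyncWorld::new(config, async_config)?;
// Non-blocking: commands are queued for the tick thread.
world.submit_commands(commands)?;
// Fetch the most recently published snapshot.
let snapshot = world.latest_snapshot();
let report = world.shutdown();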
- Non-blocking command submission and observation extraction.
- Egress thread pool for concurrent ObsPlan execution.
- Epoch-based memory reclamation.
- Primary use case: live games, interactive tools, dashboards.
BatchedEngine
BatchedEngine owns a Vec<LockstepWorld> and an optional ObsPlan.
Its hot path, step_and_observe(), steps all worlds sequentially then
calls ObsPlan::execute_batch() to fill a contiguous output buffer
across all worlds.
Error model: BatchError annotates failures with the world index:
- Step { world_index, error }: a world's step_sync() failed
- Observe(ObsError): observation extraction failed
- Config(ConfigError): world creation or reset failed
- InvalidIndex { world_index, num_worlds }: index out of bounds
- NoObsPlan: observation requested without ObsSpec
- InvalidArgument { reason }: argument validation failed
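The shape of the hot path and the index-annotated error can be sketched as below. This is a toy under stated assumptions: `World` stands in for LockstepWorld, the observation is a plain copy of the world's cells rather than an ObsPlan, and only two of the variants listed above are modelled:

```rust
// Hypothetical sketch; `World` stands in for LockstepWorld and the error
// enum mirrors only two of the variants listed above.
#[derive(Debug, PartialEq)]
pub enum BatchError {
    Step { world_index: usize, error: String },
    InvalidArgument { reason: String },
}

pub struct World {
    cells: Vec<f32>,
    fail: bool, // test hook: force step_sync to fail
}

impl World {
    pub fn new(cells: usize) -> Self {
        World { cells: vec![0.0; cells], fail: false }
    }

    pub fn fail_next(&mut self) {
        self.fail = true;
    }

    fn step_sync(&mut self) -> Result<(), String> {
        if self.fail {
            return Err("propagator failed".to_string());
        }
        for c in &mut self.cells {
            *c += 1.0;
        }
        Ok(())
    }
}

pub struct BatchedEngine {
    worlds: Vec<World>,
    obs_len: usize, // floats per world in the output buffer
}

impl BatchedEngine {
    pub fn new(num_worlds: usize, cells: usize) -> Self {
        assert!(cells > 0, "each world needs at least one cell");
        let worlds = (0..num_worlds).map(|_| World::new(cells)).collect();
        BatchedEngine { worlds, obs_len: cells }
    }

    pub fn world_mut(&mut self, index: usize) -> &mut World {
        &mut self.worlds[index]
    }

    /// Step every world in index order, then copy each world's
    /// observation into its slice of the contiguous `out` buffer.
    pub fn step_and_observe(&mut self, out: &mut [f32]) -> Result<(), BatchError> {
        let expected = self.worlds.len() * self.obs_len;
        if out.len() != expected {
            return Err(BatchError::InvalidArgument {
                reason: format!("expected {} floats, got {}", expected, out.len()),
            });
        }
        for (world_index, world) in self.worlds.iter_mut().enumerate() {
            world
                .step_sync()
                .map_err(|error| BatchError::Step { world_index, error })?;
        }
        for (world, chunk) in self.worlds.iter().zip(out.chunks_mut(self.obs_len)) {
            chunk.copy_from_slice(&world.cells);
        }
        Ok(())
    }
}
```

In this sketch a failure in world `k` leaves worlds `0..k` already stepped; whether the real engine behaves the same way is not something the sketch claims.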
FFI layer: BATCHED: Mutex<HandleTable<BatchedEngine>> stores
engine instances. Nine extern "C" functions expose create, step,
observe, reset, destroy, and dimension queries.
PyO3 layer: BatchedWorld caches dimensions at construction time,
validates buffer shapes eagerly, and releases the GIL via py.detach()
on all hot paths. The Ungil boundary requires casting raw pointers to
usize before entering the detached closure.
Threading Model
Lockstep
No dedicated threads. The caller’s thread runs the full tick pipeline. Thread count equals the number of vectorised environments (typically 16-128 for RL training).
RealtimeAsync
| Thread(s) | Role | Owns |
|---|---|---|
| TickEngine (1) | Tick loop: drain ingress, run propagators, publish | &mut WorldState, WriteArena |
| Egress pool (N) | Execute ObsPlans against snapshots | &ReadArena (shared) |
| Ingress acceptor (0-M) | Accept commands, assign arrival_seq | Write end of bounded queue |
Snapshot lifetime is managed by epoch-based reclamation, not reference counting. This avoids cache-line ping-pong from atomic refcount updates under high observation throughput.
Spatial Model
Spaces define how many cells exist and which cells are neighbours.
All spaces implement the Space trait, which provides:
- cell_count(): total cells
- neighbours(cell): ordered neighbour list
- distance(a, b): scalar distance metric
- Region planning for observation extraction
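As a rough illustration, the first three methods might look like this for a one-dimensional space. The trait shape, the flat `usize` cell index, and the `Line` and `Edge` types are assumptions made for the sketch. It also assumes that Absorb drops out-of-range neighbours, which the text above does not specify:

```rust
// Hypothetical trait shape; the real Space trait may differ.
pub trait Space {
    fn cell_count(&self) -> usize;
    fn neighbours(&self, cell: usize) -> Vec<usize>;
    fn distance(&self, a: usize, b: usize) -> f64;
}

#[derive(Clone, Copy, PartialEq)]
pub enum Edge {
    Absorb, // assumed here: neighbours past the edge are dropped
    Wrap,   // neighbours past the edge wrap to the other end
}

pub struct Line {
    pub len: usize,
    pub edge: Edge,
}

impl Space for Line {
    fn cell_count(&self) -> usize {
        self.len
    }

    /// Left neighbour first, then right, so the order is stable.
    fn neighbours(&self, cell: usize) -> Vec<usize> {
        let wraps = self.edge == Edge::Wrap && self.len > 1;
        let mut out = Vec::with_capacity(2);
        if cell > 0 {
            out.push(cell - 1);
        } else if wraps {
            out.push(self.len - 1);
        }
        if cell + 1 < self.len {
            out.push(cell + 1);
        } else if wraps {
            out.push(0);
        }
        out
    }

    /// Step count between two in-range cells, taking the short way
    /// round when the line wraps.
    fn distance(&self, a: usize, b: usize) -> f64 {
        let d = a.abs_diff(b);
        let d = if self.edge == Edge::Wrap { d.min(self.len - d) } else { d };
        d as f64
    }
}
```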
Built-in backends
| Space | Dims | Neighbours | Edge handling |
|---|---|---|---|
| Line1D | 1D | 2 | Absorb, Wrap |
| Ring1D | 1D | 2 (periodic) | Always wraps |
| Square4 | 2D | 4 (N/S/E/W) | Absorb, Wrap |
| Square8 | 2D | 8 (+ diagonals) | Absorb, Wrap |
| Hex2D | 2D | 6 | Absorb, Wrap |
| FCC12 | 3D | 12 (face-centred cubic) | Absorb, Wrap |
ProductSpace
Spaces can be composed via ProductSpace to create higher-dimensional
topologies. For example, Hex2D x Line1D creates a layered hex map
where each layer is a hex grid and vertical neighbours are connected
via the Line1D component.
// Compose a wrapping hex grid with a three-layer absorbing line.
let space = ProductSpace::new(vec![
    Box::new(Hex2D::new(8, EdgeBehavior::Wrap)?),
    Box::new(Line1D::new(3, EdgeBehavior::Absorb)?),
]);
Coordinates are concatenated across components. Neighbours vary one component at a time (no diagonal cross-component adjacency).
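The "one component at a time" rule can be sketched with plain closures. The function `product_neighbours` is hypothetical, and for brevity it assumes each component contributes exactly one coordinate, which is not true of Hex2D:

```rust
// Hypothetical sketch: each component is a closure mapping one
// coordinate to its neighbours within that component.

/// Neighbours of `coord` in a product space: vary one component at a
/// time and keep every other component fixed.
pub fn product_neighbours(
    coord: &[usize],
    component_neighbours: &[&dyn Fn(usize) -> Vec<usize>],
) -> Vec<Vec<usize>> {
    let mut out = Vec::new();
    for (axis, neighbours) in component_neighbours.iter().enumerate() {
        for n in neighbours(coord[axis]) {
            let mut next = coord.to_vec();
            next[axis] = n;
            out.push(next);
        }
    }
    out
}
```

For a ring of 4 cells crossed with an absorbing line of 3 cells, the cell `[0, 1]` gets two ring neighbours and two line neighbours, and no neighbour changes both coordinates at once.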
Field Model
The field model defines how per-cell simulation data is typed, allocated, and bounded.
Fields are per-cell data stored in arenas. Each field has:
- Type: Scalar (1 float), Vector(n) (n floats), or Categorical(n) (n classes).
- Mutability class: controls arena allocation strategy.
- Boundary behaviour: Clamp, Reflect, Absorb, or Wrap.
- Optional units and bounds metadata.
Mutability classes
| Class | Arena behaviour | Use case |
|---|---|---|
| Static | Allocated once in generation 0, shared across all snapshots | Terrain, obstacles |
| PerTick | Fresh allocation each tick | Temperature, velocity |
| Sparse | New allocation only when modified | Rare events, flags |
For vectorised RL (128 envs x 2MB mutable + 8MB shared static): 264MB total vs 1.28GB without Static field sharing.
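The arithmetic behind those two totals, written out as a small helper (`total_mb` is a hypothetical name used only for this illustration):

```rust
/// Total megabytes for `envs` environments (hypothetical helper).
/// With sharing, the static data is stored once; without it, every
/// environment carries its own copy.
pub fn total_mb(envs: u64, mutable_mb: u64, static_mb: u64, share_static: bool) -> u64 {
    let static_copies = if share_static { 1 } else { envs };
    envs * mutable_mb + static_copies * static_mb
}
```

With 128 environments, 2MB mutable and 8MB static each, sharing gives 128 x 2 + 8 = 264MB, while per-environment copies give 128 x (2 + 8) = 1280MB, or 1.28GB.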
Propagator Pipeline
Propagators are stateless operators that update fields each tick.
They implement the Propagator trait:
pub trait Propagator: Send + Sync {
    fn name(&self) -> &str;
    fn reads(&self) -> FieldSet;          // current-tick values (Euler)
    fn reads_previous(&self) -> FieldSet; // frozen tick-start values (Jacobi)
    fn writes(&self) -> Vec<(FieldId, WriteMode)>;
    fn max_dt(&self, space: &dyn Space) -> Option<f64>; // topology-aware CFL constraint
    fn step(&self, ctx: &mut StepContext<'_>) -> Result<(), PropagatorError>;
}
Key properties:
- &self signature: propagators are stateless. All mutable state flows through StepContext.
- Split-borrow reads: reads() sees current in-tick values (Euler style), reads_previous() sees frozen tick-start values (Jacobi style). This supports both integration approaches.
- Write-conflict detection: the pipeline validates at startup that no two propagators write the same field in conflicting modes.
- CFL validation: if a propagator declares max_dt(space), the engine checks dt <= max_dt at configuration time for the configured topology.
- Deterministic execution order: propagators run in the order they are registered. The pipeline is a strict ordered list.
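The two startup checks can be sketched over simplified declarations. Everything here is hypothetical: `Decl`, `PipelineError`, the two `WriteMode` variants, and in particular the conflict rule, which this sketch assumes to be "two writers to one field conflict unless both are incremental":

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified declarations; the real Propagator trait is richer.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum WriteMode {
    Full,
    Incremental,
}

#[derive(Clone)]
pub struct Decl {
    pub name: &'static str,
    pub writes: Vec<(u32, WriteMode)>, // (field id, mode)
    pub max_dt: Option<f64>,
}

#[derive(Debug, PartialEq)]
pub enum PipelineError {
    WriteConflict { field: u32, first: &'static str, second: &'static str },
    DtTooLarge { propagator: &'static str, max_dt: f64 },
}

/// Validate once at startup, walking propagators in registration order.
pub fn validate(pipeline: &[Decl], dt: f64) -> Result<(), PipelineError> {
    let mut writers: BTreeMap<u32, (&'static str, WriteMode)> = BTreeMap::new();
    for p in pipeline {
        // CFL-style check: the configured dt must not exceed the declared bound.
        if let Some(max_dt) = p.max_dt {
            if dt > max_dt {
                return Err(PipelineError::DtTooLarge { propagator: p.name, max_dt });
            }
        }
        for &(field, mode) in &p.writes {
            if let Some(&(first, first_mode)) = writers.get(&field) {
                // Assumed rule: a conflict unless both writers are incremental.
                if first_mode == WriteMode::Full || mode == WriteMode::Full {
                    return Err(PipelineError::WriteConflict { field, first, second: p.name });
                }
            } else {
                writers.insert(field, (p.name, mode));
            }
        }
    }
    Ok(())
}
```

Doing this once at configuration time keeps the per-tick loop free of validation work.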
Observation Pipeline
The observation pipeline transforms world state into fixed-shape tensors for RL frameworks:
ObsSpec ──(compile)──> ObsPlan ──(execute against snapshot)──> f32 tensor
- ObsSpec declares what to observe: which fields, which spatial region, what transforms (normalisation, pooling, foveation).
- ObsPlan is a compiled, bound, executable plan. It pre-resolves field offsets, region iterators, index mappings, and pooling kernels. Compilation is done once; execution is the hot path.
- Execution fills a caller-allocated buffer with f32 values and a validity mask for non-rectangular domains (e.g., hex grids).
ObsPlans are bound to a world configuration generation. If the world configuration changes (fields added, space resized), plans are invalidated and must be recompiled.
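The compile-once, execute-many split can be sketched with a toy plan that gathers a square window from a row-major grid. The `ObsSpec` fields and `ObsPlan` layout are hypothetical. In this sketch the mask marks window cells that fall off the grid, a stand-in for the non-rectangular case mentioned above:

```rust
// Hypothetical sketch; these toy ObsSpec/ObsPlan types are not Murk's.
pub struct ObsSpec {
    pub width: usize,           // grid width of the world
    pub height: usize,          // grid height of the world
    pub centre: (isize, isize), // window centre (x, y)
    pub radius: isize,          // window is (2r + 1) x (2r + 1)
}

pub struct ObsPlan {
    /// One entry per output slot: source index, or None if off-grid.
    gather: Vec<Option<usize>>,
}

impl ObsPlan {
    /// Resolve every output slot to a source index once, up front.
    pub fn compile(spec: &ObsSpec) -> ObsPlan {
        let mut gather = Vec::new();
        for dy in -spec.radius..=spec.radius {
            for dx in -spec.radius..=spec.radius {
                let (x, y) = (spec.centre.0 + dx, spec.centre.1 + dy);
                let inside = x >= 0
                    && y >= 0
                    && (x as usize) < spec.width
                    && (y as usize) < spec.height;
                gather.push(if inside {
                    Some(y as usize * spec.width + x as usize)
                } else {
                    None
                });
            }
        }
        ObsPlan { gather }
    }

    /// Number of output slots the caller must allocate.
    pub fn len(&self) -> usize {
        self.gather.len()
    }

    /// Hot path: no allocation, only indexed copies into caller buffers.
    pub fn execute(&self, field: &[f32], out: &mut [f32], mask: &mut [u8]) {
        assert!(out.len() == self.gather.len() && mask.len() == self.gather.len());
        for (i, slot) in self.gather.iter().enumerate() {
            match slot {
                Some(src) => {
                    out[i] = field[*src];
                    mask[i] = 1;
                }
                None => {
                    out[i] = 0.0;
                    mask[i] = 0;
                }
            }
        }
    }
}
```

The output shape depends only on the spec, never on the world contents, which is what lets a caller allocate the buffers once and reuse them every tick.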
Command Model
Commands are the way external actions enter the simulation. Each command carries:
- Payload: SetField, Move, Spawn, Despawn, SetParameter, SetParameterBatch, or Custom.
- TTL: expires_after_tick, a tick-based expiry (never wall clock).
- Priority class: determines application order within a tick.
- Ordering provenance: source_id, source_seq, and engine-assigned arrival_seq for deterministic ordering.
The TickEngine drains and applies commands in deterministic order:
- Resolve apply_tick_id for each command.
- Group by tick.
- Sort within tick by priority class, then source_id, then source_seq, then arrival_seq.
Every command produces a Receipt reporting whether it was accepted,
which tick it was applied at, and a reason code if rejected.
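The ordering rule maps directly onto a tuple sort key. The `Command` struct below is a hypothetical reduction that keeps only the ordering fields named above plus a label. It assumes a lower priority value is applied earlier:

```rust
// Hypothetical struct; field names follow the text above.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Command {
    pub apply_tick_id: u64,
    pub priority: u8, // assumption: lower value is applied earlier
    pub source_id: u32,
    pub source_seq: u64,
    pub arrival_seq: u64,
    pub label: &'static str,
}

/// Group by tick, then order within each tick by priority class,
/// source_id, source_seq, and finally arrival_seq.
pub fn sort_commands(commands: &mut [Command]) {
    commands.sort_by_key(|c| {
        (c.apply_tick_id, c.priority, c.source_id, c.source_seq, c.arrival_seq)
    });
}
```

Because the key is a total order over fields that are recorded with each command, the same command log sorts the same way on every run.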
Error Handling and Recovery
Tick atomicity
Tick execution is all-or-nothing. If any propagator fails, all staging writes are abandoned (free with the arena model — just drop the staging generation). The world state remains exactly as it was before the tick.
Recovery behaviour
- Lockstep: step_sync() returns Err(StepError). The caller decides how to recover (typically reset()).
- RealtimeAsync: after 3 consecutive rollbacks, the TickEngine disables ticking and rejects further commands. Egress continues serving the last good snapshot (Egress Always Returns). Recovery is via reset().
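Both behaviours can be sketched together: an all-or-nothing tick plus a budget of three consecutive rollbacks. `Engine` is a hypothetical toy. It stages by cloning the committed buffer, which is a simplification, since the arena model avoids that copy:

```rust
// Hypothetical sketch of all-or-nothing ticks plus a rollback budget.
pub const MAX_CONSECUTIVE_ROLLBACKS: u32 = 3;

pub struct Engine {
    committed: Vec<f32>, // last good state, always readable
    consecutive_rollbacks: u32,
    ticking_disabled: bool,
}

impl Engine {
    pub fn new(cells: usize) -> Self {
        Engine { committed: vec![0.0; cells], consecutive_rollbacks: 0, ticking_disabled: false }
    }

    /// Readers always get the last committed state.
    pub fn snapshot(&self) -> &[f32] {
        &self.committed
    }

    pub fn is_disabled(&self) -> bool {
        self.ticking_disabled
    }

    /// Run one tick. `propagate` writes into a staging buffer; on error
    /// the staging buffer is dropped and `committed` is left untouched.
    pub fn tick<F>(&mut self, propagate: F) -> Result<(), String>
    where
        F: FnOnce(&mut [f32]) -> Result<(), String>,
    {
        if self.ticking_disabled {
            return Err("ticking disabled; reset required".to_string());
        }
        let mut staging = self.committed.clone();
        match propagate(&mut staging) {
            Ok(()) => {
                self.committed = staging;
                self.consecutive_rollbacks = 0;
                Ok(())
            }
            Err(e) => {
                self.consecutive_rollbacks += 1;
                if self.consecutive_rollbacks >= MAX_CONSECUTIVE_ROLLBACKS {
                    self.ticking_disabled = true;
                }
                Err(e)
            }
        }
    }

    pub fn reset(&mut self) {
        self.committed.iter_mut().for_each(|c| *c = 0.0);
        self.consecutive_rollbacks = 0;
        self.ticking_disabled = false;
    }
}
```

Note that `snapshot()` keeps working while ticking is disabled, mirroring the Egress Always Returns principle.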
See error-reference.md for the complete error type catalogue.
Determinism
Murk targets Tier B determinism: identical results within the same build, ISA, and toolchain, given the same initial state, seed, and command log.
Determinism Contract
Determinism holds when all of these match between runs:
- Build profile (debug/release) and optimization level
- Compiler and build-tool versions (rustc, PyO3/maturin)
- CPU ISA family (e.g., x86-64, aarch64)
- Cargo feature flags and dependency versions
Determinism is not promised across:
- Different ISAs (x86-64 vs aarch64)
- Different libm implementations (glibc vs musl vs macOS)
- Builds with fast-math or non-default RUSTFLAGS
- Different Murk versions (even patch releases may change propagator numerics)
Authoritative vs Non-Authoritative Paths
The authoritative path must be deterministic — any change here requires determinism test verification:
- TickEngine: propagator execution, command application, generation staging
- Propagator step() implementations and pipeline ordering
- IngressQueue: command sorting and expiry
- Snapshot publish (generation swap)
- Arena allocation and recycling patterns
The non-authoritative path may vary between runs and must never affect world state:
- Rendering, logging, and metrics collection
- Wall-clock pacing and backpressure in RealtimeAsync mode
- Egress worker scheduling and observation extraction timing
- StepMetrics timing measurements
- CLI tooling and debug output
Contributors: if your change touches the authoritative path, run the full
determinism test suite (cargo test --test determinism) and verify
snapshot hashes are unchanged.
Key Mechanisms
- No HashMap/HashSet: banned project-wide via clippy. All code uses IndexMap/BTreeMap for deterministic iteration.
- No fast-math: floating-point reassociation is prohibited in authoritative code paths.
- Tick-based time — all state-affecting time references use tick counts, not wall clocks.
- Deterministic command ordering — commands are sorted by priority class and source ordering, not arrival time.
- Replay support — binary replay format records initial state, seed, and command log with per-tick snapshot hashes for divergence detection.
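Per-tick hashing for divergence detection can be sketched as follows. The choice of FNV-1a and the function names are assumptions for illustration; the actual replay format and hash are not specified here. Hashing the raw bit patterns means `0.0` and `-0.0`, or two values that differ in the last bit, produce different hashes:

```rust
/// FNV-1a over the little-endian bit patterns of every value.
/// The hash function is a hypothetical choice for this sketch.
pub fn snapshot_hash(fields: &[&[f32]]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for field in fields {
        for v in field.iter() {
            for byte in v.to_bits().to_le_bytes().iter() {
                h ^= u64::from(*byte);
                h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
            }
        }
    }
    h
}

/// Index of the first tick whose recorded and replayed hashes differ.
pub fn first_divergence(recorded: &[u64], replayed: &[u64]) -> Option<usize> {
    recorded.iter().zip(replayed).position(|(a, b)| a != b)
}
```

A replay verifier built this way reports the earliest divergent tick, which narrows the search to the propagators and commands applied in that tick.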
Known Footguns
Floating-point transcendentals: Even without fast-math, sin, cos,
exp, and log can vary across platforms and libm implementations.
Propagators using transcendentals in authoritative updates remain Tier B
(same ISA/toolchain), but this is the most likely source of cross-platform
divergence. If tighter guarantees are needed in future, a murk_math
shim can provide consistent implementations.
Parallelism introduction: The current architecture is safe because propagators execute sequentially and the batched engine steps worlds in order. When parallel propagators or Rayon-based batched stepping are introduced, determinism tests must become thread-count invariant: run with 1, 2, and 8 threads, permute world ordering, and require identical snapshot hashes.
See determinism-catalogue.md for the full catalogue of non-determinism sources and mitigations.
Language Bindings
C FFI (murk-ffi)
Stable, handle-based C ABI:
- Opaque handles (MurkWorld, MurkSnapshot, MurkObsPlan) with slot+generation for safe double-destroy.
- Caller-allocated buffers for tensor output (no allocation on the hot path).
- Versioned API with explicit error codes (current ABI: v3.0).
- Panic-safe FFI boundary: all extern "C" entry points are guarded; panics return MurkStatus::Panicked (-128) instead of unwinding.
- Panic diagnostics are retrievable via murk_last_panic_message.
- MurkStepMetrics includes sparse observability counters: retired ranges, pending retired ranges, reuse hits, and reuse misses.
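The slot+generation scheme behind safe double-destroy can be sketched in plain Rust. This `HandleTable` is a hypothetical illustration and may differ from the one in murk-ffi; the point is that destroying a handle bumps the slot's generation, so any stale copy of the handle stops resolving:

```rust
// Hypothetical sketch of a slot+generation handle table.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Handle {
    slot: u32,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

pub struct HandleTable<T> {
    slots: Vec<Slot<T>>,
    free: Vec<u32>, // slots available for reuse
}

impl<T> HandleTable<T> {
    pub fn new() -> Self {
        HandleTable { slots: Vec::new(), free: Vec::new() }
    }

    pub fn insert(&mut self, value: T) -> Handle {
        if let Some(slot) = self.free.pop() {
            let s = &mut self.slots[slot as usize];
            s.value = Some(value);
            Handle { slot, generation: s.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Handle { slot: (self.slots.len() - 1) as u32, generation: 0 }
        }
    }

    /// Resolves only if the handle's generation matches the slot's.
    pub fn get(&self, h: Handle) -> Option<&T> {
        self.slots
            .get(h.slot as usize)
            .filter(|s| s.generation == h.generation)
            .and_then(|s| s.value.as_ref())
    }

    /// Returns the value on the first destroy and `None` on a stale or
    /// repeated destroy, so double-destroy is harmless.
    pub fn remove(&mut self, h: Handle) -> Option<T> {
        let s = self.slots.get_mut(h.slot as usize)?;
        if s.generation != h.generation {
            return None;
        }
        let value = s.value.take()?;
        s.generation = s.generation.wrapping_add(1); // invalidate old handles
        self.free.push(h.slot);
        Some(value)
    }
}
```

Because a handle is just two integers, it can cross the C boundary by value without exposing any pointer.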
Python (murk-python)
PyO3/maturin native extension:
- MurkEnv: single-environment Gymnasium Env adapter.
- MurkVecEnv: vectorised environment adapter for parallel RL training.
- BatchedWorld: batched PyO3 wrapper that steps N worlds and extracts observations in a single py.detach() call. Pointer addresses are cast to usize for the Ungil closure boundary.
- BatchedVecEnv: pure-Python SB3-compatible vectorised environment with pre-allocated NumPy buffers, auto-reset, and override hooks for reward/termination logic.
- Direct NumPy array filling via the C FFI path.
- Python-defined propagators for prototyping.
- FFI panic status (-128) maps to Python RuntimeError with the captured panic message.
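The -128 status that both binding layers rely on can be produced by wrapping each entry point in `std::panic::catch_unwind`. The sketch below shows the general technique only: `guard`, the constants, and the `last_panic` slot are hypothetical names, and a real FFI layer would typically keep the message in thread-local or global storage rather than passing it in:

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

pub const STATUS_OK: i32 = 0;
pub const STATUS_PANICKED: i32 = -128;

/// Run `body`; if it panics, store the message and return a status
/// code instead of letting the panic unwind across the boundary.
pub fn guard<F: FnOnce() -> i32>(body: F, last_panic: &mut Option<String>) -> i32 {
    match catch_unwind(AssertUnwindSafe(body)) {
        Ok(status) => status,
        Err(payload) => {
            // Panic payloads are usually &str (literal) or String (formatted).
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            *last_panic = Some(msg);
            STATUS_PANICKED
        }
    }
}
```

On the Python side, a binding that sees the panicked status can fetch the stored message and raise it as an exception, matching the mapping described above.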