Batch execution : eval-harness

Operations
Batch execution

A run dispatches samples through a batch. The default is deterministic and
in-process; the alternative fans samples out across Laravel queue workers. This
page covers both and the knobs that shape throughput.

Serial vs lazy-parallel

flowchart TB subgraph Serial["SerialBatch (default)"] S1[sample 1] --> S2[sample 2] --> S3[sample 3] --> SR[scored in order] end subgraph Lazy["LazyParallelBatch (--batch=lazy-parallel)"] P[Producer] -->|dispatch jobs| Q[(Queue / Horizon)] Q --> W1[worker] & W2[worker] & W3[worker] W1 & W2 & W3 -->|write output by index| RS[(BatchResultStore<br/>cache-backed)] RS -->|reassemble in dataset order| PR[scored in order] end

	SerialBatch	LazyParallelBatch
flag	default	`--batch=lazy-parallel`
execution	in-process, one at a time	queue jobs across workers
determinism	fully deterministic	order-preserving via indexed result store
SUT requirement	any callable	container-resolvable `SampleRunner`
best for	small datasets, tests, the PR gate	large datasets, judge-heavy runs, nightly

Both produce an identical report — lazy-parallel reassembles outputs in
dataset order from the result store even though jobs finish out of order.

SUT serializability

Queued jobs carry only the runner class name, so the worker re-resolves a
fresh runner from its container. That constrains what can run in parallel:

Works in lazy-parallel

A concrete SampleRunner class bound into the container, including one with
constructor-injected object dependencies the worker container can resolve to
an equivalent fresh instance.

Serial-only

Arbitrary callables, closures, anonymous runners, runners with
optional/defaulted constructor state, scalar/array/null runner properties, and
any caller-specific object configuration — because none of it survives being
reduced to a class name.

// queue-ready
$this->app->bind('eval-harness.sut', \App\Eval\MyRagRunner::class);

Operational profiles

--batch-profile=ci|smoke|nightly applies a named preset of batch defaults.
Explicit CLI flags always win, so profiles never lock you in.

profile	shape
`ci`	lazy-parallel with sane CI defaults (modest concurrency, job timeout, periodic checkpoints)
`smoke`	serial, fast — a quick pre-merge sanity check
`nightly`	higher concurrency with throttled dispatch and checkpoints for long runs

php artisan eval-harness:run rag.factuality.fy2026 \
  --batch-profile=ci --queue=evals --json --out=evals/ci-rag.json

Host apps override or add profiles under eval-harness.batches.profiles.*.

Producer-side backpressure

These flags bound how fast the producer dispatches, independent of profile:

flag	effect
`--concurrency=N`	fan-out cap and default dispatch-window size.
`--chunk-size=N`	narrower dispatch window (`<= --concurrency`); the producer waits for each chunk before dispatching the next, so this becomes the effective in-flight limit per producer.
`--rate-limit=N --rate-window-seconds=W`	throttle to N samples per W-second sliding window (monotonic-clock, amortized O(1)).
`--result-ttl-seconds=N`	how long result metadata/outputs stay alive for delayed collection.
`--timeout=N`	per-sample job timeout.
`--batch-timeout=N`	caps the producer’s wait on each dispatch window (dispatch + collection).

Pass none (or null) on any nullable numeric flag to clear an inherited
profile value for a one-off run — e.g. --rate-limit=none disables the
nightly profile’s throttle for a single invocation. --queue is excluded;
override it in config instead.

Progress checkpoints and terminal status

--checkpoint-every=N emits structured progress events every N completed samples
through an optional BatchProgressReporter container binding (default
NullBatchProgressReporter). Bind your own to forward progress to logs or a
dashboard.

Dashboards that must distinguish a finished-failed batch from a stalled
one can implement the optional BatchTerminalProgressReporter sub-contract — it
adds a reportTerminal(...) callback with explicit success / failure /
empty status and partial-wins tolerance on the failure path.

flowchart LR B[LazyParallelBatch] -->|every N done| PR["BatchProgressReporter::reportCheckpoint()"] B -->|on completion| TR["BatchTerminalProgressReporter::reportTerminal()<br/>success / failure / empty"]

Live batch registry

When the live registry is enabled (default), two read-only endpoints expose
active batches and their progress counters for a UI:

GET /{prefix}/batches/live
GET /{prefix}/batches/{id}/progress

Disable with eval-harness.batches.live_registry.enabled = false. See
Report API.

Horizon & queues

Sizing workers and the shared cache store for production.

Open →

Running evaluations

Binding the SUT and the command flags.

Open →

Last updated: Edit this page

Serial vs lazy-parallel

SUT serializability

Operational profiles

Producer-side backpressure

Progress checkpoints and terminal status

Live batch registry

onThisPage