The package can register opt-in, read-only routes so a separate Laravel
admin/UI package can consume stored report artifacts. The API is disabled by
default because the package bundles no authentication.
Enabling it
    // config/eval-harness.php
    'api' => [
        'enabled' => true,
        'prefix' => 'admin/eval-harness/api',
        'middleware' => ['web', 'auth'],
        'trend' => [
            'max_files_scanned' => 5000,
        ],
    ],
The middleware stack must authenticate. This package ships no auth; mount
the API only behind your host app’s existing admin gate. The default prefix is
eval-harness/api; the example above overrides it with admin/eval-harness/api.
The response envelope
Successful JSON responses carry a version discriminator:
    {
      "schema_version": "eval-harness.report-api.v1",
      "data": { ... }
    }
Individual endpoints add their own sub-schema discriminators (e.g.
eval-harness.report-api.v1.trend).
Most error responses are not enveloped. Missing artifacts/ids (404),
malformed report JSON on a JSON-only endpoint (422), and storage/cache failures
(503) are thrown as standard Laravel HTTP exceptions and rendered as Laravel’s
default error JSON (e.g. { "message": "..." }) — without a schema_version.
A client should branch on the HTTP status first, and only read
schema_version / data on a 2xx response.
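The status-first rule can be sketched as a small client-side helper. This is a minimal illustration, not part of the package: `readEnvelope`, `ApiResult`, and `SCHEMA_PREFIX` are hypothetical names, and the check that every v1 discriminator starts with `eval-harness.report-api.v1` is an assumption drawn from the examples above.

```typescript
// Hypothetical result type; only schema_version, data and message come from the contract.
type ApiResult<T> =
  | { ok: true; schemaVersion: string; data: T }
  | { ok: false; status: number; message: string };

// Assumed common prefix of all v1 discriminators (e.g. "eval-harness.report-api.v1.trend").
const SCHEMA_PREFIX = "eval-harness.report-api.v1";

function readEnvelope<T>(status: number, body: unknown): ApiResult<T> {
  const record: Record<string, unknown> =
    typeof body === "object" && body !== null
      ? (body as Record<string, unknown>)
      : {};

  // Branch on the HTTP status first: error bodies are plain Laravel JSON
  // and carry no schema_version.
  if (status < 200 || status >= 300) {
    const message = typeof record.message === "string" ? record.message : "";
    return { ok: false, status, message };
  }

  // Only on a 2xx response do we look at schema_version / data.
  const version = record.schema_version;
  if (
    typeof version !== "string" ||
    !version.startsWith(SCHEMA_PREFIX) ||
    !("data" in record)
  ) {
    return { ok: false, status, message: "unexpected response envelope" };
  }
  return { ok: true, schemaVersion: version, data: record.data as T };
}
```

The caller passes the status code and the already-decoded JSON body, so the helper stays independent of whichever HTTP client the UI uses.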
One deliberate exception: when adversarial manifest discovery is disabled
(adversarial.manifests.disk is null), GET /adversarial/manifests and
…/{name} return a 404 whose body is enveloped:
    { "schema_version": "…", "error": "discovery_not_configured", "message": "…" }
This is a structured “feature off” signal, not a generic error. Distinguish the
two cases by the body: a 404 carrying error === "discovery_not_configured" means
discovery is off; a 404 without that envelope (a standard Laravel error) means
discovery is configured but the requested manifest does not exist.
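The two kinds of 404 can be told apart with a check like the following. This is a sketch only: `classifyManifestResponse` and the `ManifestLookup` labels are hypothetical names chosen for illustration; the `error` field and its `discovery_not_configured` value are the only parts taken from the contract.

```typescript
// Hypothetical labels for the outcomes a UI might care about.
type ManifestLookup = "discovery_off" | "manifest_missing" | "other";

function classifyManifestResponse(status: number, body: unknown): ManifestLookup {
  // Anything that is not a 404 is outside the scope of this check.
  if (status !== 404) return "other";

  const error =
    typeof body === "object" && body !== null
      ? (body as Record<string, unknown>).error
      : undefined;

  // Enveloped 404 with the documented error code: the feature is switched off.
  // Any other 404 body: a standard Laravel error, so the manifest is missing.
  return error === "discovery_not_configured" ? "discovery_off" : "manifest_missing";
}
```

A UI could hide the adversarial section entirely on `discovery_off` and show a "not found" message on `manifest_missing`.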
Endpoint catalog
All paths are relative to the configured prefix.
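A client might build request paths from the configured prefix roughly as follows. `apiPath` is a hypothetical helper, and the sketch assumes the host app is mounted at the domain root; ids are described as URL-safe, so the encoding step is only a precaution.

```typescript
// Joins the configured prefix with endpoint segments, encoding each segment.
function apiPath(prefix: string, ...segments: string[]): string {
  // Strip leading and trailing slashes so "eval-harness/api" and
  // "/eval-harness/api/" behave the same.
  const base = prefix.replace(/^\/+|\/+$/g, "");
  const tail = segments.map((segment) => encodeURIComponent(segment)).join("/");
  return "/" + [base, tail].filter((part) => part !== "").join("/");
}
```

For example, `apiPath("admin/eval-harness/api", "reports", id, "diff", otherId)` addresses the diff endpoint for two report ids.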
Reports
| Endpoint | Purpose |
|---|---|
| GET /reports | List JSON/Markdown report artifacts on the reports disk. |
| GET /reports/{id} | Show one artifact by URL-safe id. |
| GET /reports/{id}/cohorts | Cohort summaries for a report (JSON). |
| GET /reports/{id}/histograms | Per-metric score distribution buckets (JSON). |
| GET /reports/{id}/rows.csv | Sample-by-metric rows as CSV (sample_id, tags, metric, score, error, details). |
| GET /reports/{id}/download | Direct artifact download (.json / .md). |
| GET /reports/{id}/diff/{otherId} | Signed deltas between two reports. |
The diff endpoint computes macro_f1 delta, per-metric mean/pass-rate deltas,
per-cohort status (added / removed / regressed / improved / stable),
total_samples / total_failures deltas, and adversarial categories when
present — so a UI can show a regression side-by-side without fetching both full
reports.
Trends
| Endpoint | Purpose |
|---|---|
| GET /datasets/{name}/trend?limit=N | Chronological points for one dataset (macro-F1, per-metric mean/pass-rate, usage). |
| GET /online/{dataset}/trend?limit=N | Production pass-rate-over-time points plus the configured alert threshold. |
The dataset trend scans stored JSON reports, skips malformed artifacts, caps
limit at 100, and caps scanned files via api.trend.max_files_scanned. The
online trend feeds the online-monitoring dashboard’s alert band.
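Since the dataset trend caps limit at 100 on the server, a client can clamp the value before sending it so the UI never appears to ask for more than it will get. The helper below is a sketch under stated assumptions: `datasetTrendPath` and `MAX_TREND_LIMIT` are hypothetical names, the returned path is relative to the configured prefix, and falling back to 1 for invalid input is a client-side choice, not documented server behavior.

```typescript
// Mirrors the server-side cap on limit for the dataset trend endpoint.
const MAX_TREND_LIMIT = 100;

function datasetTrendPath(dataset: string, limit: number): string {
  // Assumption: non-finite or sub-1 values fall back to 1 on the client.
  const wanted = Number.isFinite(limit) ? Math.trunc(limit) : 1;
  const clamped = Math.min(MAX_TREND_LIMIT, Math.max(1, wanted));
  // Path is relative to the configured API prefix.
  return `/datasets/${encodeURIComponent(dataset)}/trend?limit=${clamped}`;
}
```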
Adversarial manifests
| Endpoint | Purpose |
|---|---|
| GET /adversarial/manifests | Enumerate adversarial run manifests on the configured disk. |
| GET /adversarial/manifests/{name} | Show one manifest + its run history. |
These endpoints are opt-in via eval-harness.adversarial.manifests.{disk,path_prefix},
so a UI can browse compliance history without scraping the filesystem. The CLI
--manifest=<path> flag remains independent.
Live batches
| Endpoint | Purpose |
|---|---|
| GET /batches/live | Active lazy-parallel batches, each with id and expires_at (no per-batch status; fetch that from the progress endpoint). |
| GET /batches/{id}/progress | Compact progress counters and the status for one batch. |
The live-batch registry is enabled by default; disable it via
eval-harness.batches.live_registry.enabled.
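Because the live list carries no per-batch status, a UI typically follows it with one progress request per batch. The sketch below turns a live list into progress paths (relative to the configured prefix) and skips entries that have already expired. `LiveBatch` and `progressPaths` are hypothetical names, and treating expires_at as an ISO 8601 timestamp is an assumption; only the id and expires_at fields come from the table above.

```typescript
// Hypothetical shape: only `id` and `expires_at` are named by the contract.
interface LiveBatch {
  id: string;
  expires_at: string; // assumed to be an ISO 8601 timestamp
}

function progressPaths(batches: LiveBatch[], now: Date): string[] {
  return batches
    .filter((batch) => {
      const expiry = Date.parse(batch.expires_at);
      // Keep entries whose expiry cannot be parsed and let the server decide.
      return Number.isNaN(expiry) || expiry > now.getTime();
    })
    .map((batch) => `/batches/${encodeURIComponent(batch.id)}/progress`);
}
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to exercise without mocking the clock.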
How a UI composes them
Every endpoint is optional, so a UI should degrade gracefully when a given
surface is disabled. The full JSON examples and contract notes live in
docs/REPORT_API_CONTRACT.md; the companion UI spec is in
docs/UI_PACKAGE_SPEC.md.
The companion UI package
A ready-made dashboard ships separately as padosoft/eval-harness-admin. It
covers Dashboard, Reports list, Report detail, Compare, Trend, Adversarial
manifests, and Live batches, all built on the contracts above, with
authentication delegated to the host app.