History¶
Every Scrutin invocation records its outcome to an embedded SQLite database at .scrutin/state.db (created and managed by Scrutin, gitignored by default). The same file also holds the source-to-test dependency map and the file-hash fingerprints that drive watch mode, so a single delete gives you a clean slate.
No external binary is required: Scrutin links SQLite in via rusqlite (with the bundled feature), so the DB always works regardless of what's installed on the host.
Reruns and flaky tests¶
Set run.reruns to re-execute failing files:
A file that fails and then passes on rerun is marked flaky. Flaky results are persisted in the results table (via the retries column, plus the final outcome), surfaced in the plain-mode summary, tagged scrutin.flaky="true" in JUnit XML, and queryable via scrutin stats (whose flaky-test query is embedded below under Typical queries).
Run metadata¶
Scrutin records provenance for every run: version, OS, hostname, git SHA, branch, dirty state, CI provider, and CI build identifiers. These values populate the runs table (see Schema) and also land in the JUnit <properties> block.
Add custom labels with --set extras.key=value on the command line, or an [extras] section in .scrutin/config.toml. Disable provenance capture entirely with [metadata] enabled = false.
Schema¶
The DDL below is the authoritative schema, loaded verbatim at startup via CREATE TABLE IF NOT EXISTS. The same file lives at crates/scrutin-core/src/storage/sql/schema.sql and is embedded into the binary with include_str!.
CREATE TABLE IF NOT EXISTS runs (
run_id TEXT PRIMARY KEY,
timestamp TEXT NOT NULL,
hostname TEXT,
ci TEXT,
scrutin_version TEXT NOT NULL,
git_commit TEXT,
git_branch TEXT,
git_dirty INTEGER,
repo_name TEXT,
repo_url TEXT,
repo_root TEXT,
build_number TEXT,
build_id TEXT,
build_name TEXT,
build_url TEXT,
os_platform TEXT,
os_release TEXT,
os_version TEXT,
os_arch TEXT
);
CREATE TABLE IF NOT EXISTS results (
run_id TEXT NOT NULL,
run_seq INTEGER NOT NULL,
file TEXT NOT NULL,
tool TEXT NOT NULL,
tool_version TEXT,
app_name TEXT,
app_version TEXT,
subject_kind TEXT NOT NULL,
subject_name TEXT NOT NULL,
subject_parent TEXT,
outcome TEXT NOT NULL,
duration_ms INTEGER NOT NULL DEFAULT 0,
retries INTEGER NOT NULL DEFAULT 0,
total INTEGER,
failed INTEGER,
fraction REAL,
FOREIGN KEY (run_id) REFERENCES runs(run_id)
);
CREATE INDEX IF NOT EXISTS idx_results_run_id ON results(run_id);
CREATE INDEX IF NOT EXISTS idx_results_run_seq ON results(run_seq);
CREATE INDEX IF NOT EXISTS idx_results_outcome ON results(outcome);
CREATE INDEX IF NOT EXISTS idx_results_file_subject ON results(file, subject_name);
CREATE INDEX IF NOT EXISTS idx_results_tool ON results(tool);
CREATE TABLE IF NOT EXISTS extras (
run_id TEXT NOT NULL,
key TEXT NOT NULL,
value TEXT NOT NULL,
PRIMARY KEY (run_id, key),
FOREIGN KEY (run_id) REFERENCES runs(run_id)
);
CREATE TABLE IF NOT EXISTS dependencies (
source_file TEXT NOT NULL,
test_file TEXT NOT NULL,
PRIMARY KEY (source_file, test_file)
);
CREATE INDEX IF NOT EXISTS idx_dependencies_test ON dependencies(test_file);
CREATE TABLE IF NOT EXISTS hashes (
file TEXT PRIMARY KEY,
hash INTEGER NOT NULL
);
runs¶
One row per Scrutin invocation. Holds all run-level provenance so results rows don't repeat it.
| Column | Source | Notes |
|---|---|---|
run_id |
scrutin-generated (UUID) | primary key, referenced by all child tables |
timestamp |
chrono::Utc::now().to_rfc3339() at run start |
ISO 8601 |
hostname |
hostname command (unix) / %COMPUTERNAME% (windows) |
|
ci |
github / gitlab / buildkite / circleci / jenkins / azure-pipelines / travis / ci / NULL |
|
scrutin_version |
CARGO_PKG_VERSION |
version of the Scrutin binary that wrote this row |
git_commit |
git rev-parse HEAD |
full 40-char SHA |
git_branch |
git rev-parse --abbrev-ref HEAD, or (detached) |
|
git_dirty |
git status --porcelain non-empty |
1 = uncommitted changes, 0 = clean |
repo_name |
derived from remote.origin.url, fallback to repo directory name |
e.g. vincentarelbundock/scrutin |
repo_url |
git config --get remote.origin.url |
NULL on repos with no origin remote |
repo_root |
absolute path returned by git rev-parse --show-toplevel |
disambiguates multiple checkouts on the same host |
build_number |
CI env var | TEXT, not INTEGER: Azure emits "20260414.3" |
build_id |
CI env var | opaque stable identifier for the CI run |
build_name |
CI env var | workflow / job name |
build_url |
CI env var or derived | link to the CI run |
os_platform |
std::env::consts::OS |
linux / macos / windows |
os_release |
uname -r equivalent |
|
os_version |
sw_vers / /etc/os-release / registry |
|
os_arch |
std::env::consts::ARCH |
x86_64 / aarch64 / ... |
CI env var sources for build_*:
| Provider | build_number |
build_id |
build_name |
build_url |
|---|---|---|---|---|
| GitHub | GITHUB_RUN_NUMBER |
GITHUB_RUN_ID |
GITHUB_WORKFLOW |
derived |
| GitLab | CI_PIPELINE_IID |
CI_PIPELINE_ID |
CI_JOB_NAME |
CI_PIPELINE_URL |
| Buildkite | BUILDKITE_BUILD_NUMBER |
BUILDKITE_BUILD_ID |
BUILDKITE_PIPELINE_SLUG |
BUILDKITE_BUILD_URL |
| CircleCI | CIRCLE_BUILD_NUM |
CIRCLE_WORKFLOW_ID |
CIRCLE_JOB |
CIRCLE_BUILD_URL |
| Jenkins | BUILD_NUMBER |
BUILD_ID |
JOB_NAME |
BUILD_URL |
| Azure Pipelines | BUILD_BUILDNUMBER |
BUILD_BUILDID |
BUILD_DEFINITIONNAME |
derived |
| Travis | TRAVIS_BUILD_NUMBER |
TRAVIS_BUILD_ID |
TRAVIS_REPO_SLUG |
TRAVIS_BUILD_WEB_URL |
| Local | NULL | NULL | NULL | NULL |
results¶
One row per (run, file, subject). The fact table. Tool metadata (tool, tool_version, app_name, app_version) is denormalized here rather than stored in a separate tools table; at Scrutin's scale the repetition is negligible and queries stay flat.
| Column | Source | Notes |
|---|---|---|
run_id |
FK to runs(run_id) |
groups all rows from one invocation |
run_seq |
monotonic counter per run | stable ordering within a run |
file |
test file path, repo-relative | e.g. tests/testthat/test-math.R |
tool |
Package::suite_for(file), plugin identifier |
testthat, tinytest, pytest, jarl, ruff, ... |
tool_version |
captured by the plugin at run start | e.g. packageVersion("testthat") / pytest.__version__ |
app_name |
Package: from DESCRIPTION or [project] name from pyproject.toml |
varies per tool in multi-language projects |
app_version |
Version: from DESCRIPTION or [project] version |
same source as app_name |
subject_kind |
Subject::kind |
file / test / expectation |
subject_name |
Subject::name |
e.g. "addition works" |
subject_parent |
Subject::parent |
optional parent describe / test name |
outcome |
six-value taxonomy | pass / fail / error / skip / xfail / warn |
duration_ms |
from summary NDJSON message |
file-level authoritative wall time |
retries |
count of re-executions within the run | 0 = first attempt succeeded |
total |
data-validation plugins | total expectation count |
failed |
data-validation plugins | failed expectation count |
fraction |
data-validation plugins | (total - failed) / total |
extras¶
User-supplied key/value labels, populated from [extras] in .scrutin/config.toml and --set extras.KEY=VALUE on the CLI. The only table where key names are not known at schema time. Everything Scrutin captures itself has a dedicated column on runs or results. Typical contents:
[extras]
environment = "staging"
experiment = "new_parser"
feature_flag = "async_pool"
reviewer = "vincent"
dependencies¶
Source-to-test dependency edges, one row per edge. Each row says "if source file X changes, test file Y is affected." Populated at run start by the dep-map builder and consulted by watch mode to pick affected test files on each save. Has no run_id: the dep map is a project-wide cache, not per-run state.
Writes are full-replace per test_file: when a test file's imports change, Scrutin deletes all existing rows where test_file = ? and inserts fresh ones in one transaction.
hashes¶
Per-file content fingerprints (xxhash_rust::xxh64) used to decide whether the dep map is stale. The u64 hash is stored via an as i64 cast (bit pattern preserved, equality checks work correctly). No arithmetic is ever performed on these values.
Typical queries¶
History of a single file, ordered by time:
SELECT r.timestamp, r.git_commit, res.outcome, res.duration_ms
FROM results res JOIN runs r USING (run_id)
WHERE res.file = 'tests/testthat/test-math.R'
ORDER BY r.timestamp DESC;
Flaky tests over the last 10 runs (scrutin stats runs a version of this query; the embedded copy is reproduced below):
WITH recent AS (
SELECT run_id FROM runs ORDER BY timestamp DESC LIMIT ?1
),
stats AS (
SELECT file, subject_name,
SUM(CASE WHEN outcome IN ('fail','error') THEN 1 ELSE 0 END) AS failures,
SUM(CASE WHEN retries > 0 AND outcome = 'pass' THEN 1 ELSE 0 END) AS retry_passes,
COUNT(*) AS total
FROM results
WHERE run_id IN (SELECT run_id FROM recent) AND subject_name != ''
GROUP BY file, subject_name
)
SELECT file, subject_name, failures, retry_passes, total
FROM stats
WHERE total >= ?2
AND ((failures > 0 AND failures < total) OR retry_passes > 0)
Slow tests over all recorded runs:
SELECT file, subject_name,
AVG(duration_ms) AS avg_ms,
MAX(duration_ms) AS max_ms,
COUNT(*) AS runs
FROM results
WHERE subject_name != '' AND duration_ms > 0
GROUP BY file, subject_name
HAVING runs >= ?1 AND avg_ms > ?2
ORDER BY avg_ms DESC
LIMIT ?3
Tests that tend to need retries:
SELECT file, subject_name, AVG(retries) AS avg_retries, COUNT(*) AS runs
FROM results
WHERE subject_kind = 'file'
GROUP BY file, subject_name
HAVING avg_retries > 0
ORDER BY avg_retries DESC;
Tool-version matrix (when did each tool version run?):
SELECT tool, tool_version, MIN(r.timestamp) AS first_seen, MAX(r.timestamp) AS last_seen
FROM results res JOIN runs r USING (run_id)
GROUP BY tool, tool_version
ORDER BY tool, first_seen;
Runs tagged with an experiment label:
SELECT r.timestamp, r.git_commit, e.value AS experiment
FROM runs r JOIN extras e USING (run_id)
WHERE e.key = 'experiment' AND e.value = 'new_parser'
ORDER BY r.timestamp DESC;
Dep-map lookup (which test files does this source file affect?):
What is not in this DB¶
- Per-attempt retry detail (attempt N's duration, failure message, trace): not persisted.
results.retriesgives the count; cross-invocation aggregation overresultscovers the flaky-test question. - Computed insights (pass rate, flaky rate, p95 duration, baseline deltas): not materialized. Computed on demand by
scrutin stats. - Schema version: no
schemastable. On schema change during development, delete.scrutin/state.dband re-run.