The answer looked fine.
The behavior wasn’t.

Trajectly is deterministic regression testing for AI agents. Record a baseline, enforce behavioral contracts, and catch silent failures — with an exact witness step, violation code, and one-command repro.

pip install trajectly
trajectly report
FAIL procurement-chaos
Violation REFINEMENT_BASELINE_CALL_MISSING
Witness event 6 — route_for_approval expected but missing
Shrink 14 events → 3
Repro python -m trajectly repro procurement-chaos
Apache 2.0 · Python 3.11+ · Zero LLM cost in CI

What Trajectly catches

Six categories of silent failure that correct-looking output can hide.

Looks fine vs. actually died

The same agent run, shown twice: first what you see, then what Trajectly sees.

What you see

“Purchase order created. Vendor: Acme Corp. Total: $14,200. Delivery: March 20.”

Looks correct
What Trajectly sees
fetch_requisition
fetch_vendor_quotes
create_purchase_order   ← route_for_approval skipped
FAIL — witness 6
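
The comparison above can be pictured as an ordered-subsequence check: every tool call in the baseline should still appear, in order, in the new run. The sketch below is a minimal illustration of that idea, not Trajectly's actual implementation. The function name `first_missing_call` and the baseline ordering are assumptions (the page only shows the failing run), and the index it reports counts tool calls only, so it will not match the event numbering in the report.

```python
from typing import Optional, Tuple


def first_missing_call(
    baseline: list[str], current: list[str]
) -> Optional[Tuple[int, str]]:
    """Return (position in current, missing tool name) for the first baseline
    call that cannot be matched in order, or None if all calls are matched."""
    pos = 0  # next unmatched position in the current trace
    for expected in baseline:
        try:
            pos = current.index(expected, pos) + 1
        except ValueError:
            # `expected` never occurs at or after `pos`: this is the witness.
            return pos, expected
    return None


# Assumed baseline ordering; hypothetical, inferred from the failing trace.
baseline = [
    "fetch_requisition",
    "fetch_vendor_quotes",
    "route_for_approval",
    "create_purchase_order",
]
current = ["fetch_requisition", "fetch_vendor_quotes", "create_purchase_order"]

witness = first_missing_call(baseline, current)
print(witness)  # (2, 'route_for_approval')
```

The final text of the agent's answer never enters the check, which is why a correct-looking purchase order can still fail.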

How it works

1. Record a baseline

Run your agent once. Trajectly captures every tool call and LLM response as a fixture for deterministic replay.

2. Run against contracts

Re-run the agent. LLM responses replay from fixtures — zero API cost. Trajectly checks behavioral refinement and contracts.

3. Get a verdict

PASS or FAIL with the exact witness step, violation code, minimal counterexample, and a one-command deterministic repro.

Typical approach vs. Trajectly

Deterministic

Typical: No — LLM wording varies

Trajectly: Yes — fixture replay + normalized comparison

Root cause

Typical: “Output changed”

Trajectly: Exact witness step + violation code

Contracts

Typical: None

Trajectly: Tool allow/deny, sequence, budget, data safety

Reproducible

Typical: Rarely

Trajectly: Always — trajectly repro

Minimizable

Typical: No

Trajectly: trajectly shrink — shortest failing trace

CI-ready

Typical: Fragile

Trajectly: Exit code 0/1, PR comments, artifacts


CI-native from day one

Works in any CI. The GitHub Action is a thin wrapper — all logic lives in the CLI.

# GitHub Actions
- uses: trajectly/trajectly-action@v1.0.2
  with:
    spec_glob: "specs/challenges/*.agent.yaml"
    project_root: "."
    comment_pr: "true"

Try it locally

git clone https://github.com/trajectly/trajectly-survival-arena.git
cd trajectly-survival-arena
pip install -r requirements.txt
python -m trajectly init

python -m trajectly run specs/challenges/procurement-chaos.agent.yaml --project-root .
python -m trajectly report
python -m trajectly repro
python -m trajectly shrink
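
The shrink command above reduces a failing trace to a shorter one that still fails. The page does not describe the algorithm, so the sketch below uses a simple stand-in: greedy one-event-at-a-time removal, a basic relative of delta debugging. The names `shrink` and `approval_skipped`, and the `log_event` and `notify_requester` events, are hypothetical.

```python
from typing import Callable


def shrink(
    trace: list[str], still_fails: Callable[[list[str]], bool]
) -> list[str]:
    """Drop events one at a time while the failure still reproduces."""
    if not still_fails(trace):
        raise ValueError("shrinking needs a failing trace to start from")
    current = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate  # keep the smaller failing trace
                changed = True
                break  # restart the scan on the shorter trace
    return current


def approval_skipped(trace: list[str]) -> bool:
    """Failure predicate: an order is created with no earlier approval step."""
    if "create_purchase_order" not in trace:
        return False
    i = trace.index("create_purchase_order")
    return "route_for_approval" not in trace[:i]


failing = [
    "fetch_requisition",
    "log_event",
    "fetch_vendor_quotes",
    "log_event",
    "create_purchase_order",
    "notify_requester",
]
minimal = shrink(failing, approval_skipped)
print(minimal)  # ['create_purchase_order']
```

The result is minimal in the sense that removing any single remaining event makes the failure disappear; it is not guaranteed to be the globally shortest failing trace for every predicate.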

Open source, CLI-first

Apache 2.0 licensed

Free to use, modify, and distribute.

Minimal dependencies

Python 3.11+, PyYAML, Typer.

Extensible

Store interfaces, spec inheritance, and SDK adapters for any LLM provider.