Trajectly is deterministic regression testing for AI agents. Record a baseline, enforce behavioral contracts, and catch silent failures — with an exact witness step, violation code, and one-command repro.
pip install trajectly
Six categories of silent failure that correct-looking output can hide.
Skipped approval? The answer won’t tell you. The trajectory will.
Invite sent before the room was booked. Trajectly enforces ordering.
Clean summary. Leaked API key in the payload. Trajectly catches it.
Success reported. Blocked domain contacted. Trajectly denies it.
Tool call succeeded. Token format broke. Trajectly validates args.
Same output. Twice the cost. Trajectly gates execution budgets.
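As a rough illustration of how checks like these can run over a recorded trajectory rather than the final answer, here is a minimal sketch. Everything in it is an assumption made for illustration: the `ToolCall` and `Violation` types, the violation code strings, and the `sk-` key pattern are hypothetical and are not Trajectly's actual API or spec format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)
    cost_tokens: int = 0


@dataclass
class Violation:
    code: str      # hypothetical violation code
    witness: int   # index of the offending step in the trace
    detail: str


# Assumed shape of a leaked API key; real detectors would be configurable.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{16,}")


def check_contracts(trace, denied_tools, max_tokens):
    """Scan a list of ToolCall steps and collect contract violations."""
    violations = []
    spent = 0
    over_budget = False
    for step, call in enumerate(trace):
        # Tool deny list: the call itself is forbidden.
        if call.name in denied_tools:
            violations.append(Violation("TOOL_DENIED", step, call.name))
        # Data safety: a secret-looking string appears in the arguments.
        if any(SECRET_PATTERN.search(str(v)) for v in call.args.values()):
            violations.append(Violation("DATA_LEAK", step, call.name))
        # Budget: report the first step at which the running total overflows.
        spent += call.cost_tokens
        if spent > max_tokens and not over_budget:
            over_budget = True
            violations.append(
                Violation("BUDGET_EXCEEDED", step, f"{spent} > {max_tokens}")
            )
    return violations
```

Each violation carries the index of the step that triggered it, which is the idea behind reporting a witness step instead of a generic "output changed".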
The same agent run. On the left, what you see. On the right, what Trajectly sees.
“Purchase order created. Vendor: Acme Corp. Total: $14,200. Delivery: March 20.”
Looks correct
fetch_requisition
fetch_vendor_quotes
create_purchase_order ← route_for_approval skipped
FAIL: witness 6
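The skipped-approval case can be sketched as an ordering check over the tool-call sequence. This is an illustrative sketch only: the `required_before` rule format and the function name are hypothetical, and the step index counts just the tool calls listed here, so it will not match the witness number shown in the example.

```python
def first_ordering_violation(tool_calls, required_before):
    """Return (step, tool, missing_prerequisites) for the first tool call
    made before all of its prerequisites, or None if the order is valid.

    required_before maps a tool name to the tools that must run earlier.
    """
    seen = set()
    for step, name in enumerate(tool_calls):
        missing = [p for p in required_before.get(name, []) if p not in seen]
        if missing:
            return step, name, missing
        seen.add(name)
    return None


trace = ["fetch_requisition", "fetch_vendor_quotes", "create_purchase_order"]
rules = {"create_purchase_order": ["route_for_approval"]}
print(first_ordering_violation(trace, rules))
# (2, 'create_purchase_order', ['route_for_approval'])
```

The final answer text never mentions approval, so only a check against the trajectory can notice that the step is absent.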
Run your agent once. Trajectly captures every tool call and LLM response as a fixture for deterministic replay.
Re-run the agent. LLM responses replay from fixtures — zero API cost. Trajectly checks behavioral refinement and contracts.
PASS or FAIL with the exact witness step, violation code, minimal counterexample, and a one-command deterministic repro.
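The record-then-replay idea can be sketched with a small wrapper around the LLM call. This is a minimal sketch under stated assumptions: the `FixtureLLM` class, the JSON fixture layout, and the `live_call` parameter are hypothetical and do not describe Trajectly's real fixture format or SDK.

```python
import json
from pathlib import Path


class FixtureLLM:
    """Records LLM responses on the first run and replays them afterwards."""

    def __init__(self, path, live_call=None):
        self.path = Path(path)
        self.live_call = live_call          # callable(prompt) -> str, or None
        self.recording = live_call is not None
        # Recording starts from an empty fixture; replay loads the saved one.
        self.responses = [] if self.recording else json.loads(self.path.read_text())
        self.cursor = 0

    def complete(self, prompt):
        if self.recording:
            text = self.live_call(prompt)
            self.responses.append({"prompt": prompt, "response": text})
            return text
        # Replay: return the next recorded response without any API call.
        entry = self.responses[self.cursor]
        self.cursor += 1
        return entry["response"]

    def save(self):
        self.path.write_text(json.dumps(self.responses, indent=2))
```

In replay mode the agent sees exactly the same model responses as in the baseline run, so any difference in its tool calls comes from a code or prompt change rather than from sampling variance.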
| | Typical approach | Trajectly |
|---|---|---|
| Deterministic | No — LLM wording varies | Yes — fixture replay + normalized comparison |
| Root cause | “Output changed” | Exact witness step + violation code |
| Contracts | None | Tool allow/deny, sequence, budget, data safety |
| Reproducible | Rarely | Always — trajectly repro |
| Minimizable | No | trajectly shrink — shortest failing trace |
| CI-ready | Fragile | Exit code 0/1, PR comments, artifacts |
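To make the "Minimizable" row concrete, here is one simple way a failing trace can be reduced: greedily drop steps while the failure still reproduces. This is a generic one-step-at-a-time reduction offered as an assumption-laden sketch; it is not a description of how `trajectly shrink` is implemented, and the `still_fails` predicate is hypothetical.

```python
def shrink(trace, still_fails):
    """Greedily remove steps from a failing trace while it keeps failing.

    still_fails(candidate) must return True when the candidate trace still
    triggers the original violation.
    """
    current = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the shorter trace
    return current


def approval_skipped(trace):
    """Fails when a purchase order is created with no earlier approval."""
    if "create_purchase_order" not in trace:
        return False
    cut = trace.index("create_purchase_order")
    return "route_for_approval" not in trace[:cut]


full = ["fetch_requisition", "fetch_vendor_quotes",
        "create_purchase_order", "send_confirmation"]
print(shrink(full, approval_skipped))
# ['create_purchase_order']
```

The result is minimal in the sense that removing any single remaining step makes the failure disappear.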
Works in any CI. The GitHub Action is a thin wrapper — all logic lives in the CLI.
# GitHub Actions
- uses: trajectly/trajectly-action@v1.0.2
  with:
    spec_glob: "specs/challenges/*.agent.yaml"
    project_root: "."
    comment_pr: "true"

git clone https://github.com/trajectly/trajectly-survival-arena.git
cd trajectly-survival-arena
pip install -r requirements.txt
python -m trajectly init
python -m trajectly run specs/challenges/procurement-chaos.agent.yaml --project-root .
python -m trajectly report
python -m trajectly repro
python -m trajectly shrink
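For CI systems without a ready-made action, the exit-code convention is enough to build a gate. The helper below is a hypothetical sketch (`run_gate` is not part of Trajectly); it only assumes that the CLI exits with 0 on PASS and non-zero on FAIL, as the comparison table states.

```python
import subprocess
import sys


def run_gate(command):
    """Run a CLI command and return True only when it exits with code 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's own output so the CI log shows the failure.
        sys.stderr.write(result.stdout + result.stderr)
    return result.returncode == 0


# Example usage in a CI script, reusing the run command shown above:
#   cmd = [sys.executable, "-m", "trajectly", "run",
#          "specs/challenges/procurement-chaos.agent.yaml",
#          "--project-root", "."]
#   sys.exit(0 if run_gate(cmd) else 1)
```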
Free to use, modify, and distribute.
Python 3.11+, PyYAML, Typer.
Store interfaces, spec inheritance, and SDK adapters for any LLM provider.