Why Trajectly

Six categories of silent failure

These are regressions where the final answer looks correct but the behavior is broken. Trajectly catches each one and tells you exactly where it broke.

Missing steps

An agent skips a required step but the final answer reads fine. A procurement agent that skips approval and goes straight to purchase order creation still outputs “Purchase order created.” Nothing in the text reveals the missing step.

Contract
tools:
  allow:
    - fetch_requisition
    - fetch_vendor_quotes
    - route_for_approval
    - create_purchase_order
  deny:
    - unsafe_direct_award
sequence:
  require:
    - tool:fetch_requisition
    - tool:fetch_vendor_quotes
    - tool:route_for_approval
    - tool:create_purchase_order
What Trajectly reports
REFINEMENT_BASELINE_CALL_MISSING — missing_call=route_for_approval — witness=6
Arena scenarios
procurement-chaos, support-apocalypse, shell-roulette
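
To make the idea concrete, here is a minimal Python sketch of a missing-step check. It is not Trajectly's implementation: the function name `missing_calls` and the representation of a trace as a plain list of tool names are assumptions made for illustration, and the sketch checks presence only, not order.

```python
# Hypothetical sketch: report required tool calls that never occur in a trace.
REQUIRED = [
    "fetch_requisition",
    "fetch_vendor_quotes",
    "route_for_approval",
    "create_purchase_order",
]


def missing_calls(required, trace):
    """Return the required tool names absent from the trace, in contract order."""
    seen = set(trace)
    return [tool for tool in required if tool not in seen]


# The agent skipped approval but still created the purchase order.
trace = ["fetch_requisition", "fetch_vendor_quotes", "create_purchase_order"]
print(missing_calls(REQUIRED, trace))  # ['route_for_approval']
```

The final text "Purchase order created." never enters the check; only the recorded tool calls do.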

Wrong order

The right tools are called, but in the wrong sequence. A calendar agent sends an invite before reserving the room. “Meeting arranged” sounds correct either way.

Contract
sequence:
  require:
    - tool:lookup_oncall
    - tool:reserve_room
    - tool:send_invite
  require_before:
    - before: tool:reserve_room
      after: tool:send_invite
  at_most_once:
    - tool:send_invite
What Trajectly reports
CONTRACT_SEQUENCE_REQUIRE_BEFORE_VIOLATED — expected=reserve_room before send_invite — witness=4
Arena scenarios
calendar-thunderdome
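
An ordering rule like the one in this contract can be sketched in a few lines of Python. The helper names and the list-of-tool-names trace are hypothetical. The returned index is zero-based within this toy list and is not meant to match Trajectly's witness numbering.

```python
from collections import Counter


def first_order_violation(trace, before, after):
    """Index of the first `after` call seen before any `before` call, else None."""
    seen_before = False
    for index, tool in enumerate(trace):
        if tool == before:
            seen_before = True
        elif tool == after and not seen_before:
            return index
    return None


def called_at_most_once(trace, tool):
    """True when `tool` appears zero or one times in the trace."""
    return Counter(trace)[tool] <= 1


bad = ["lookup_oncall", "send_invite", "reserve_room"]
good = ["lookup_oncall", "reserve_room", "send_invite"]
print(first_order_violation(bad, "reserve_room", "send_invite"))   # 1
print(first_order_violation(good, "reserve_room", "send_invite"))  # None
```

Both traces contain the same three tools, so a presence-only check would pass either one. Only the position of the calls separates them.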

Leaked secrets

The summary looks clean but the outbound tool-call payload contains a secret pattern. A log summarizer can produce a perfectly readable summary while the post_summary call body leaks an API key.

Contract
data_leak:
  outbound_kinds:
    - TOOL_CALL
  secret_patterns:
    - "sk_live_[A-Za-z0-9_]+"
What Trajectly reports
DATA_LEAK_SECRET_PATTERN — pattern=sk_live_[A-Za-z0-9_]+ — witness=4
Arena scenarios
secret-karaoke
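
As a rough illustration, scanning outbound payloads for a secret pattern might look like the Python sketch below. The event dictionaries, their `kind` and `payload` keys, and the function name are assumptions for this example and do not describe Trajectly's trace format. The key shown is a placeholder, not a real credential.

```python
import json
import re

# Same pattern as the contract above; the scan covers TOOL_CALL events only.
SECRET_PATTERNS = [re.compile(r"sk_live_[A-Za-z0-9_]+")]
OUTBOUND_KINDS = {"TOOL_CALL"}


def find_secret_leaks(events):
    """Return (event_index, pattern) pairs for outbound payloads that match."""
    hits = []
    for index, event in enumerate(events):
        if event["kind"] not in OUTBOUND_KINDS:
            continue  # inbound data such as tool results is not scanned here
        payload_text = json.dumps(event["payload"])
        for pattern in SECRET_PATTERNS:
            if pattern.search(payload_text):
                hits.append((index, pattern.pattern))
    return hits


events = [
    {"kind": "TOOL_CALL", "payload": {"tool": "read_logs", "path": "app.log"}},
    {"kind": "TOOL_RESULT", "payload": {"text": "key=sk_live_EXAMPLE_ONLY"}},
    {"kind": "TOOL_CALL", "payload": {"tool": "post_summary",
                                      "body": "Deploy ok. key=sk_live_EXAMPLE_ONLY"}},
]
print(find_secret_leaks(events))  # [(2, 'sk_live_[A-Za-z0-9_]+')]
```

The tool result at index 1 also contains the key, but it is inbound, so only the outbound `post_summary` call is flagged.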

Forbidden network access

The agent reports success but quietly contacted a domain outside the allowlist. It fetched from an untrusted source and nobody noticed until production.

Contract
network:
  default: deny
  allow_domains:
    - status.internal.example
What Trajectly reports
NETWORK_DOMAIN_DENIED — witness=2
Arena scenarios
network-no-fly-zone
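
A deny-by-default domain check can be sketched with the Python standard library. This is an illustration under stated assumptions: `is_allowed` is a hypothetical helper, and it uses exact host matching. Whether Trajectly also matches subdomains is not stated here.

```python
from urllib.parse import urlsplit

ALLOW_DOMAINS = {"status.internal.example"}


def is_allowed(url):
    """Deny by default: allow only URLs whose host is exactly on the allowlist."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if absent
    return host is not None and host in ALLOW_DOMAINS


print(is_allowed("https://status.internal.example/health"))         # True
print(is_allowed("https://status.internal.example.evil.test/feed"))  # False
print(is_allowed("not-a-url"))                                       # False
```

Comparing the parsed host, rather than testing whether the URL string contains the allowed domain, avoids accepting lookalike hosts such as the second example.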

Invalid arguments

A tool call completes but an argument silently violates its format contract. For example, a dispatch token that must be WR- followed by exactly five digits arrives with a malformed value.

Contract
args:
  dispatch_war_room:
    required_keys:
      - dispatch_token
    fields:
      dispatch_token:
        type: string
        regex: "^WR-[0-9]{5}$"
What Trajectly reports
CONTRACT_ARGS_REGEX_VIOLATION — witness=6
Arena scenarios
graph-chain-reaction
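
The argument rules in this contract (required key, string type, regex) can be approximated in Python as follows. The function name and the error strings are hypothetical and are not Trajectly output.

```python
import re

DISPATCH_TOKEN = re.compile(r"^WR-[0-9]{5}$")


def check_dispatch_args(args):
    """Return a list of violations for a dispatch_war_room argument dict."""
    if "dispatch_token" not in args:
        return ["missing required key: dispatch_token"]
    token = args["dispatch_token"]
    if not isinstance(token, str):
        return ["dispatch_token must be a string"]
    # fullmatch also rejects a trailing newline, which `$` alone would accept.
    if not DISPATCH_TOKEN.fullmatch(token):
        return ["dispatch_token does not match ^WR-[0-9]{5}$"]
    return []


print(check_dispatch_args({"dispatch_token": "WR-12345"}))  # []
print(check_dispatch_args({"dispatch_token": "WR-1234"}))
print(check_dispatch_args({}))
```

A token such as `WR-1234` looks plausible in a log line, which is why this class of failure tends to go unnoticed without an explicit check.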

Budget overruns

Identical output, but execution cost quietly doubled. Twice the tool calls, twice the tokens. The final text gives no hint that cost regressed.

Contract
# In the .agent.yaml spec
budget_thresholds:
  max_tool_calls: 3
  max_tokens: 500
What Trajectly reports
budget_breach — max_tool_calls exceeded
Arena scenarios
budget-gauntlet
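
A budget check reduces to comparing run counters against thresholds. The Python sketch below assumes the counters are already available as integers and that a run breaches only when it goes strictly above a limit. Both are assumptions, as are the `Budget` class and the function name.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_tool_calls: int = 3
    max_tokens: int = 500


def budget_breaches(tool_calls, tokens, budget):
    """Return a message for each threshold the run exceeded."""
    breaches = []
    if tool_calls > budget.max_tool_calls:
        breaches.append("max_tool_calls exceeded")
    if tokens > budget.max_tokens:
        breaches.append("max_tokens exceeded")
    return breaches


# Same final text as before, but twice the tool calls.
print(budget_breaches(tool_calls=6, tokens=480, budget=Budget()))
```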

What happens when a spec fails

You don’t search through logs or guess what changed. Three tools form a complete debug loop.

Witness

Every failure pinpoints the exact trace event where behavior diverged. No log hunting — go directly to the step that broke.

witness=6 → event 6 is where route_for_approval was expected but missing

Repro

One command replays the exact failure. Deterministic — same witness, same violation, every time.

python -m trajectly repro procurement-chaos

Shrink

Reduces the failing trace to the shortest proof. Instead of reading 14 events, you read 3.

14 events → 3 events
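
The general idea behind trace shrinking can be illustrated with a greedy reducer that drops one event at a time while the failure still reproduces. This is a simple form of delta debugging, shown as a sketch. It is not claimed to be the algorithm Trajectly uses, and the `shrink` and `still_fails` names and the toy trace are hypothetical.

```python
def shrink(events, still_fails):
    """Greedily remove single events while `still_fails` keeps returning True."""
    current = list(events)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the shorter trace
    return current


def still_fails(trace):
    """Toy failure: a requisition led to a purchase order without approval."""
    return ("fetch_requisition" in trace
            and "create_purchase_order" in trace
            and "route_for_approval" not in trace)


trace = ["fetch_requisition", "log_note", "fetch_vendor_quotes",
         "log_note", "create_purchase_order", "notify_requester"]
print(shrink(trace, still_fails))  # ['fetch_requisition', 'create_purchase_order']
```

The result is minimal in the sense that removing any single remaining event makes the failure disappear.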

Try it

All six failure categories are covered by the Merge or Die arena. Run the scenarios, break them, debug them.