Evaluations
Run evaluations
What you will do
Run an eval suite, write artifacts, and compare variants.
Dry run
memory eval run --suite evals/examples/memory-smoke --condition full-memory --profile offline --dry-runPaired run shape
memory eval run --suite evals/suites/memory-improvement-v1 --condition no-memory --condition full-memory --allow-shell --repeat 5
memory eval compare --baseline 'target/memory-evals/*no-memory*.json' --candidate 'target/memory-evals/*full-memory*.json' --textUse --allow-shell only after reviewing suite scripts and fixtures. Shell-executing evals are code execution inputs, not passive data files.
External retrievers
memory eval run --suite evals/suites/memory-improvement-v1 --condition full-memory --retriever-cmd './my-retriever' --allow-shellVerify
Inspect the JSON artifacts under target/memory-evals/ and compare item-level results before making claims.
Next
Read Interpreting results and Limitations.