Targets Configuration
Targets define which agent or LLM provider to evaluate. They are configured in .agentv/targets.yaml to decouple eval files from provider details.
Structure
    targets:
      - label: azure-base
        provider: azure
        config:
          endpoint: ${{ AZURE_OPENAI_ENDPOINT }}
          api_key: ${{ AZURE_OPENAI_API_KEY }}
          model: ${{ AZURE_DEPLOYMENT_NAME }}

      - label: vscode_dev
        provider: vscode
        grader_target: azure-base

      - label: local_agent
        provider: cli
        config:
          command: 'python agent.py --prompt {PROMPT}'
        grader_target: azure-base

Use label for AgentV target references and comparison names. Use id only when you need to carry a promptfoo provider/backend identifier. The provider field selects the backend kind. Provider-specific settings belong in config; AgentV target extensions such as grader_target, use_target, fallback_targets, workers, and batch_requests remain top-level fields on the target object.
Environment Variables
Use ${{ VARIABLE_NAME }} syntax to reference values from your environment. AgentV reads exported process environment variables directly, and it also loads .env files from the eval directory hierarchy when present:

    targets:
      - label: my_target
        provider: anthropic
        config:
          api_key: ${{ ANTHROPIC_API_KEY }}
          model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files and avoids requiring a CI step that rewrites already-exported secrets into .env.
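As a rough illustration of the ${{ VARIABLE_NAME }} substitution described above, the following Python sketch resolves placeholders against a mapping of variables. It is not AgentV's implementation: the helper name interpolate, the regular expression, and the choice to raise an error on a missing variable are all assumptions made for the example.

```python
import os
import re

# Matches ${{ NAME }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(text, env=None):
    """Replace each ${{ NAME }} in text with env[NAME].

    Hypothetical helper: raising KeyError for a missing variable is an
    assumption of this sketch, not documented AgentV behavior.
    """
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    # Illustrative value only; real runs would read the process environment.
    sample_env = {"ANTHROPIC_MODEL": "example-model"}
    print(interpolate("model: ${{ ANTHROPIC_MODEL }}", sample_env))
```

Passing an explicit mapping keeps the sketch testable; calling interpolate(text) with no second argument falls back to the process environment.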
Supported Providers
| Provider | Type | Description |
|---|---|---|
| azure | LLM | Azure OpenAI |
| anthropic | LLM | Anthropic Claude API |
| gemini | LLM | Google Gemini |
| claude | Agent | Claude Agent SDK |
| codex | Agent | Codex CLI |
| pi-coding-agent | Agent | Pi Coding Agent |
| vscode | Agent | VS Code with Copilot |
| vscode-insiders | Agent | VS Code Insiders |
| cli | Agent | Any CLI command (see CLI Provider) |
| mock | Testing | Explicit mock target for examples and tests |
Referencing Targets in Evals
Select the system under test with top-level target or CLI --target. Test cases do not choose targets; split target-specific cases into separate eval suites, select them with tags/filters, or run the same eval with different --target values.

    target: azure-base
    tests:
      - id: test-1
      - id: test-2

The string is a target label from .agentv/targets.yaml or targets.yaml.

Use object form when an eval needs a local target variant:

    target:
      extends: azure-base
      label: azure-high-reasoning
      model: gpt-5.4
      reasoning_effort: high

extends names the base target label. label names the eval-local variant for results and comparison. If label is omitted, the eval overrides the base target under the same label. If extends is omitted, the object is a complete inline target definition and must include enough provider configuration to run.
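The extends and label rules above can be sketched in Python. This is only an illustration under stated assumptions: the function name resolve_target is hypothetical, overrides are merged shallowly over the base definition, and the targets argument stands in for whatever AgentV loads from targets.yaml. The actual merge semantics may differ.

```python
def resolve_target(spec, targets):
    """Resolve an eval's target value to a (label, definition) pair.

    spec is either a label string or a mapping (object form); targets maps
    labels to base definitions. Shallow merging is an assumption of this
    sketch, not documented AgentV behavior.
    """
    if isinstance(spec, str):
        # String form: a plain reference to a configured target label.
        return spec, dict(targets[spec])

    overrides = {k: v for k, v in spec.items() if k not in ("extends", "label")}
    base_label = spec.get("extends")
    if base_label is None:
        # No base: the object is treated as a complete inline definition.
        return spec.get("label"), overrides

    merged = {**targets[base_label], **overrides}
    # Without an explicit label, the variant keeps the base target's label.
    return spec.get("label", base_label), merged


if __name__ == "__main__":
    # Hypothetical base definition used only for this demonstration.
    base = {"azure-base": {"provider": "azure", "model": "base-model"}}
    variant = {
        "extends": "azure-base",
        "label": "azure-high-reasoning",
        "model": "gpt-5.4",
        "reasoning_effort": "high",
    }
    print(resolve_target(variant, base))
```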
Grader Target
Agent targets that need LLM-based evaluation specify a grader_target, the LLM used to run LLM graders:

    targets:
      - label: codex_target
        provider: codex
        grader_target: azure-base   # LLM used for grading

Lifecycle Extensions
Run non-provisioning setup at Promptfoo-compatible lifecycle points using top-level extensions. The harness materializes workspace.template and workspace.repos first, then runs beforeAll extensions. Use extensions for dependency installs, builds, fixture generation, and agent-rule staging. Use target hooks for runner-specific setup. Keep repo identity and checkout pins in workspace.repos; extensions must not become the default repo acquisition path.

    extensions:
      - file://scripts/workspace.mjs:beforeAll
      - file://scripts/workspace.mjs:beforeEach
      - file://scripts/workspace.mjs:afterEach
      - file://scripts/workspace.mjs:afterAll
      - id: agentv:agent-rules
        hook: beforeAll
        skills: agent-rules/skills
        rules: agent-rules/AGENTS.md

    workspace:
      template: ./workspace-templates/my-project
      hooks:
        after_each:
          reset: fast

| Field | Description |
|---|---|
| template | Directory to copy as workspace |
| extensions[] | file://...:beforeAll, beforeEach, afterEach, afterAll, or agentv:agent-rules |
| hooks.after_each.reset | Reset mode: none, fast, strict |
Lifecycle order: template copy → repo materialization → extensions.beforeAll → target hooks.before_all → git baseline → (extensions.beforeEach → target hooks.before_each → agent runs → file changes captured → target hooks.after_each → extensions.afterEach → workspace.hooks.after_each.reset) × N tests → target hooks.after_all → extensions.afterAll → cleanup
Shared workspace: The workspace is created once and shared across all tests in a suite. Use hooks.after_each.reset to reset state between tests (e.g., fast/strict).
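To make the lifecycle order above easier to follow for a concrete suite, this Python sketch expands it into a flat list of steps for a given set of test ids. The step names are copied from the order stated above; the function name lifecycle_steps and the "test id: step" labeling are assumptions for illustration, and nothing here runs real hooks.

```python
# Step names taken from the documented lifecycle order.
SUITE_SETUP = [
    "template copy",
    "repo materialization",
    "extensions.beforeAll",
    "target hooks.before_all",
    "git baseline",
]
PER_TEST = [
    "extensions.beforeEach",
    "target hooks.before_each",
    "agent runs",
    "file changes captured",
    "target hooks.after_each",
    "extensions.afterEach",
    "workspace.hooks.after_each.reset",
]
SUITE_TEARDOWN = [
    "target hooks.after_all",
    "extensions.afterAll",
    "cleanup",
]


def lifecycle_steps(test_ids):
    """Return the ordered step list for one suite run over test_ids."""
    steps = list(SUITE_SETUP)
    for test_id in test_ids:
        # The per-test block repeats once for every test in the suite.
        steps.extend(f"{test_id}: {step}" for step in PER_TEST)
    steps.extend(SUITE_TEARDOWN)
    return steps


if __name__ == "__main__":
    for step in lifecycle_steps(["test-1", "test-2"]):
        print(step)
```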
Error handling:
- beforeAll/beforeEach extension failure aborts the affected run with an error result
- afterAll/afterEach extension failure is non-fatal
File hook context: Exported functions receive a JSON-compatible object with case context:

    {
      "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
      "test_id": "case-01",
      "eval_run_id": "run-123",
      "case_input": "Fix the bug",
      "case_metadata": {
        "repo": "sympy/sympy",
        "base_commit": "abc123"
      }
    }

workspace.hooks remains the reset-policy home for after_each.reset. Legacy command hooks still parse for existing local suites, but new portable evals should use extensions for executable setup.
Repository Lifecycle
Materialize git repositories into the shared eval workspace. Repo entries declare provenance only: the repository identity and checkout pin. AgentV resolves acquisition separately using registered projects, configured mirrors, its git cache, and finally remote clone. Define repos at the suite level or per test:

    workspace:
      repos:
        - path: ./my-repo
          repo: https://github.com/org/repo.git
          commit: main
          ancestor: 1        # check out the parent commit
      hooks:
        after_each:
          reset: fast        # none | fast | strict
      scope: suite           # suite (default) | attempt

repo declares the repository identity. Acquisition is harness-owned: AgentV first applies configured repo_resolvers, then uses the built-in git path of registered projects, configured mirrors, AgentV’s git cache, and remote clone. See Workspace Architecture for the resolver order, command resolver protocol, and git_cache.mirrors config.
| Field | Description |
|---|---|
| repos[].path | Directory within the workspace to clone into |
| repos[].repo | Repository identity: full clone URL or GitHub org/name shorthand |
| repos[].commit | Branch, tag, or SHA to check out (default: HEAD) |
| repos[].base_commit | Alias for commit, useful for SWE-bench-style datasets |
| repos[].ancestor | Walk N commits back from the checked-out ref (e.g., 1 for parent) |
| repos[].sparse | Sparse checkout paths |
| hooks.after_each.reset | Reset policy after each test: none, fast, strict |
| scope | suite reuses one harness-managed workspace for the suite; attempt creates a clean workspace for each resolved execution attempt |
| hooks.enabled | Boolean (default: true). Set false to skip all lifecycle hooks. |
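One way to read the commit, base_commit, ancestor, and shorthand fields in the table above is sketched below in Python. The helper names clone_url and checkout_rev are hypothetical, the expansion of org/name to an https://github.com/ URL is an assumption, and mapping ancestor to git's `<rev>~<n>` first-parent syntax is one plausible interpretation of "walk N commits back", not confirmed AgentV behavior.

```python
def clone_url(repo):
    """Expand GitHub org/name shorthand to a clone URL.

    Assumption: shorthand maps to the https form on github.com.
    Full URLs (including git@ SSH URLs) are returned unchanged.
    """
    if "://" in repo or repo.startswith("git@"):
        return repo
    return f"https://github.com/{repo}.git"


def checkout_rev(entry):
    """Build a git revision expression for a repo entry.

    commit wins over its alias base_commit; HEAD is the default.
    ancestor N is rendered with git's <rev>~<n> syntax, which follows
    first parents (an assumption about how the walk is performed).
    """
    ref = entry.get("commit") or entry.get("base_commit") or "HEAD"
    ancestor = entry.get("ancestor", 0)
    return f"{ref}~{ancestor}" if ancestor else ref


if __name__ == "__main__":
    entry = {"path": "./my-repo", "repo": "org/repo", "commit": "main", "ancestor": 1}
    print(clone_url(entry["repo"]), checkout_rev(entry))
```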
Use scope: attempt when mutating agents need clean filesystem state for every prompt-target-test-repeat execution. Use scope: suite when the suite intentionally shares state across tests.
Existing local workspaces: do not commit local paths in eval YAML. Use --workspace-path /path/to/workspace for a one-off run, or put execution.workspace_path in .agentv/config.local.yaml.
Workspace command:
- agentv workspace deps <eval-paths>: scan eval files and output a JSON manifest of required git repos (for CI pre-cloning)
Common patterns:
    # Pinned commit
    workspace:
      repos:
        - path: ./repo
          repo: https://github.com/org/repo.git
          commit: abc123def

    # Multi-repo shared workspace with reset
    workspace:
      repos:
        - path: ./frontend
          repo: https://github.com/org/frontend.git
        - path: ./backend
          repo: https://github.com/org/backend.git
      hooks:
        after_each:
          reset: fast

    # GitHub shorthand with a base_commit alias
    workspace:
      repos:
        - path: ./repo
          repo: org/repo
          base_commit: abc123def

Cleanup Behavior
Default finish behavior:
- Success: cleanup
- Failure: keep
CLI overrides:
- --retain-on-success keep|cleanup
- --retain-on-failure keep|cleanup
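The defaults and overrides above reduce to a small decision, sketched here in Python. The function name should_keep_workspace and its keyword arguments are hypothetical stand-ins for the two CLI flags; only the keep/cleanup values and the defaults come from the text.

```python
VALID_POLICIES = ("keep", "cleanup")


def should_keep_workspace(success, retain_on_success="cleanup", retain_on_failure="keep"):
    """Return True when the workspace should be kept after a run.

    Defaults mirror the documented behavior: clean up after success,
    keep after failure. The keyword arguments model the CLI overrides.
    """
    policy = retain_on_success if success else retain_on_failure
    if policy not in VALID_POLICIES:
        raise ValueError(f"expected one of {VALID_POLICIES}, got {policy!r}")
    return policy == "keep"


if __name__ == "__main__":
    print(should_keep_workspace(success=True))
    print(should_keep_workspace(success=False))
    print(should_keep_workspace(success=True, retain_on_success="keep"))
```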
Use cwd on a target to run in an existing directory (shared across tests). If not set, the eval file’s directory is used as the working directory.
Target Hooks
Eval files can define per-target hooks that run setup/teardown scripts to customize the workspace for each target variant. This enables comparing different harness configurations (e.g., baseline vs with-plugins) in a single eval file.
Targets do not declare repos. Repositories belong to the shared eval workspace so every target runs in the same world; target hooks customize the harness under evaluation. Use hooks for per-target setup such as enabling wrappers or changing provider-local config. Keep installs, builds, fixture generation, and case setup in top-level lifecycle extensions.
Target hooks can be scoped to an eval-local target object:
    target:
      extends: default
      hooks:
        before_each:
          command: ["setup-plugins.sh", "skills"]

Hook execution order
Target hooks run after workspace hooks on setup, before workspace hooks on teardown:
- Extension beforeAll
- Target before_all
- For each test:
  - Workspace before_each
  - Target before_each
  - Test executes
  - Target after_each
  - Workspace after_each
- Target after_all
- Workspace after_all
Hook schema
Target hooks follow the same schema as workspace hooks:

    hooks:
      before_all:
        command: ["setup.sh"]     # Command array or shell string
        timeout_ms: 60000         # Optional timeout
        cwd: "./scripts"          # Optional working directory
      before_each:
        command: "echo setup"     # String shorthand (runs via sh -c)
      after_each:
        command: ["cleanup.sh"]
      after_all:
        command: ["teardown.sh"]
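The two command forms in the schema above (array, or string run via sh -c) can be modeled with Python's subprocess module. This is a sketch of how a runner might execute a hook entry, not AgentV's code: the helper names to_argv and run_hook are hypothetical, and converting timeout_ms to seconds and failing on a non-zero exit status are assumptions.

```python
import subprocess


def to_argv(command):
    """Normalize a hook command to an argument list.

    A string is wrapped as sh -c <string>; a list is used as given.
    """
    if isinstance(command, str):
        return ["sh", "-c", command]
    return list(command)


def run_hook(hook):
    """Run one hook entry with its optional cwd and timeout_ms.

    Assumptions of this sketch: timeout_ms is converted to seconds for
    subprocess, and a non-zero exit status raises CalledProcessError.
    """
    timeout_ms = hook.get("timeout_ms")
    return subprocess.run(
        to_argv(hook["command"]),
        cwd=hook.get("cwd"),
        timeout=timeout_ms / 1000 if timeout_ms is not None else None,
        check=True,
    )


if __name__ == "__main__":
    # Requires a POSIX sh on PATH.
    run_hook({"command": "echo setup", "timeout_ms": 60000})
```

Keeping the array form as a plain argument list avoids shell quoting issues, which is presumably why the schema offers both forms.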