Test Bed

Live chat environment to test your deployed agentic system.

The Test Bed connects directly to your AgentCore runtime, so you can chat with your deployed agentic system and watch how its agents coordinate to handle each request.

Layout

Resizable two-panel view -- the left panel is the chat interface, the right panel shows the Agent Activity Tree.

How It Works

Type a message to interact with your deployed agents. The Agent Activity Tree in the right panel displays sub-agent orchestration in real time:

Each sub-agent's status: pending → running → done/error
Tool usage per agent (name, input, result, status)
Agent reasoning and thinking content, so you can see how each agent's reasoning informs downstream actions
Color-coding per agent (violet, sky, amber, emerald, rose, cyan), so each agent's activity is easy to tell apart
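
The fields listed above can be pictured as a small data model. The following is a minimal sketch, not the Test Bed's actual implementation: the class names `AgentNode` and `ToolCall`, the `advance` method, and the idea that tool calls share the same status values as agents are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    ERROR = "error"


# Allowed transitions, following the flow pending -> running -> done/error.
ALLOWED = {
    Status.PENDING: {Status.RUNNING},
    Status.RUNNING: {Status.DONE, Status.ERROR},
    Status.DONE: set(),
    Status.ERROR: set(),
}


@dataclass
class ToolCall:
    """One tool invocation shown under an agent (hypothetical shape)."""
    name: str
    input: dict
    result: Optional[str] = None
    # Assumption: tool calls reuse the agent status values.
    status: Status = Status.PENDING


@dataclass
class AgentNode:
    """One sub-agent entry in the activity tree (hypothetical shape)."""
    name: str
    color: str  # e.g. "violet", "sky", "amber"
    status: Status = Status.PENDING
    reasoning: str = ""
    tools: List[ToolCall] = field(default_factory=list)

    def advance(self, new_status: Status) -> None:
        """Move to new_status, rejecting transitions outside the flow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed"
            )
        self.status = new_status
```

With this model, a node created as `AgentNode(name="researcher", color="violet")` starts as pending, and calling `advance(Status.RUNNING)` followed by `advance(Status.DONE)` walks it through the flow; any further transition raises `ValueError`.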

Agent responses render as Markdown with GitHub Flavored Markdown (GFM) support. S3 report links in responses are detected automatically and shown as downloadable cards. Agent outputs such as reports, decisions, and structured data appear directly in the chat.
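
To illustrate what link auto-detection could look like, here is a rough sketch. It assumes links appear either as `s3://bucket/key` URIs or as virtual-hosted-style HTTPS URLs; the Test Bed's real detection rules are not documented here, and the names `S3_LINK` and `find_s3_links` are hypothetical.

```python
import re
from typing import List

# Simplified patterns; real bucket-name rules are stricter than this.
_BUCKET = r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]"
_KEY = r"[^\s)\]>]+"  # stop at whitespace or a closing Markdown delimiter

S3_LINK = re.compile(
    "s3://" + _BUCKET + "/" + _KEY
    + "|https://" + _BUCKET
    + r"\.s3(?:[.-][a-z0-9-]+)?\.amazonaws\.com/" + _KEY
)


def find_s3_links(markdown: str) -> List[str]:
    """Return unique S3 links in order of first appearance."""
    links: List[str] = []
    for match in S3_LINK.finditer(markdown):
        # Drop sentence punctuation that trails the link.
        link = match.group(0).rstrip(".,;:")
        if link not in links:
            links.append(link)
    return links
```

Each link returned by a function like this would correspond to one downloadable card in the chat.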

Voice Mode

If your application includes an Amazon Nova Sonic voice agent, a voice toggle appears in the Test Bed toolbar. Enable voice mode to:

Speak to your agent via microphone
Hear responses played back through your speaker
See the live transcript alongside the audio
Test voice-specific behaviors such as warm transfer and escalation to a human

Evaluation

Evaluation scores your agent's actions and outputs. After testing, select the Evaluate button to choose up to 8 metrics across four dimensions:

Response Quality -- Assesses the content of the agent's response

Correctness -- Are the facts accurate?
Helpfulness -- Is the response useful and actionable?
Coherence -- Does it flow logically?
Conciseness -- Is it brief but complete?
Faithfulness -- Is it grounded in retrieved data or agent reasoning?
Relevance -- Does it directly address the question?
Instruction Following -- Were your agent's instructions respected?
Refusal Detection -- Did it avoid answering when it shouldn't?
Context Relevance -- Did the agent use the right context to inform its action?

Task Completion -- Assesses whether the work got done

Goal Success -- Did the agent's actions achieve the intended goal?

Tool Usage -- Assesses how the agent chose and called its tools

Tool Selection -- Did the agent pick the right tool for the task?
Tool Parameters -- Did the agent pass correct, well-formed inputs to the tool?

Safety -- Checks that the output stays safe

Harmfulness -- Does the output contain harmful content?
Stereotyping -- Does the output make unfair generalizations?

Once you've selected your metrics, select the Evaluate button to start. When evaluation completes, review the score for each metric along with its detailed explanation.
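
The selection rule (up to 8 of the metrics listed above) can be expressed as a small check. This is an illustrative sketch only: the `METRICS` table mirrors the lists in this section, while `MAX_METRICS` and `validate_selection` are hypothetical names, not part of any Test Bed API.

```python
from typing import Dict, List

# Metric catalog, copied from the dimensions listed in this section.
METRICS: Dict[str, List[str]] = {
    "Response Quality": [
        "Correctness", "Helpfulness", "Coherence", "Conciseness",
        "Faithfulness", "Relevance", "Instruction Following",
        "Refusal Detection", "Context Relevance",
    ],
    "Task Completion": ["Goal Success"],
    "Tool Usage": ["Tool Selection", "Tool Parameters"],
    "Safety": ["Harmfulness", "Stereotyping"],
}

MAX_METRICS = 8  # limit stated in this section


def validate_selection(selected: List[str]) -> List[str]:
    """Return the de-duplicated selection, or raise if it is invalid."""
    known = {m for group in METRICS.values() for m in group}
    unknown = [m for m in selected if m not in known]
    if unknown:
        raise ValueError(f"unknown metrics: {unknown}")
    unique = list(dict.fromkeys(selected))  # keeps first-seen order
    if len(unique) > MAX_METRICS:
        raise ValueError(
            f"{len(unique)} metrics selected; the limit is {MAX_METRICS}"
        )
    return unique
```

Because the catalog holds 14 metrics and the limit is 8, a run can never cover every metric at once, so it helps to decide up front which dimensions matter most for the conversation being evaluated.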

Session Phases

The test session progresses through: idle → testing → ended → ready → evaluated.
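
The phase sequence can be modeled as an ordered enumeration. The sketch below assumes the progression is strictly linear, as the sentence above suggests; whether a session can return to an earlier phase is not stated here, and `Phase` and `next_phase` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    # Members are declared in the order the session moves through them.
    IDLE = "idle"
    TESTING = "testing"
    ENDED = "ended"
    READY = "ready"
    EVALUATED = "evaluated"


ORDER = list(Phase)  # Enum iteration preserves declaration order


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase after current, or None if it is the last one."""
    index = ORDER.index(current)
    if index == len(ORDER) - 1:
        return None
    return ORDER[index + 1]
```

Starting from `Phase.IDLE` and calling `next_phase` repeatedly walks through testing, ended, and ready before finishing at evaluated.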
