Test Bed

The Test Bed is a live chat environment for testing your deployed agentic system. It connects directly to your AgentCore runtime.

Layout

The Test Bed uses a resizable two-panel view: the left panel is the chat interface, and the right panel shows the Agent Activity Tree.

How It Works

Type a message in the chat panel to interact with your deployed agents. The Agent Activity Tree in the right panel shows sub-agent orchestration in real time:

  • Each sub-agent's status: pending → running → done/error
  • Tool usage per agent (name, input, result, status)
  • Agent reasoning and thinking content
  • A distinct color per agent (violet, sky, amber, emerald, rose, cyan)
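
The status lifecycle and per-agent details above can be pictured as a small data model. The following is a minimal sketch, not the Test Bed's actual implementation: the names `Status`, `ToolCall`, `AgentNode`, and `make_nodes` are hypothetical, the tool status is kept as a free-form string because its values are not listed here, and assigning colors by cycling through the palette is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    ERROR = "error"


# Allowed moves, following pending → running → done/error.
ALLOWED = {
    Status.PENDING: {Status.RUNNING},
    Status.RUNNING: {Status.DONE, Status.ERROR},
    Status.DONE: set(),
    Status.ERROR: set(),
}

# Palette from the list above; cycling through it by index is an assumption.
COLORS = ["violet", "sky", "amber", "emerald", "rose", "cyan"]


@dataclass
class ToolCall:
    """One tool invocation: name, input, result, status."""
    name: str
    tool_input: Dict[str, object]
    result: Optional[str] = None
    status: str = "pending"  # assumption: tool status values are not documented


@dataclass
class AgentNode:
    """One sub-agent entry in the activity tree (hypothetical shape)."""
    name: str
    color: str
    status: Status = Status.PENDING
    reasoning: str = ""
    tools: List[ToolCall] = field(default_factory=list)

    def advance(self, new_status: Status) -> None:
        # Reject moves that skip or reverse the documented lifecycle.
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed"
            )
        self.status = new_status


def make_nodes(names: List[str]) -> List[AgentNode]:
    """Create one pending node per agent, cycling through the palette."""
    return [AgentNode(n, COLORS[i % len(COLORS)]) for i, n in enumerate(names)]
```

In this sketch, a node that has reached done or error is terminal, so a later `advance` call raises `ValueError` instead of silently resetting the agent.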

Agent responses render as Markdown with GitHub Flavored Markdown (GFM) support. S3 report links in a response are detected automatically and shown as downloadable cards.
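
To illustrate what auto-detecting S3 links in a response could involve, here is a rough sketch. The detection rules the Test Bed actually uses are not described here, so the pattern below is an assumption: it treats `s3://` URIs and `https` URLs on an `amazonaws.com` S3 host as report links. `S3_LINK` and `find_s3_links` are hypothetical names.

```python
import re
from typing import List

# Assumption: report links look like s3://bucket/key or an https URL whose
# host contains "s3" and ends in amazonaws.com. The real rules may differ.
S3_LINK = re.compile(
    r"s3://[A-Za-z0-9.\-]+/[^\s)\]>\"']+"
    r"|https://[A-Za-z0-9.\-]*s3[A-Za-z0-9.\-]*\.amazonaws\.com/[^\s)\]>\"']+"
)


def find_s3_links(markdown_text: str) -> List[str]:
    """Return unique S3-style links in order of first appearance."""
    links: List[str] = []
    for match in S3_LINK.finditer(markdown_text):
        # Drop sentence punctuation that trails a bare link.
        link = match.group(0).rstrip(".,;")
        if link not in links:
            links.append(link)
    return links
```

Each returned link could then back one downloadable card, while the rest of the response is rendered as ordinary Markdown.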

Evaluation

After testing, select the Evaluate button, then choose up to 8 metrics from the following dimensions:

Response Quality

  • Correctness -- Are the facts accurate?
  • Helpfulness -- Is the response useful?
  • Coherence -- Does it flow logically?
  • Conciseness -- Is it brief but complete?
  • Faithfulness -- Is it backed by context?
  • Relevance -- Does it answer the question?
  • Instruction Following -- Did it follow the instructions?
  • Refusal Detection -- Did it avoid answering?
  • Context Relevance -- Was the right context used?

Task Completion

  • Goal Success -- Did it achieve the goal?

Tool Usage

  • Tool Selection -- Did it pick the right tool?
  • Tool Parameters -- Did it use the correct parameters?

Safety

  • Harmfulness -- Does it contain harmful content?
  • Stereotyping -- Does it make generalizations?

Once you've selected your metrics, select the Evaluate button to start the evaluation. When the evaluation completes, you can review the score for each selected metric, along with its label and a detailed explanation.
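
The metric catalog and the 8-metric limit can be expressed as a simple validation step. This is a minimal sketch under stated assumptions: `METRICS`, `MAX_METRICS`, and `validate_selection` are hypothetical names, and the way the Test Bed itself enforces the limit is not documented here.

```python
from typing import Dict, List

# Metric names grouped by dimension, as listed above.
METRICS: Dict[str, List[str]] = {
    "Response Quality": [
        "Correctness", "Helpfulness", "Coherence", "Conciseness",
        "Faithfulness", "Relevance", "Instruction Following",
        "Refusal Detection", "Context Relevance",
    ],
    "Task Completion": ["Goal Success"],
    "Tool Usage": ["Tool Selection", "Tool Parameters"],
    "Safety": ["Harmfulness", "Stereotyping"],
}

MAX_METRICS = 8  # "up to 8 metrics"


def validate_selection(selected: List[str]) -> List[str]:
    """Return the selection without duplicates, or raise ValueError."""
    known = {name for group in METRICS.values() for name in group}
    unknown = [name for name in selected if name not in known]
    if unknown:
        raise ValueError(f"unknown metrics: {unknown}")
    unique = list(dict.fromkeys(selected))  # keeps first-seen order
    if len(unique) > MAX_METRICS:
        raise ValueError(
            f"{len(unique)} metrics selected; the limit is {MAX_METRICS}"
        )
    return unique
```

Because the catalog holds 14 metrics and the limit is 8, a single evaluation cannot cover every metric; a selection can still span all four dimensions.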

Session Phases

A test session progresses through the following phases in order: idle → testing → ended → ready → evaluated.
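
The phase sequence can be modeled as a linear state machine. The sketch below assumes the phases only move forward one step at a time; what triggers each transition, and whether a session can be reset, is not stated here. `Phase` and `next_phase` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    IDLE = "idle"
    TESTING = "testing"
    ENDED = "ended"
    READY = "ready"
    EVALUATED = "evaluated"


# Enum members iterate in definition order, which matches the documented order.
ORDER = list(Phase)


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase after `current`, or None once the session is evaluated."""
    index = ORDER.index(current)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None
```

Walking the sequence from `Phase.IDLE` with repeated `next_phase` calls visits all five phases and then returns `None`.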