llmagent_test.go

Overview

This file tests the llmagent package, which implements agents backed by large language models (LLMs), following the patterns described in LLM Integration and Agents. The tests exercise LLMAgent across core invocation, streaming modes, model and tool callbacks, instruction handling, tool execution, and agent transfer workflows.

The tests ensure the correctness of:

Basic agent invocation with healthy and failing backends.
Streaming LLM responses with Server-Sent Events (SSE).
Before and after callbacks on model invocations to modify or short-circuit responses.
Tool invocation lifecycle with before and after callbacks and their impact on results.
Dynamic instruction and global instruction providers overriding or merging prompt instructions.
Function tool integration and argument/result validation.
Agent transfer mechanisms supporting sub-agent delegation and hierarchical workflows.

The file uses mocking, HTTP transport recording, and utility test runners (testutil.NewTestAgentRunner) to simulate realistic agent invocations and validate output event streams.

Detailed Explanations

Constants and Helpers

modelName - The default model identifier string "gemini-2.0-flash" used for test model instantiation.
roundTripperFunc - A function adapter implementing the http.RoundTripper interface, used to mock HTTP transport behavior when simulating a backend.
newGeminiModel(t, modelName, transport) - Creates a Gemini model instance for testing, optionally over a supplied HTTP transport (recorded sessions or mocks). Returns a model.LLM instance configured for test usage.
newGeminiTestClientConfig(t, rrfile) - Returns an HTTP transport that replays or records HTTP request-response pairs for deterministic testing, together with a flag indicating whether recording is active.

Test Functions

TestLLMAgent(t *testing.T)

Tests basic instantiation and invocation of an LLMAgent with different backend conditions:

Iterates over test cases with:
- healthy_backend: Uses default HTTP transport simulating a healthy backend.
- broken_backend: Uses a transport that always returns a network error.
For each case:
- Creates a Gemini model via newGeminiModel.
- Instantiates an LLMAgent with configurations such as name, description, model, instructions, and transfer restrictions.
- Runs the agent using testutil.NewTestAgentRunner.
- Collects the text parts from the streamed response.
- Asserts expected errors or successful single text responses.

This test validates that the agent can handle normal and error conditions gracefully.

TestLLMAgentStreamingModeSSE(t *testing.T)

Validates the agent's behavior in streaming mode with SSE (Server-Sent Events):

Creates an LLMAgent with ThinkingConfig.IncludeThoughts enabled to allow multi-message streaming.
Invokes the agent with a user query: "What is the sum of the first 50 prime numbers?" using streaming mode agent.StreamingModeSSE.
Collects events from the stream.
Verifies:
- Multiple content events are received (more than one).
- At least one event contains a thought part (indicating intermediate reasoning).
- No errors occur during streaming.

This tests the integration of streaming LLM responses and thought inclusion.
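
The two stream assertions (more than one content event, at least one thought part) can be sketched as follows. The event and part types here are hypothetical stand-ins invented for the example; the real event and content types used by the test are not reproduced.

```go
package main

import "fmt"

// part is a hypothetical stand-in for one piece of model output.
type part struct {
	Text    string
	Thought bool // true when the part carries intermediate reasoning
}

// event is a hypothetical stand-in for one streamed event.
type event struct {
	Parts []part
}

// summarize counts the events that carry content and reports
// whether any part in the stream is marked as a thought.
func summarize(events []event) (contentEvents int, sawThought bool) {
	for _, ev := range events {
		if len(ev.Parts) == 0 {
			continue // events without parts do not count as content
		}
		contentEvents++
		for _, p := range ev.Parts {
			if p.Thought {
				sawThought = true
			}
		}
	}
	return contentEvents, sawThought
}

func main() {
	stream := []event{
		{Parts: []part{{Text: "intermediate reasoning", Thought: true}}},
		{Parts: []part{{Text: "first chunk of the answer"}}},
		{Parts: []part{{Text: "last chunk of the answer"}}},
	}
	n, thought := summarize(stream)
	fmt.Println(n > 1, thought) // prints "true true"
}
```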

TestModelCallbacks(t *testing.T)

Tests the behavior of before and after model callbacks that can modify or short-circuit LLM requests and responses:

Defines multiple subtests with combinations of:
- beforeModelCallbacks that return:
  - No modifications.
  - Errors to short-circuit request.
  - New LLMResponse to override model output.
  - Both new response and error.
- afterModelCallbacks that return:
  - No modifications.
  - New LLMResponse to override response.
  - Errors to propagate.
  - Both response and error.
Uses a mock model (testutil.MockModel) returning canned LLM responses.
Validates that callback chains honor the first non-nil response/error, skipping further callbacks.
Compares collected output texts to expected values.
Checks error propagation correctness.

This test ensures the callback mechanism in the agent invocation lifecycle works as intended, supporting flexible response customization.

TestToolCallback(t *testing.T)

Verifies the lifecycle of tool invocation callbacks and their effect on tool execution and results:

Defines a function tool "rand_number" that returns a fixed number (1) regardless of input.
Runs subtests for:
- BeforeToolCallbacks that return nil or override tool arguments with a fixed "number".
- AfterToolCallbacks that return nil or override tool results similarly.
- Cases where multiple callbacks are provided; only the first non-nil override is applied.
- Combination of before and after callbacks to test precedence.
- Case where both callbacks return nil, so actual tool execution occurs.
Runs the agent with these callbacks and verifies final output text matches expected overridden or actual tool results.

This confirms the agent properly integrates tool callbacks and respects their short-circuiting behavior.
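
A rough sketch of the tool-side lifecycle follows. The types are hypothetical, and two behaviors are assumptions rather than facts taken from the test: a before callback that returns a result replaces the tool execution, and the after chain still runs on whatever result was produced.

```go
package main

import "fmt"

// result is a hypothetical stand-in for a tool's result map.
type result map[string]any

type beforeToolFn func(args map[string]any) (result, error)
type afterToolFn func(args map[string]any, res result) (result, error)

// runTool applies the first non-nil before override, falls back to the tool
// itself, and then lets the first non-nil after override replace the result.
func runTool(args map[string]any, before []beforeToolFn, tool func(map[string]any) (result, error), after []afterToolFn) (result, error) {
	var res result
	for _, cb := range before {
		r, err := cb(args)
		if err != nil {
			return nil, err
		}
		if r != nil {
			res = r // assumed: the tool itself is skipped
			break
		}
	}
	if res == nil {
		var err error
		if res, err = tool(args); err != nil {
			return nil, err
		}
	}
	for _, cb := range after {
		r, err := cb(args, res)
		if err != nil {
			return nil, err
		}
		if r != nil {
			return r, nil
		}
	}
	return res, nil
}

func main() {
	randNumber := func(map[string]any) (result, error) { return result{"number": 1}, nil }
	skip := func(map[string]any) (result, error) { return nil, nil }
	fixed := func(map[string]any) (result, error) { return result{"number": 42}, nil }

	res, _ := runTool(nil, []beforeToolFn{skip, skip}, randNumber, nil)
	fmt.Println(res["number"]) // prints "1": every callback returned nil, so the tool ran
	res, _ = runTool(nil, []beforeToolFn{skip, fixed}, randNumber, nil)
	fmt.Println(res["number"]) // prints "42": the first non-nil override was used
}
```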

TestInstructionProvider(t *testing.T)

Tests dynamic instruction and global instruction providers replacing or merging static instructions:

Defines multiple test cases with:
- Static instruction evaluation with placeholders replaced by session state.
- Instruction provider function returning a template string that is passed through without placeholder evaluation.
- Global instruction provider overriding global instruction.
- Both instruction and global instruction providers returning templates merged into combined system instruction content.
Uses a mock LLM that returns a fixed response "llm resp stub".
Validates that the LLM requests sent to the underlying model contain the expected system instructions.
Validates agent's final response correctness.

This test validates the integration with the dynamic instruction injection system as described in Instruction Template Processing and Instruction Injection.
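
The difference between an evaluated static instruction and a provider result that is passed through can be sketched as below. The {key} placeholder syntax, the ordering of global before agent instruction, the blank-line separator, and every function name are assumptions made for the illustration; the actual template processing is described in Instruction Template Processing.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches {key} tokens (assumed syntax for this sketch).
var placeholder = regexp.MustCompile(`\{(\w+)\}`)

// render replaces each {key} with state[key]; unknown keys are left untouched.
func render(tmpl string, state map[string]string) string {
	return placeholder.ReplaceAllStringFunc(tmpl, func(m string) string {
		if v, ok := state[m[1:len(m)-1]]; ok {
			return v
		}
		return m
	})
}

// resolve prefers the provider, whose output is used verbatim,
// and otherwise evaluates the static instruction against the state.
func resolve(static string, provider func() string, state map[string]string) string {
	if provider != nil {
		return provider()
	}
	return render(static, state)
}

// systemInstruction joins the non-empty global and agent instructions.
func systemInstruction(global, agent string) string {
	var parts []string
	for _, s := range []string{global, agent} {
		if s != "" {
			parts = append(parts, s)
		}
	}
	return strings.Join(parts, "\n\n")
}

func main() {
	state := map[string]string{"name": "Ada"}
	static := resolve("Greet {name}.", nil, state)
	provided := resolve("", func() string { return "Greet {name}." }, state)
	fmt.Println(static)   // prints "Greet Ada."
	fmt.Println(provided) // prints "Greet {name}."
	fmt.Printf("%q\n", systemInstruction("Be precise.", static)) // prints "Be precise.\n\nGreet Ada."
}
```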

TestFunctionTool(t *testing.T)

Tests integration of a function tool that computes the sum of two numbers:

Defines a function tool "sum" that adds two integers, validating input arguments.
Creates an LLMAgent with this tool and instructions to output only the computed result.
Runs the agent with a prompt asking for the sum of 1 + 2.
Collects the output and asserts it equals "3".

This test demonstrates the usage of function tools within the agent framework, as outlined in Function Tools.
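
One way to picture how a typed Go function becomes a callable tool with argument validation is a generic wrapper that round-trips through JSON. This is a standalone sketch: wrap, sumArgs, and sumResult are hypothetical names, and nothing here claims to mirror how functiontool.New is implemented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type sumArgs struct {
	A int `json:"a"`
	B int `json:"b"`
}

type sumResult struct {
	Sum int `json:"sum"`
}

// wrap adapts a typed handler to untyped maps, the shape in which a model
// typically supplies function-call arguments. Arguments that do not fit
// the input type are rejected before the handler runs.
func wrap[In, Out any](handler func(In) (Out, error)) func(map[string]any) (map[string]any, error) {
	return func(raw map[string]any) (map[string]any, error) {
		data, err := json.Marshal(raw)
		if err != nil {
			return nil, err
		}
		var in In
		if err := json.Unmarshal(data, &in); err != nil {
			return nil, fmt.Errorf("invalid arguments: %w", err)
		}
		out, err := handler(in)
		if err != nil {
			return nil, err
		}
		if data, err = json.Marshal(out); err != nil {
			return nil, err
		}
		var res map[string]any
		if err := json.Unmarshal(data, &res); err != nil {
			return nil, err
		}
		return res, nil
	}
}

func main() {
	sum := wrap(func(in sumArgs) (sumResult, error) {
		return sumResult{Sum: in.A + in.B}, nil
	})
	res, err := sum(map[string]any{"a": 1, "b": 2})
	fmt.Println(res["sum"], err) // prints "3 <nil>"

	_, err = sum(map[string]any{"a": "one", "b": 2})
	fmt.Println(err != nil) // prints "true": a string cannot fill an int field
}
```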

TestAgentTransfer(t *testing.T)

Tests agent transfer workflows between hierarchical agents and sub-agents:

Defines helper functions to create transfer and text content events.
Creates mock models that return prepopulated content sequences simulating transfer function calls and text responses.
Defines sub-agents and root agents with transfer permissions configured.
Runs multiple scenarios:
- auto_to_auto: Root agent transfers to a sub-agent with automatic transfer allowed.
- auto_to_single: Root agent transfers to a sub-agent that disallows transfer to parent or peers.
- auto_to_auto_to_single: Nested sub-agent delegation with mixed transfer policies.
Validates event streams after multiple rounds to ensure transfers and responses occur as expected.
Checks that the agent active after a transfer honors the disallow-transfer flags and that the conversation continues with that agent in later rounds.

This test covers hierarchical agent delegation and transfer logic as described in Agent Lifecycle and Callbacks and Agent Selection Logic.

Important Implementation Details

Uses the testutil.NewTestAgentRunner helper extensively to simulate agent execution in test contexts, abstracting session and invocation management.
Employs HTTP request-response recording and replay (httprr package) to isolate Gemini model backend interactions.
Callback chains (before/after model and tool callbacks) are tested for proper short-circuiting and error propagation.
Tests verify both standard and streaming invocation modes, including SSE streaming with intermediate "thoughts".
Agent transfer tests simulate real multi-agent conversations with transfer function calls embedded in model-generated content.
Mock models and function tools are used to isolate and test specific agent behaviors without external dependencies.
Session state injection and instruction providers are exercised to validate dynamic prompt generation.

File Interactions with Other System Parts

LLMAgent (from the llmagent package) is the primary subject under test, as introduced in LLM Integration and Agents.
The tests use the Gemini model implementation (gemini package) as the LLM backend, over replayed or mocked HTTP transports.
The testutil package provides mock models and agent runners for controlled test execution.
The agent and session packages manage agent lifecycle and event streaming during test runs.
The tool and functiontool packages supply the tool types and wrap Go functions as callable tools.
HTTP transport mocking and recording (httprr) isolate network interactions.
The tests indirectly validate instruction template processing via instruction providers (Instruction Template Processing).

Functions and Test Descriptions

TestLLMAgent

Purpose: Validate basic agent instantiation and invocation with normal and error HTTP backends.
Parameters: t *testing.T - test runner.
Behavior: Runs agent with default and error-inducing HTTP transports, verifies expected outputs or errors.
Usage: Baseline test ensuring the agent returns the expected response from a healthy backend and surfaces an error from a failing one.

TestLLMAgentStreamingModeSSE

Purpose: Verify SSE streaming mode with intermediate thought messages.
Parameters: t *testing.T
Behavior: Runs agent with SSE streaming mode, ensures multiple content events and thought parts appear.
Usage: Tests real-time streaming response capabilities.

TestModelCallbacks

Purpose: Test before/after model callbacks modifying or intercepting LLM requests/responses.
Parameters: t *testing.T
Behavior: Runs multiple subtests with various callback return patterns, validating output and error propagation.
Usage: Ensures flexible and correct callback chaining.

TestToolCallback

Purpose: Test before/after tool callbacks affecting tool input/output.
Parameters: t *testing.T
Behavior: Runs subtests with callbacks overriding tool args/results, verifies final output correctness.
Usage: Validates tool lifecycle interception.

TestInstructionProvider

Purpose: Test dynamic instruction and global instruction providers overriding or merging instructions.
Parameters: t *testing.T
Behavior: Runs subtests with different instruction provider configurations, verifies LLM request content.
Usage: Ensures instruction injection system functions properly.

TestFunctionTool

Purpose: Validate integration of function tools performing computations.
Parameters: t *testing.T
Behavior: Runs agent with sum tool, verifies correct calculation and output.
Usage: Demonstrates tool invocation and argument parsing.

TestAgentTransfer

Purpose: Test agent transfer and sub-agent delegation workflows.
Parameters: t *testing.T
Behavior: Simulates transfer calls and responses between hierarchical agents, validates event streams.
Usage: Ensures correct multi-agent transfer handling.

newGeminiModel

Purpose: Helper to create Gemini LLM model instances for tests.
Parameters: t *testing.T, modelName string, transport http.RoundTripper
Returns: model.LLM
Behavior: Configures Gemini model with optional HTTP transport, handles API key for recording mode.

newGeminiTestClientConfig

Purpose: Helper to create HTTP transport for Gemini model using recorded HTTP sessions.
Parameters: t *testing.T, rrfile string
Returns: (http.RoundTripper, bool) - transport and recording flag.
Behavior: Loads recorded HTTP interactions or sets recording mode.

Mermaid Diagram: Flowchart of Main Test Functions and Their Relationships

flowchart TD
A[TestLLMAgent] -->|Uses| B[newGeminiModel]
C[TestLLMAgentStreamingModeSSE] -->|Uses| B
D[TestModelCallbacks] --> E[testutil.MockModel]
D --> F[testutil.NewTestAgentRunner]
G[TestToolCallback] --> H[functiontool.New]
G --> F
I[TestInstructionProvider] --> E
I --> F
J[TestFunctionTool] --> H
J --> F
K[TestAgentTransfer] --> E
K --> F
B --> M[newGeminiTestClientConfig]
classDef helper fill:none,stroke:none
class B,M helper

Usage Examples from Tests

// Create agent with Gemini model and basic instructions
model := newGeminiModel(t, modelName, nil)
agent, err := llmagent.New(llmagent.Config{
    Name:                     "hello_world_agent",
    Description:              "hello world agent",
    Model:                    model,
    Instruction:              "Roll the dice and report only the result.",
    GlobalInstruction:        "Answer as precisely as possible.",
    DisallowTransferToParent: true,
    DisallowTransferToPeers:  true,
})
if err != nil {
    t.Fatalf("NewLLMAgent failed: %v", err)
}
// Run the agent once and collect the text parts of the streamed events
runner := testutil.NewTestAgentRunner(t, agent)
stream := runner.Run(t, "test_session", "")
texts, err := testutil.CollectTextParts(stream)
if err != nil {
    t.Fatalf("agent run failed: %v", err)
}
// texts now holds the collected text parts for assertions

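// Function tool example: sum of two numbers (separate excerpt; model is created as above)
type Args struct{ A, B int }
type Result struct{ Sum int }

// handler receives the decoded arguments and returns the typed result
handler := func(_ tool.Context, input Args) (Result, error) {
    return Result{Sum: input.A + input.B}, nil
}
sumTool, err := functiontool.New(functiontool.Config{
    Name:        "sum",
    Description: "computes the sum of two numbers",
}, handler)
if err != nil {
    t.Fatalf("functiontool.New failed: %v", err)
}

// Register the tool on an agent instructed to echo only the tool result
sumAgent, err := llmagent.New(llmagent.Config{
    Name:        "agent",
    Model:       model,
    Instruction: "output ONLY the result computed by the provided function",
    Tools:       []tool.Tool{sumTool},
})
if err != nil {
    t.Fatalf("llmagent.New failed: %v", err)
}
// sumAgent is then run through testutil.NewTestAgentRunner like the agent above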

References to Related Topics

See LLM Integration and Agents for the core concepts of LLMAgent and its configuration.
Consult LLM Agent Configuration for details on agent setup including tools and callbacks.
Review Instruction Template Processing and Instruction Injection for dynamic instruction handling.
For tool lifecycle and function tools, see Tooling System and Function Tools.
Agent transfer and sub-agent delegation are explained in Agent Lifecycle and Callbacks and Agent Selection Logic.
Session and event streaming integration relates to Session Management and Agent Execution Runner.