Project Overview

Project Purpose and Objectives

This project provides a comprehensive Agent Development Kit (ADK) framework for building, running, and integrating AI agents powered by large language models (LLMs). The primary objectives are to enable the creation of sophisticated AI agents that:

Interface with various LLM backends, such as Google's Gemini models.
Support complex agent workflows including sequential, parallel, and looping execution.
Provide extensible tooling via function-based tools, artifact loading, web search, and MCP toolsets.
Facilitate agent-to-agent communication through the A2A protocol and remote agent abstractions.
Manage session and state persistence with pluggable backends (in-memory, database, GCS).
Allow interaction via REST APIs, console, and web UI launchers.
Support telemetry and tracing for observability.
Enable injection of session state and artifact data into LLM instructions.

The system is designed for modularity and composability, allowing agents to call tools, invoke sub-agents, transfer control, and manage multi-turn conversations with memory and artifact support.

Key functionalities and implementations include:

Agent Framework: Core agent package defines agents, contexts, invocation lifecycle, callback mechanisms, and sub-agent compositions.
LLM Integration: agent/llmagent provides LLM-based agents with configurable models, tools, callbacks, and instruction providers.
Tooling: Tools are defined via interfaces in tool/, including function tools (functiontool), artifact loaders (loadartifactstool), Google Search tools (geminitool), and MCP toolsets (mcptoolset).
Session and Memory Management: Session management supports in-memory and database-backed implementations, tracking state and event history. Memory services support searching and ingesting session data.
Artifact Management: Artifacts (e.g., files, blobs) are managed via a service interface with in-memory and Google Cloud Storage (GCS) implementations.
Agent Workflows: Workflow agents like loop, sequential, and parallel agents orchestrate sub-agent execution patterns.
Remote Agent Protocol (A2A): Enables communication with agents running remotely via gRPC and HTTP with serialization between ADK and A2A formats.
Launchers: CLI and web launchers enable running agents interactively, via REST API, or through A2A endpoints.
Telemetry: Integration with OpenTelemetry for tracing LLM calls and tool executions.
Instruction Injection: Runtime injection of session state and artifact data into LLM instruction templates.
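
Instruction injection, the last item above, can be pictured as template substitution. The sketch below is a minimal illustration and not the project's actual processor: the `{key}` placeholder syntax, the `injectState` helper, and the choice to leave unknown keys untouched are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches hypothetical {key} markers in an instruction template.
var placeholder = regexp.MustCompile(`\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// injectState replaces each {key} with the matching session state value.
// Unknown keys are left untouched so that missing state is easy to spot.
func injectState(template string, state map[string]string) string {
	return placeholder.ReplaceAllStringFunc(template, func(m string) string {
		key := m[1 : len(m)-1] // strip the surrounding braces
		if v, ok := state[key]; ok {
			return v
		}
		return m
	})
}

func main() {
	state := map[string]string{"user_name": "Ada", "topic": "billing"}
	fmt.Println(injectState("Help {user_name} with {topic}. Ticket: {ticket_id}", state))
}
```

Leaving unresolved placeholders in place is only one possible policy; a stricter variant could return an error instead.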

Example Workflows and Use Cases

Example: Loading and Using Artifacts in an Agent

The agent uses the LoadArtifactsTool to load artifacts (e.g., images, text files) into the session.
The tool appends instructions to the LLM request listing available artifacts.
When the LLM issues a function call to load_artifacts, the tool concurrently loads requested artifact contents from the artifact service.
The contents are appended to the LLM request, enabling the model to access them during generation.
The agent can then answer queries about the artifacts or perform actions based on their contents.
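
The concurrent loading step described above might look roughly like the following. This is a sketch under stated assumptions: `artifactStore` is a hypothetical in-memory stand-in for the artifact service, and `loadAll` is an illustrative helper, not the tool's real implementation.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// artifactStore is a hypothetical in-memory stand-in for an artifact service.
type artifactStore map[string][]byte

// loadAll fetches the requested artifacts concurrently and returns the ones
// that exist, keyed by name. Each goroutine writes only to its own slot in
// results, so no mutex is needed.
func loadAll(store artifactStore, names []string) map[string][]byte {
	results := make([][]byte, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			results[i] = store[name] // concurrent map reads are safe without writers
		}(i, name)
	}
	wg.Wait()

	loaded := make(map[string][]byte)
	for i, name := range names {
		if results[i] != nil {
			loaded[name] = results[i]
		}
	}
	return loaded
}

func main() {
	store := artifactStore{
		"notes.txt": []byte("meeting at 10"),
		"logo.png":  []byte{0x89, 'P', 'N', 'G'},
	}
	loaded := loadAll(store, []string{"notes.txt", "logo.png", "missing.csv"})

	names := make([]string, 0, len(loaded))
	for name := range loaded {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d bytes\n", name, len(loaded[name]))
	}
}
```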

Example: Parallel Agent Execution

A ParallelAgent runs multiple sub-agents concurrently, each in isolated invocation contexts.
It collects and yields events from all sub-agents asynchronously.
Useful for combining multiple perspectives or algorithms on a single task in parallel.
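
A fan-out and fan-in pattern of this kind can be sketched with goroutines and a shared channel. The `agent` and `event` types and the `runParallel` function below are hypothetical simplifications for illustration; they do not mirror the ParallelAgent API.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a simplified, hypothetical event emitted by a sub-agent.
type event struct {
	Author string
	Text   string
}

// agent is a hypothetical sub-agent that turns an input into output texts.
type agent struct {
	Name string
	Run  func(input string) []string
}

// runParallel starts every sub-agent in its own goroutine and merges their
// events into one channel, which is closed once all sub-agents finish.
func runParallel(input string, agents []agent) <-chan event {
	out := make(chan event)
	var wg sync.WaitGroup
	for _, a := range agents {
		wg.Add(1)
		go func(a agent) {
			defer wg.Done()
			for _, text := range a.Run(input) {
				out <- event{Author: a.Name, Text: text}
			}
		}(a)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	agents := []agent{
		{Name: "optimist", Run: func(in string) []string { return []string{in + ": looks good"} }},
		{Name: "skeptic", Run: func(in string) []string { return []string{in + ": needs tests"} }},
	}
	// Event order depends on goroutine scheduling and may vary between runs.
	for e := range runParallel("review PR", agents) {
		fmt.Printf("[%s] %s\n", e.Author, e.Text)
	}
}
```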

Example: Agent-To-Agent Communication (A2A)

A remote agent exposes an HTTP/gRPC endpoint using the A2A protocol.
Local agents use the remoteagent package to send session events as A2A messages and receive responses.
The system converts between ADK session events and A2A events in both directions.
Enables distributed multi-agent systems with remote delegation and invocation.
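
To make the conversion idea concrete, here is a deliberately simplified sketch. Both `sessionEvent` and `wireMessage` are hypothetical types invented for this example; `wireMessage` is not the A2A schema, and the role mapping is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sessionEvent is a simplified, hypothetical local event.
type sessionEvent struct {
	Author string
	Text   string
}

// wireMessage is a simplified, hypothetical wire format (not the A2A schema).
type wireMessage struct {
	Role  string   `json:"role"`
	Parts []string `json:"parts"`
}

// toWire maps a local event to the wire format before it is sent.
func toWire(e sessionEvent) wireMessage {
	role := "agent"
	if e.Author == "user" {
		role = "user"
	}
	return wireMessage{Role: role, Parts: []string{e.Text}}
}

// fromWire maps a received message back to a local event, attributing
// non-user messages to the named remote agent.
func fromWire(m wireMessage, agentName string) sessionEvent {
	author := agentName
	if m.Role == "user" {
		author = "user"
	}
	return sessionEvent{Author: author, Text: strings.Join(m.Parts, "")}
}

func main() {
	payload, err := json.Marshal(toWire(sessionEvent{Author: "user", Text: "What is the status?"}))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))

	var reply wireMessage
	if err := json.Unmarshal([]byte(`{"role":"agent","parts":["All ","green."]}`), &reply); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", fromWire(reply, "remote_status_agent"))
}
```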

Example: Running an Agent via REST API

The RuntimeAPIController handles HTTP requests to run agents.
Incoming requests provide session identifiers and user messages.
The controller invokes the runner to execute the agent logic and returns session events as JSON or streams them via SSE.
Supports stateful multi-turn conversations with session and artifact persistence.
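
The request flow above can be approximated with the standard library alone. Everything named here is an assumption for illustration: the `/run` path, the JSON field names, the `runner` function type, and selecting SSE through the `Accept` header are not taken from the RuntimeAPIController.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// runRequest and event use hypothetical field names.
type runRequest struct {
	SessionID string `json:"session_id"`
	Message   string `json:"message"`
}

type event struct {
	Author string `json:"author"`
	Text   string `json:"text"`
}

// runner is a hypothetical stand-in for the component that executes an agent.
type runner func(sessionID, message string) []event

// runHandler decodes the request, runs the agent, and writes the events
// either as one JSON array or as Server-Sent Events.
func runHandler(run runner) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req runRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		events := run(req.SessionID, req.Message)
		if r.Header.Get("Accept") == "text/event-stream" {
			w.Header().Set("Content-Type", "text/event-stream")
			for _, e := range events {
				data, _ := json.Marshal(e)
				fmt.Fprintf(w, "data: %s\n\n", data)
			}
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(events)
	}
}

// echoRunner is a toy runner that echoes the user message.
func echoRunner(sessionID, message string) []event {
	return []event{{Author: "agent", Text: "echo: " + message}}
}

// call exercises the handler in-process and returns the response body.
func call(accept, body string) string {
	req := httptest.NewRequest(http.MethodPost, "/run", strings.NewReader(body))
	req.Header.Set("Accept", accept)
	rec := httptest.NewRecorder()
	runHandler(echoRunner)(rec, req)
	return rec.Body.String()
}

func main() {
	body := `{"session_id":"s1","message":"hi"}`
	fmt.Print(call("application/json", body))
	fmt.Print(call("text/event-stream", body))
}
```

A production handler would also flush after each SSE frame and look up persisted session state; both are omitted to keep the sketch short.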

Stack and Technologies

Programming Language: Go (Golang), chosen for its concurrency support, static typing, and mature ecosystem.
LLM APIs: Google Gemini models integrated through google.golang.org/genai.
Agent Framework: Custom agent abstractions with extensible tool interfaces.
Session Management: Pluggable backends including in-memory and GORM-based SQL database implementations.
Artifact Storage: In-memory and Google Cloud Storage (GCS) implementations for artifact persistence.
MCP Protocol: Integration with Model Context Protocol via mcptoolset.
A2A Protocol: Agent-to-Agent communication implemented using gRPC and HTTP.
HTTP Server: REST APIs built using gorilla/mux.
Concurrency: Uses Go's goroutines and synchronization primitives for parallel agents and artifact loading.
Testing: Extensive unit and integration tests using Go's testing framework and HTTP request recording/replay.
Telemetry: OpenTelemetry SDK for tracing and span exporting.
JSON Schema: Used to validate tool inputs/outputs and LLM responses.
CLI: Command-line interfaces built with cobra for launching agents in various modes.
Web UI: Embedded Angular-based web UI served via Go filesystem embedding.

Key libraries and frameworks:

google.golang.org/genai for AI model interaction.
github.com/google/go-cmp/cmp for comparing values in tests.
github.com/gorilla/mux for HTTP routing.
gorm.io/gorm for database ORM.
github.com/a2aproject/a2a-go/a2a for A2A protocol.
go.opentelemetry.io/otel for telemetry.
github.com/spf13/cobra for CLI commands.
rsc.io/omap for ordered map data structures.

High-Level Architecture

The system is composed of the following major components and their interactions:

Agents: Implemented via the agent package; represent AI-driven entities that process inputs and produce outputs by invoking LLMs and tools. Agents support sub-agent hierarchies and workflow agents.
LLM Agents: Specialized agents that interact with LLMs using tools and instructions.
Tools: Modular functionalities callable by agents or LLMs (e.g., function tools, artifact loaders, external search).
Session Service: Manages user-agent conversation sessions, storing state and event histories.
Memory Service: Provides search and ingestion of session data to inform agent context.
Artifact Service: Stores and retrieves binary or textual artifacts used by agents.
Runner: Coordinates agent execution within sessions, manages message appending and state updates.
Remote Agents (A2A): Facilitate distributed agent communication across process or network boundaries.
REST API Server: Exposes agent capabilities via HTTP endpoints.
CLI and Web Launchers: Provide interactive or programmatic interfaces to run agents.
Telemetry: Collects tracing data on LLM calls, tool executions, and agent events.
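
The pluggable-backend idea behind the Session Service can be illustrated with a small interface and an in-memory implementation. The `sessionService` interface and its method set below are hypothetical and much smaller than a real session API; a database-backed type would satisfy the same interface.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNotFound = errors.New("session not found")

// sessionService is a hypothetical, minimal storage interface.
type sessionService interface {
	Create(id string) error
	AppendEvent(id, event string) error
	Events(id string) ([]string, error)
}

// inMemorySessions keeps event histories in a mutex-guarded map.
type inMemorySessions struct {
	mu     sync.Mutex
	events map[string][]string
}

func newInMemorySessions() *inMemorySessions {
	return &inMemorySessions{events: make(map[string][]string)}
}

func (s *inMemorySessions) Create(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.events[id]; exists {
		return fmt.Errorf("session %q already exists", id)
	}
	s.events[id] = []string{}
	return nil
}

func (s *inMemorySessions) AppendEvent(id, event string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.events[id]; !ok {
		return errNotFound
	}
	s.events[id] = append(s.events[id], event)
	return nil
}

// Events returns a copy so callers cannot mutate the stored history.
func (s *inMemorySessions) Events(id string) ([]string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	evs, ok := s.events[id]
	if !ok {
		return nil, errNotFound
	}
	return append([]string(nil), evs...), nil
}

func main() {
	var svc sessionService = newInMemorySessions()
	if err := svc.Create("s1"); err != nil {
		panic(err)
	}
	for _, e := range []string{"user: hello", "agent: hi there"} {
		if err := svc.AppendEvent("s1", e); err != nil {
			panic(err)
		}
	}
	evs, _ := svc.Events("s1")
	fmt.Println(len(evs), evs)
	fmt.Println(svc.AppendEvent("nope", "x"))
}
```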

Component Diagram

graph TB
User -->|Input Message| Runner
Runner --> Agent
Agent --> LLM["LLM Model (Gemini)"]
Agent --> Toolset
Toolset --> LoadArtifacts[LoadArtifactsTool]
Toolset --> GoogleSearch[GoogleSearchTool]
Toolset --> MCPTools[MCP ToolSet]
Agent --> SessionService["(Session Service)"]
Agent --> MemoryService["(Memory Service)"]
Agent --> ArtifactService["(Artifact Service)"]
Agent --> RemoteAgent[A2A Remote Agent]
Runner --> SessionService
Runner --> ArtifactService
RESTAPI[REST API Server] --> Runner
CLI --> Runner
WebUI --> RESTAPI
Telemetry --> LLM
Telemetry --> Toolset

Developer Navigation

For Frontend Developers

Explore: cmd/launcher/web/webui/ for the embedded web UI assets and launcher.
Add UI features: Modify Angular components and styles under cmd/launcher/web/webui/distr/.
Serve API: Use the REST API (server/adkrest) to communicate with backend agents.

For Backend Developers

Agent Logic: Start with agent/agent.go and agent/llmagent/llmagent.go for core agent interfaces and LLM-based agents.
Tools: See tool/ for various tool implementations:
- functiontool for generic function wrappers.
- loadartifactstool for artifact loading.
- geminitool for the Google Search tool.
- mcptoolset for MCP tool integration.
- agenttool for composing sub-agents as tools.
Session Management: session/ package for session interfaces and implementations (inmemory and database).
Artifact Management: artifact/ package for artifact service interfaces and implementations (inmemory, gcsartifact).
Runner: runner/runner.go to see how agents are executed and sessions managed.
Remote Agents: agent/remoteagent/ for A2A protocol agents and utilities.
LLM Integration: model/gemini/ for Gemini model client implementation.
Instruction Injection: internal/llminternal/instruction_processor.go and util/instructionutil/instruction.go to understand instruction template processing.
Telemetry: internal/telemetry/telemetry.go for tracing LLM calls and tool usage.
Testing: Extensive tests in internal/testutil/ and package-specific _test.go files.

For Tool Developers

Implement new tools by conforming to the tool.Tool interface in tool/tool.go.
Use functiontool to wrap Go functions as tools with JSON schema support.
Integrate tools into agents by adding them to the toolset on agent construction.
Test tools with the provided testing utilities and in-process agent runners.
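
As a rough picture of what wrapping a Go function as a tool involves, consider the sketch below. The `tool` interface here is a hypothetical simplification (the real one is defined in tool/tool.go and may differ), and `funcTool` is an illustrative generic wrapper, not the functiontool API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tool is a hypothetical, simplified interface used only in this sketch.
type tool interface {
	Name() string
	Description() string
	Call(argsJSON []byte) ([]byte, error)
}

// funcTool adapts a typed Go function to the JSON-in, JSON-out tool shape.
type funcTool[In, Out any] struct {
	name, desc string
	fn         func(In) (Out, error)
}

func (t funcTool[In, Out]) Name() string        { return t.name }
func (t funcTool[In, Out]) Description() string { return t.desc }

// Call decodes the arguments, invokes the function, and encodes the result.
func (t funcTool[In, Out]) Call(argsJSON []byte) ([]byte, error) {
	var in In
	if err := json.Unmarshal(argsJSON, &in); err != nil {
		return nil, fmt.Errorf("%s: decoding arguments: %w", t.name, err)
	}
	out, err := t.fn(in)
	if err != nil {
		return nil, fmt.Errorf("%s: %w", t.name, err)
	}
	return json.Marshal(out)
}

type addArgs struct {
	A int `json:"a"`
	B int `json:"b"`
}

type addResult struct {
	Sum int `json:"sum"`
}

// newAddTool builds an example tool that adds two integers.
func newAddTool() tool {
	return funcTool[addArgs, addResult]{
		name: "add",
		desc: "Adds two integers.",
		fn:   func(in addArgs) (addResult, error) { return addResult{Sum: in.A + in.B}, nil },
	}
}

func main() {
	out, err := newAddTool().Call([]byte(`{"a":2,"b":3}`))
	fmt.Println(string(out), err)
}
```

Typed argument and result structs keep the function body free of JSON handling, which is the main appeal of this style of wrapper.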

For Session and Artifact Storage Developers

Extend or modify session storage via session/ package, focusing on inmemory.go and database/ for persistent storage.
Manage artifact storage via artifact/ package; implement new backends following the artifact.Service interface.
Use internal/sessioninternal/mutablesession.go for mutable session state handling.

For Agent Workflow Developers

Explore agent/workflowagents/ for loop, parallel, and sequential agents managing complex sub-agent execution.
Compose agents using sub-agent patterns, managing transfers and escalations.
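
Sequential and loop composition with escalation can be sketched as plain function combinators. The `step` type, the shared `map[string]int` state, and the rule that a loop absorbs an escalation rather than propagating it are assumptions for this example, not the behavior of the workflow agents themselves.

```go
package main

import "fmt"

// step is a hypothetical sub-agent: it reads and updates shared state and
// reports whether it wants to escalate (stop the enclosing loop).
type step func(state map[string]int) (escalate bool)

// sequential runs each step in order and stops early if one escalates.
func sequential(steps ...step) step {
	return func(state map[string]int) bool {
		for _, s := range steps {
			if s(state) {
				return true
			}
		}
		return false
	}
}

// loop repeats body until it escalates or maxIterations is reached.
// The escalation ends the loop here and is not passed further up.
func loop(body step, maxIterations int) step {
	return func(state map[string]int) bool {
		for i := 0; i < maxIterations; i++ {
			if body(state) {
				break
			}
		}
		return false
	}
}

func main() {
	increment := func(state map[string]int) bool { state["n"]++; return false }
	done := func(state map[string]int) bool { return state["n"] >= 3 }

	state := map[string]int{}
	loop(sequential(increment, done), 10)(state)
	fmt.Println(state["n"]) // prints 3
}
```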

Visual Diagrams

Component Diagram (See above)

Key Workflow: Agent Request Handling Flow

flowchart TD
UserInput[User Input Message]
AppendSession[Append message to Session]
DetermineAgent[Determine Agent to Run]
RunAgent[Run Agent]
AgentCallsLLM[Agent calls LLM Model]
LLMFunctionCall[LLM Function Call Detected?]
CallTool[Call Tool]
ToolResponse[Tool Response]
UpdateSession[Update Session with Events and State]
ReturnResponse[Return Response Events]
UserInput --> AppendSession --> DetermineAgent --> RunAgent --> AgentCallsLLM
AgentCallsLLM --> LLMFunctionCall
LLMFunctionCall -->|Yes| CallTool --> ToolResponse --> AgentCallsLLM
LLMFunctionCall -->|No| UpdateSession --> ReturnResponse

This flow illustrates the core steps: user input processing, session appending, agent selection, LLM invocation, tool execution when the LLM issues a function call, session updates, and response emission.
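
The model and tool loop at the center of this flow can be sketched as follows. All names are hypothetical (`model`, `toolFunc`, `runTurn`, `fakeModel`), the history is reduced to plain strings, and the `maxSteps` guard is an added assumption to keep the loop bounded.

```go
package main

import (
	"fmt"
	"strings"
)

type functionCall struct {
	Name string
	Arg  string
}

// modelResponse carries either final text or a function call (Call != nil).
type modelResponse struct {
	Text string
	Call *functionCall
}

// model is a hypothetical LLM: it sees the history and returns a response.
type model func(history []string) modelResponse

type toolFunc func(arg string) string

// runTurn calls the model repeatedly, executing requested tools and feeding
// their results back, until the model answers without a function call.
func runTurn(m model, tools map[string]toolFunc, userInput string, maxSteps int) ([]string, error) {
	history := []string{"user: " + userInput}
	for i := 0; i < maxSteps; i++ {
		resp := m(history)
		if resp.Call == nil {
			return append(history, "model: "+resp.Text), nil
		}
		t, ok := tools[resp.Call.Name]
		if !ok {
			return history, fmt.Errorf("unknown tool %q", resp.Call.Name)
		}
		history = append(history,
			"call: "+resp.Call.Name+"("+resp.Call.Arg+")",
			"tool: "+t(resp.Call.Arg))
	}
	return history, fmt.Errorf("no final answer after %d steps", maxSteps)
}

// fakeModel requests the "upper" tool once, then answers with its result.
func fakeModel(history []string) modelResponse {
	last := history[len(history)-1]
	if strings.HasPrefix(last, "tool: ") {
		return modelResponse{Text: "Result is " + strings.TrimPrefix(last, "tool: ")}
	}
	return modelResponse{Call: &functionCall{Name: "upper", Arg: "hello"}}
}

func main() {
	tools := map[string]toolFunc{"upper": strings.ToUpper}
	history, err := runTurn(fakeModel, tools, "shout hello", 5)
	if err != nil {
		panic(err)
	}
	for _, line := range history {
		fmt.Println(line)
	}
}
```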

This overview provides the essential technical structure, key components, and workflows of the project, enabling developers of varying experience levels to understand the system purpose, architecture, and contribution pathways efficiently.