image_generator.go
Overview
image_generator.go implements an agent specialized in generating images using a generative AI model. This agent accepts a textual prompt, invokes a generative image model hosted on Google Cloud’s Vertex AI platform, and saves the resulting image as an artifact. The file defines the core logic for image generation, artifact saving, and the construction of an AI agent equipped with relevant tools to handle image generation requests.
This file interacts closely with the generative AI client (genai), the artifact storage system for managing image files, and the agent framework for defining and running AI agents with associated tools. It relies on the mechanisms described in the Artifact Management and Tooling System topics for artifact saving and tool creation, respectively, and it belongs to the broader LLM Integration and Agents system.
Types and Functions
Type: generateImageInput
type generateImageInput struct {
Prompt string `json:"prompt"`
Filename string `json:"filename"`
}
Purpose: Represents the input parameters for the image generation function.
Fields:
Prompt (string): The textual description used to generate the image.
Filename (string): The filename under which the generated image will be saved as an artifact.
Usage: Serialized/deserialized automatically when the function tool receives input JSON.
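The automatic decoding described above can be approximated with the standard encoding/json package. This is a minimal sketch, assuming the tool framework hands the function a raw JSON payload; decodeInput is a hypothetical helper for illustration and is not part of image_generator.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generateImageInput mirrors the struct documented above.
type generateImageInput struct {
	Prompt   string `json:"prompt"`
	Filename string `json:"filename"`
}

// decodeInput is a hypothetical helper: it parses a JSON tool-call
// payload into generateImageInput using the struct tags.
func decodeInput(raw []byte) (generateImageInput, error) {
	var in generateImageInput
	err := json.Unmarshal(raw, &in)
	return in, err
}

func main() {
	payload := []byte(`{"prompt":"A futuristic cityscape at sunset","filename":"cityscape.png"}`)
	in, err := decodeInput(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("prompt=%q filename=%q\n", in.Prompt, in.Filename)
}
```

The struct tags determine the JSON keys, so a caller must send "prompt" and "filename" for the fields to be populated.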
Type: generateImageResult
type generateImageResult struct {
Filename string `json:"filename"`
Status string `json:"Status"`
}
Purpose: Encapsulates the output of the image generation function.
Fields:
Filename (string): The name of the saved image file.
Status (string): Indicates success or failure of the image generation ("success" or "fail").
Usage: Returned as JSON to the caller of the generation function.
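To make the wire format concrete, the following sketch marshals a result with encoding/json. encodeResult is a hypothetical helper used only for illustration; the struct and its tags are copied from the definition above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generateImageResult mirrors the struct documented above, including
// the capitalized "Status" tag.
type generateImageResult struct {
	Filename string `json:"filename"`
	Status   string `json:"Status"`
}

// encodeResult is a hypothetical helper that renders a result as JSON.
func encodeResult(res generateImageResult) (string, error) {
	b, err := json.Marshal(res)
	return string(b), err
}

func main() {
	out, err := encodeResult(generateImageResult{Filename: "cityscape.png", Status: "success"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out) // {"filename":"cityscape.png","Status":"success"}
}
```

Note that the tags produce a lowercase "filename" key but a capitalized "Status" key. Consumers that match keys case-sensitively need to use exactly that spelling.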
Function: generateImage
func generateImage(ctx tool.Context, input generateImageInput) (generateImageResult, error)
Purpose: Core function to generate an image from a prompt and save it as an artifact.
Parameters:
ctx (tool.Context): Provides context for the tool execution, including access to artifact storage.
input (generateImageInput): Contains the prompt and filename for the image.
Returns:
generateImageResult: Result indicating the status and filename.
error: Reserved for unexpected situations. In practice, operational failures are reported through the Status field and the returned error is nil.
Behavior:
Creates a genai client configured for the Google Cloud project and location, using the Vertex AI backend.
Calls the GenerateImages method on the generative model "imagen-3.0-generate-002" with the prompt.
Saves the first generated image to the artifact storage service with the provided filename.
Returns a success or failure status accordingly.
Error Handling: Any error in client creation, image generation, or artifact saving results in a "fail" status with no propagated error.
Example Usage:
result, err := generateImage(ctx, generateImageInput{
	Prompt:   "A futuristic cityscape at sunset",
	Filename: "cityscape.png",
})
// Operational failures are reported through Status, so check it as well as err.
if err == nil && result.Status == "success" {
	fmt.Println("Image saved as:", result.Filename)
}
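The status-flag behavior described for generateImage can be sketched without any cloud dependencies. In the sketch below, imageBackend and fakeBackend are hypothetical stand-ins for the genai client and the artifact service, and generateWithStatus is an illustrative name. The empty-result check and the decision to leave Filename blank on failure are assumptions, since the source does not specify them.

```go
package main

import (
	"errors"
	"fmt"
)

type generateImageInput struct {
	Prompt   string `json:"prompt"`
	Filename string `json:"filename"`
}

type generateImageResult struct {
	Filename string `json:"filename"`
	Status   string `json:"Status"`
}

// imageBackend is a hypothetical abstraction over image generation
// and artifact saving; it is not an interface from the real file.
type imageBackend interface {
	Generate(prompt string) ([][]byte, error)
	Save(filename string, data []byte) error
}

// generateWithStatus converts every operational failure into a "fail"
// status and returns a nil error, matching the documented behavior.
func generateWithStatus(b imageBackend, in generateImageInput) (generateImageResult, error) {
	images, err := b.Generate(in.Prompt)
	if err != nil || len(images) == 0 { // empty-result check is an assumption
		return generateImageResult{Status: "fail"}, nil
	}
	// Only the first generated image is saved.
	if err := b.Save(in.Filename, images[0]); err != nil {
		return generateImageResult{Status: "fail"}, nil
	}
	return generateImageResult{Filename: in.Filename, Status: "success"}, nil
}

// fakeBackend is an in-memory test double.
type fakeBackend struct {
	failGenerate bool
	saved        map[string][]byte
}

func (f *fakeBackend) Generate(prompt string) ([][]byte, error) {
	if f.failGenerate {
		return nil, errors.New("simulated generation failure")
	}
	return [][]byte{[]byte("image-bytes-for:" + prompt)}, nil
}

func (f *fakeBackend) Save(filename string, data []byte) error {
	f.saved[filename] = data
	return nil
}

func main() {
	in := generateImageInput{Prompt: "A futuristic cityscape at sunset", Filename: "cityscape.png"}

	ok := &fakeBackend{saved: map[string][]byte{}}
	res, _ := generateWithStatus(ok, in)
	fmt.Println(res.Status, res.Filename)

	bad := &fakeBackend{failGenerate: true, saved: map[string][]byte{}}
	res, _ = generateWithStatus(bad, in)
	fmt.Println(res.Status)
}
```

Because the error return is always nil in this pattern, callers that check only err will miss failures; the Status field is the signal that matters.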
Function: GetImageGeneratorAgent
func GetImageGeneratorAgent(ctx context.Context, model model.LLM) agent.Agent
Purpose: Constructs and returns an AI agent configured for image generation tasks.
Parameters:
ctx (context.Context): Standard Go context for lifecycle and cancellation.
model (model.LLM): The large language model instance the agent will use for language understanding.
Returns: An agent.Agent instance configured with tools and instructions specific to image generation.
Behavior:
Creates a functiontool wrapping the generateImage function with a descriptive name and purpose.
Builds an llmagent configured with:
Name and description indicating image generation capabilities.
An instruction prompt guiding the agent to generate or edit images based on user prompts.
Tools including the generate_image function tool and a loadartifactstool for loading artifacts.
Logs a fatal error if tool or agent creation fails.
Usage: Used to initialize the image generation agent in an application or server.
Example Usage:
// Named imageAgent rather than agent to avoid shadowing the agent package.
imageAgent := GetImageGeneratorAgent(context.Background(), myLLMModel)
// imageAgent can now be run with user inputs to generate images.
Implementation Details and Algorithms
Generative Model Usage: The function uses Google Cloud's Vertex AI generative models, specifically "imagen-3.0-generate-002", to generate images from textual prompts. This model is accessed through the genai client.
Artifact Storage: Generated image bytes are saved as an artifact using the context's artifact service. The image is wrapped as a genai.Part with MIME type "image/png".
Error Handling Strategy: Instead of returning Go errors for operational failures (like client setup or image generation failure), the function returns a structured result with a status field. This allows tools and agents to handle failures gracefully without terminating the execution flow.
Tool Wrapping: The generateImage function is wrapped into a functiontool to integrate with the agent tooling system, enabling automatic JSON schema inference and invocation.
Agent Composition: The agent includes the image generation tool and an artifact loading tool (loadartifactstool.New()) to support workflows involving both creation and retrieval of images, demonstrating modular integration of tools.
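The idea behind tool wrapping can be illustrated with a small generic adapter that turns a typed Go function into a JSON-in, JSON-out callable. This is only a conceptual sketch: jsonTool and wrapFunc are hypothetical names, the sketch omits schema inference, and it makes no claim about how functiontool is implemented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonTool is a hypothetical, simplified tool shape: a name plus a
// call that accepts raw JSON arguments and returns raw JSON output.
type jsonTool struct {
	Name string
	Call func(rawArgs []byte) ([]byte, error)
}

// wrapFunc adapts a typed function into a jsonTool by decoding the
// arguments into In and encoding the Out value it returns.
func wrapFunc[In, Out any](name string, fn func(In) (Out, error)) jsonTool {
	return jsonTool{
		Name: name,
		Call: func(rawArgs []byte) ([]byte, error) {
			var in In
			if err := json.Unmarshal(rawArgs, &in); err != nil {
				return nil, fmt.Errorf("decode args for %s: %w", name, err)
			}
			out, err := fn(in)
			if err != nil {
				return nil, err
			}
			return json.Marshal(out)
		},
	}
}

// echoInput and echoResult reuse the documented field layout.
type echoInput struct {
	Prompt   string `json:"prompt"`
	Filename string `json:"filename"`
}

type echoResult struct {
	Filename string `json:"filename"`
	Status   string `json:"Status"`
}

func main() {
	// The wrapped function is a stub that only echoes the filename.
	tl := wrapFunc("generate_image", func(in echoInput) (echoResult, error) {
		return echoResult{Filename: in.Filename, Status: "success"}, nil
	})
	out, err := tl.Call([]byte(`{"prompt":"a cat","filename":"cat.png"}`))
	if err != nil {
		fmt.Println("call failed:", err)
		return
	}
	fmt.Println(tl.Name, string(out))
}
```

The benefit of this shape is that the tool author writes an ordinary typed function, while the framework owns decoding, encoding, and invocation.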
Interaction with Other System Components
genai Client (google.golang.org/genai): The generative AI client abstracts calls to Vertex AI generative models.
Agent Framework:
The agent is created via llmagent.New(), which builds an LLM-driven agent with tools and instructions.
The function tool system (functiontool.New) wraps Go functions as callable tools within agents.
Artifact Management: The agent relies on the artifact service accessible via the tool.Context interface (ctx.Artifacts()) to save and load image files.
Environment Variables: Uses the GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION environment variables to configure the generative AI client.
Tools:
generate_image: Custom tool wrapping generateImage.
loadartifactstool: Provides artifact loading capabilities to the agent.
This file thus acts as a bridge between the generative AI service, artifact storage, and the agent invocation framework, enabling a coherent image generation capability within the larger AI system.
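Reading the two environment variables mentioned above might look like the following sketch. vertexConfig and loadVertexConfig are hypothetical names, and the validation step is an assumption: the source does not say whether image_generator.go checks for empty values.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// vertexConfig is a hypothetical holder for the two settings.
type vertexConfig struct {
	Project  string
	Location string
}

// loadVertexConfig reads the settings through getenv so the lookup
// can be replaced in tests. Rejecting empty values is an assumption.
func loadVertexConfig(getenv func(string) string) (vertexConfig, error) {
	cfg := vertexConfig{
		Project:  getenv("GOOGLE_CLOUD_PROJECT"),
		Location: getenv("GOOGLE_CLOUD_LOCATION"),
	}
	if cfg.Project == "" {
		return vertexConfig{}, errors.New("GOOGLE_CLOUD_PROJECT is not set")
	}
	if cfg.Location == "" {
		return vertexConfig{}, errors.New("GOOGLE_CLOUD_LOCATION is not set")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadVertexConfig(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("project=%s location=%s\n", cfg.Project, cfg.Location)
}
```

Failing early on a missing variable gives a clearer message than a later client-creation failure, which in the documented design would surface only as a "fail" status.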
Diagram: Structure of image_generator.go
flowchart TD
A[GetImageGeneratorAgent] --> B[Create generate_image Tool]
A --> C[Create llmagent with Tools]
B --> D[generateImage Function]
D --> E[genai Client]
D --> F[GenerateImages API Call]
D --> G[Save Image Artifact]
C --> H[Tools: generate_image, loadartifactstool]
Description:
GetImageGeneratorAgent is the factory function that creates:
A generate_image tool wrapping the generateImage function.
An llmagent configured with the model, instructions, and tools.
The generateImage function performs:
Client initialization to the generative AI backend.
Image generation via the model API.
Saving the generated image as an artifact.
The agent utilizes tools for image generation and artifact loading.
References to Related Topics
LLM Integration and Agents: for understanding the underlying AI agent architecture and large language model integration.
Tooling System: for details on creating and managing function tools and artifact loading tools.
Artifact Management: for artifact storage and retrieval used for saving generated images.
Agent Invocation Context: for context interfaces used during tool execution and artifact management.