TestModel_GenerateStream_ok.httprr

Overview

This file captures an HTTP request and response trace demonstrating a streaming content generation interaction with the Google Gemini 2.0 language model API. It shows a single POST request made to the Gemini model's streamGenerateContent endpoint and the corresponding server-sent events (SSE) stream response, illustrating how the model generates text content incrementally.

The file serves as a reference example of how to perform a streaming generation request to the Gemini language model API and handle the streamed JSON events returned by the server. It highlights the request format, headers, payload structure, and the streaming response format including partial content chunks and final completion signals.

HTTP Request Details

Request Line and Endpoint

Method: POST
URL: https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:streamGenerateContent?alt=sse
HTTP Version: HTTP/1.1

The request targets the Gemini model version gemini-2.0-flash with streaming enabled via Server-Sent Events (SSE), indicated by the query parameter alt=sse.

Request Headers

Host: generativelanguage.googleapis.com
User-Agent: Go-http-client/1.1
Content-Length: 129
Content-Type: application/json

The Content-Type is JSON, and the user agent is the Go HTTP client library.
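As a minimal sketch of how a Go client might assemble this request: the helper name newStreamRequest is hypothetical, and credentials are left out because the trace does not show how they are supplied. Go's HTTP client normally fills in Host, User-Agent, and Content-Length itself when the request is sent.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// endpoint is the URL recorded in the trace; alt=sse selects the SSE response format.
const endpoint = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:streamGenerateContent?alt=sse"

// newStreamRequest (hypothetical helper) builds the POST request for a
// ready-made JSON body. Authentication is deliberately omitted because the
// trace does not show it.
func newStreamRequest(body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newStreamRequest([]byte(`{"contents":[]}`))
	if err != nil {
		panic(err)
	}
	// ContentLength is derived from the bytes.Reader passed to NewRequest.
	fmt.Println(req.Method, req.URL.Host, req.URL.Query().Get("alt"), req.ContentLength)
}
```

The request is only constructed here, not sent, so the sketch can be run without network access or an API key.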

Request Body (JSON Payload)

{
  "contents": [
    {
      "parts": [
        {
          "text": "What is the capital of France? One word."
        }
      ],
      "role": "user"
    }
  ],
  "generationConfig": {
    "temperature": 0
  }
}

contents: An array of messages forming the prompt. Here, a single user message is provided with the text query.
generationConfig: Configuration for generation parameters; temperature is set to 0 to make the output as deterministic as possible.
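The payload can be modeled with a few Go structs. This is a sketch only: the type and function names are illustrative, while the JSON field names come from the body shown above. Encoded compactly, this body is 129 bytes long, which matches the Content-Length header and suggests the request was sent without the indentation shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative types; the JSON tags mirror the recorded payload.
type Part struct {
	Text string `json:"text"`
}

type Content struct {
	Parts []Part `json:"parts"`
	Role  string `json:"role"`
}

type GenerationConfig struct {
	Temperature float64 `json:"temperature"`
}

type GenerateRequest struct {
	Contents         []Content        `json:"contents"`
	GenerationConfig GenerationConfig `json:"generationConfig"`
}

// buildRequestBody (hypothetical helper) encodes a single user message
// together with the given temperature.
func buildRequestBody(prompt string, temperature float64) ([]byte, error) {
	return json.Marshal(GenerateRequest{
		Contents: []Content{
			{Parts: []Part{{Text: prompt}}, Role: "user"},
		},
		GenerationConfig: GenerationConfig{Temperature: temperature},
	})
}

func main() {
	body, err := buildRequestBody("What is the capital of France? One word.", 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(body)) // 129
	fmt.Println(string(body))
}
```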

HTTP Response Details

Status and Headers

Status: HTTP/2.0 200 OK
Headers:
- Connection: close
- Content-Disposition: attachment
- Content-Type: text/event-stream
- Date: Mon, 18 Aug 2025 13:55:15 GMT
- Additional security headers (X-Content-Type-Options, X-Frame-Options, etc.)

The response uses HTTP/2 and is a streaming response (text/event-stream) delivering incremental data chunks via SSE.

Response Payload (Streamed Data Events)

The response body consists of multiple data: lines, each containing a JSON object representing partial or final generation events.

First Data Event

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Paris"
          }
        ],
        "role": "model"
      }
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 11,
    "totalTokenCount": 11,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 11
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash",
  "responseId": "wzCjaPa4As7shMIP2Mei0AI"
}

candidates: Array of generated content candidates; here a single candidate with text "Paris".
usageMetadata: Token usage stats for prompt and total tokens.
modelVersion: Indicates the model version.
responseId: Unique ID for this generation.
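To show how these fields map onto a typed value, here is a minimal Go sketch that decodes the first event. The names StreamEvent, decodeEvent, and Text are illustrative; only the JSON field names are taken from the event above, and the per-modality details are omitted for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// firstPayload is the first data event from the trace in compact form,
// without promptTokensDetails.
const firstPayload = `{"candidates":[{"content":{"parts":[{"text":"Paris"}],"role":"model"}}],"usageMetadata":{"promptTokenCount":11,"totalTokenCount":11},"modelVersion":"gemini-2.0-flash","responseId":"wzCjaPa4As7shMIP2Mei0AI"}`

// StreamEvent is an illustrative type covering the fields seen in the trace.
type StreamEvent struct {
	Candidates []struct {
		Content struct {
			Parts []struct {
				Text string `json:"text"`
			} `json:"parts"`
			Role string `json:"role"`
		} `json:"content"`
		FinishReason string `json:"finishReason"`
	} `json:"candidates"`
	UsageMetadata struct {
		PromptTokenCount     int `json:"promptTokenCount"`
		CandidatesTokenCount int `json:"candidatesTokenCount"`
		TotalTokenCount      int `json:"totalTokenCount"`
	} `json:"usageMetadata"`
	ModelVersion string `json:"modelVersion"`
	ResponseID   string `json:"responseId"`
}

// decodeEvent parses one data payload into a StreamEvent.
func decodeEvent(payload string) (StreamEvent, error) {
	var ev StreamEvent
	err := json.Unmarshal([]byte(payload), &ev)
	return ev, err
}

// Text concatenates the text parts of the first candidate, if any.
func (ev StreamEvent) Text() string {
	if len(ev.Candidates) == 0 {
		return ""
	}
	text := ""
	for _, p := range ev.Candidates[0].Content.Parts {
		text += p.Text
	}
	return text
}

func main() {
	ev, err := decodeEvent(firstPayload)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Text(), ev.ModelVersion, ev.UsageMetadata.TotalTokenCount) // Paris gemini-2.0-flash 11
}
```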

Second Data Event (Final)

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "\n"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 2,
    "totalTokenCount": 12,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 10
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 2
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash",
  "responseId": "wzCjaPa4As7shMIP2Mei0AI"
}

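The text part is a single newline character, the last chunk of generated content.
finishReason is "STOP", indicating that generation completed; the newline itself is ordinary content, not the completion signal.
usageMetadata now carries detailed token counts for both prompt and candidates. The prompt token count here (10) differs from the 11 reported in the first event.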
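The completion check can be sketched in Go as below. The helper finishInfo and the usage type are hypothetical; the payload is the final event from the trace, abbreviated to the fields used. In this trace the total (12) equals prompt plus candidates (10 + 2), though the sketch does not assume that holds in general.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finalPayload is the final event from the trace, abbreviated to the fields used here.
const finalPayload = `{"candidates":[{"content":{"parts":[{"text":"\n"}],"role":"model"},"finishReason":"STOP"}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":2,"totalTokenCount":12}}`

// usage holds the top-level token counts of usageMetadata.
type usage struct {
	Prompt     int `json:"promptTokenCount"`
	Candidates int `json:"candidatesTokenCount"`
	Total      int `json:"totalTokenCount"`
}

// finishInfo (hypothetical helper) returns the first candidate's finishReason,
// which is empty while generation is still in progress, plus the usage counts.
func finishInfo(payload string) (string, usage, error) {
	var ev struct {
		Candidates []struct {
			FinishReason string `json:"finishReason"`
		} `json:"candidates"`
		Usage usage `json:"usageMetadata"`
	}
	if err := json.Unmarshal([]byte(payload), &ev); err != nil {
		return "", usage{}, err
	}
	reason := ""
	if len(ev.Candidates) > 0 {
		reason = ev.Candidates[0].FinishReason
	}
	return reason, ev.Usage, nil
}

func main() {
	reason, u, err := finishInfo(finalPayload)
	if err != nil {
		panic(err)
	}
	fmt.Println(reason, u.Prompt, u.Candidates, u.Total) // STOP 10 2 12
}
```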

Usage and Interaction

This file exemplifies how an application or agent would interact with the Gemini streaming API:

Send a POST request with the user prompt and generation configuration.
Receive a streaming response with incremental data: events.
Parse each event's JSON to extract the partial generated text (candidates[].content.parts[].text).
Detect the final event by checking for a finishReason or the end of the stream.
Aggregate the streamed parts to reconstruct the complete generated response.

This interaction aligns with the broader LLM Integration and Agents topic, where agents or tools consume large language models in streaming mode for responsiveness and efficiency.

Implementation Details

The request JSON uses a nested structure with contents containing message parts and roles, consistent with chat-based LLM APIs.
The generation is configured for output that is as deterministic as possible (temperature 0).
The response uses SSE (text/event-stream) to push incremental partial completions, allowing clients to process tokens as they arrive.
Token usage metadata is included in each event, useful for tracking prompt and response costs.
The responseId allows correlation of events belonging to the same generation session.
The final event includes a finishReason field signaling the end of generation.
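As one illustration of the correlation point, a client could check that every event in a stream carries the same responseId. This is a sketch under that assumption; sameResponse is a hypothetical helper, and the payloads in main are reduced to the responseId field from the trace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sameResponse (hypothetical helper) returns the responseId of the first
// payload and reports whether all later payloads carry the same one.
func sameResponse(payloads []string) (string, bool, error) {
	var id string
	for i, p := range payloads {
		var ev struct {
			ResponseID string `json:"responseId"`
		}
		if err := json.Unmarshal([]byte(p), &ev); err != nil {
			return "", false, err
		}
		if i == 0 {
			id = ev.ResponseID
		} else if ev.ResponseID != id {
			return id, false, nil
		}
	}
	return id, true, nil
}

func main() {
	payloads := []string{
		`{"responseId":"wzCjaPa4As7shMIP2Mei0AI"}`,
		`{"responseId":"wzCjaPa4As7shMIP2Mei0AI"}`,
	}
	id, ok, err := sameResponse(payloads)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, ok) // wzCjaPa4As7shMIP2Mei0AI true
}
```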

Interaction with Other System Components

This HTTP request/response trace could be generated and consumed by components responsible for LLM interaction within the system, such as those described in LLM Integration and Agents.
Agents or workflows managing LLM calls may use this streaming approach to reduce latency and improve user experience, related to Agent Workflow Management and Agent Execution Runner.
Token usage metadata supports observability and telemetry, tying into Telemetry and Observability.
The file format and streaming semantics align with REST API patterns discussed in REST API and Web Launchers.

Visual Representation of File Structure and Workflow

flowchart TD
A[POST Request to Gemini Stream API]
A --> B[Request JSON Payload]
B --> C{Contains}
C -->|User Prompt| D[contents array with user message]
C -->|Generation Config| E[temperature=0]
A --> F[Headers: Content-Type, User-Agent etc.]
F --> G[Send HTTP Request]
G --> H[Server Response: HTTP/2 200 OK]
H --> I[Content-Type: text/event-stream]
I --> J[Data Event 1: Partial Content]
J --> K["'Paris' text chunk"]
I --> L[Data Event 2: Final Content]
L --> M[Newline text chunk]
L --> N[finishReason: STOP]
J & L --> O[Token Usage Metadata]
J & L --> P[ResponseId and Model Version]
O & P --> Q[Client Aggregates Streamed Output]
Q --> R["Complete Model Response: 'Paris'"]

Summary of Key Elements

Element	Description
POST URL	Gemini 2.0 flash model streaming content generation endpoint
Request Payload	User prompt message and generation config (temperature)
Response Type	Server-Sent Events (`text/event-stream`)
Response Content	Incremental JSON data events with text chunks and metadata
Streaming Mechanism	Partial and final generation content streamed as SSE events
Token Usage Tracking	Included in each event for observability
Model Version	`gemini-2.0-flash`
Completion Indicator	`finishReason: STOP` in the final event signals the end of generation

This file provides a concrete example of how streaming generation requests to the Gemini language model are formatted, sent, and processed, and can serve as a reference when implementing streaming LLM interactions. It ties into broader system topics such as LLM Integration and Agents, REST API and Web Launchers, and Telemetry and Observability.