TestModel_GenerateStream_ok.httprr
Overview
This file captures an HTTP request and response trace demonstrating a streaming content generation interaction with the Google Gemini 2.0 language model API. It shows a single POST request made to the Gemini model's streamGenerateContent endpoint and the corresponding server-sent events (SSE) stream response, illustrating how the model generates text content incrementally.
The file serves as a reference example of how to perform a streaming generation request to the Gemini language model API and handle the streamed JSON events returned by the server. It highlights the request format, headers, payload structure, and the streaming response format including partial content chunks and final completion signals.
HTTP Request Details
Request Line and Endpoint
Method: POST
HTTP Version: HTTP/1.1
The request targets the Gemini model version gemini-2.0-flash with streaming enabled via Server-Sent Events (SSE), indicated by the query parameter alt=sse.
Request Headers
Host: generativelanguage.googleapis.com
User-Agent: Go-http-client/1.1
Content-Length: 129
Content-Type: application/json
The Content-Type is JSON, and the user agent is the Go HTTP client library.
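As a minimal sketch of how a Go client might construct a request with the same method, host, query parameter, and Content-Type, consider the following. The `/v1beta/models/` path prefix in `endpointURL` and the helper name `newStreamRequest` are assumptions, since the trace as described here does not show the full URL. Authentication is left out because no credential header appears in the recorded headers. The request is built but never sent.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// endpointURL is an assumption. The document names the host, the model
// (gemini-2.0-flash), the streamGenerateContent method, and alt=sse, but
// the "/v1beta/models/" path prefix is not shown in it.
const endpointURL = "https://generativelanguage.googleapis.com" +
	"/v1beta/models/gemini-2.0-flash:streamGenerateContent?alt=sse"

// newStreamRequest (hypothetical helper) builds, but does not send,
// a POST request carrying body as its JSON payload.
func newStreamRequest(body string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpointURL, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Content-Length is derived from the reader by net/http;
	// Host and User-Agent are filled in when the request is sent.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newStreamRequest(`{"contents":[]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Host, req.URL.Query().Get("alt"), req.ContentLength)
}
```

Passing a `*strings.Reader` lets `http.NewRequest` set `ContentLength` itself, so the header does not need to be computed by hand.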
Request Body (JSON Payload)
{
"contents": [
{
"parts": [
{
"text": "What is the capital of France? One word."
}
],
"role": "user"
}
],
"generationConfig": {
"temperature": 0
}
}
contents: An array of messages forming the prompt. Here, a single user message is provided with the text query.
generationConfig: Generation parameters; temperature is set to 0 to make the output as deterministic as possible.
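The payload above can be produced from typed Go structs. This is only a sketch: the type names and `buildPayload` are hypothetical and are not taken from the code that recorded the trace. The field order in the structs mirrors the key order shown in the request body.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The types below are illustrative; only the JSON key names come from the trace.
type part struct {
	Text string `json:"text"`
}

type content struct {
	Parts []part `json:"parts"`
	Role  string `json:"role"`
}

type generationConfig struct {
	Temperature float64 `json:"temperature"`
}

type generateRequest struct {
	Contents         []content        `json:"contents"`
	GenerationConfig generationConfig `json:"generationConfig"`
}

// buildPayload (hypothetical helper) wraps a single user prompt in the
// contents/parts structure and sets temperature to 0.
func buildPayload(prompt string) ([]byte, error) {
	return json.Marshal(generateRequest{
		Contents: []content{
			{Parts: []part{{Text: prompt}}, Role: "user"},
		},
		GenerationConfig: generationConfig{Temperature: 0},
	})
}

func main() {
	b, err := buildPayload("What is the capital of France? One word.")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: %s\n", len(b), b)
}
```

With the prompt from the trace, the compact encoding is 129 bytes long, which agrees with the recorded Content-Length header.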
HTTP Response Details
Status and Headers
Status: HTTP/2.0 200 OK
Headers:
Connection: close
Content-Disposition: attachment
Content-Type: text/event-stream
Date: Mon, 18 Aug 2025 13:55:15 GMT
Additional security headers (X-Content-Type-Options, X-Frame-Options, etc.)
The response uses HTTP/2 and is a streaming response (text/event-stream) delivering incremental data chunks via SSE.
Response Payload (Streamed Data Events)
The response body consists of multiple data: lines, each containing a JSON object representing partial or final generation events.
First Data Event
{
"candidates": [
{
"content": {
"parts": [
{
"text": "Paris"
}
],
"role": "model"
}
}
],
"usageMetadata": {
"promptTokenCount": 11,
"totalTokenCount": 11,
"promptTokensDetails": [
{
"modality": "TEXT",
"tokenCount": 11
}
]
},
"modelVersion": "gemini-2.0-flash",
"responseId": "wzCjaPa4As7shMIP2Mei0AI"
}
candidates: Array of generated content candidates; here a single candidate with text "Paris".
usageMetadata: Token usage stats for prompt and total tokens.
modelVersion: Indicates the model version.
responseId: Unique ID for this generation.
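To make the field descriptions concrete, here is a sketch of decoding one event into Go structs and reading its text and token counts. The type names and the helpers `decodeEvent` and `firstText` are hypothetical; only the JSON keys come from the trace, and the sample in `main` is the first event with `promptTokensDetails` omitted for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usageMetadata holds the token counters seen in the trace. Counters that
// are absent from an event (candidatesTokenCount in the first one) stay 0.
type usageMetadata struct {
	PromptTokenCount     int `json:"promptTokenCount"`
	CandidatesTokenCount int `json:"candidatesTokenCount"`
	TotalTokenCount      int `json:"totalTokenCount"`
}

// event mirrors the top-level keys of one streamed JSON object.
type event struct {
	Candidates []struct {
		Content struct {
			Parts []struct {
				Text string `json:"text"`
			} `json:"parts"`
			Role string `json:"role"`
		} `json:"content"`
		FinishReason string `json:"finishReason"`
	} `json:"candidates"`
	UsageMetadata usageMetadata `json:"usageMetadata"`
	ModelVersion  string        `json:"modelVersion"`
	ResponseID    string        `json:"responseId"`
}

// decodeEvent (hypothetical helper) parses the JSON of a single event.
func decodeEvent(raw string) (event, error) {
	var ev event
	err := json.Unmarshal([]byte(raw), &ev)
	return ev, err
}

// firstText returns the first text part of the first candidate, or ""
// when the event carries no candidates or parts.
func firstText(ev event) string {
	if len(ev.Candidates) == 0 || len(ev.Candidates[0].Content.Parts) == 0 {
		return ""
	}
	return ev.Candidates[0].Content.Parts[0].Text
}

func main() {
	const first = `{"candidates":[{"content":{"parts":[{"text":"Paris"}],"role":"model"}}],` +
		`"usageMetadata":{"promptTokenCount":11,"totalTokenCount":11},` +
		`"modelVersion":"gemini-2.0-flash","responseId":"wzCjaPa4As7shMIP2Mei0AI"}`
	ev, err := decodeEvent(first)
	if err != nil {
		panic(err)
	}
	fmt.Println(firstText(ev), ev.UsageMetadata.TotalTokenCount, ev.ResponseID)
}
```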
Second Data Event (Final)
{
"candidates": [
{
"content": {
"parts": [
{
"text": "\n"
}
],
"role": "model"
},
"finishReason": "STOP"
}
],
"usageMetadata": {
"promptTokenCount": 10,
"candidatesTokenCount": 2,
"totalTokenCount": 12,
"promptTokensDetails": [
{
"modality": "TEXT",
"tokenCount": 10
}
],
"candidatesTokensDetails": [
{
"modality": "TEXT",
"tokenCount": 2
}
]
},
"modelVersion": "gemini-2.0-flash",
"responseId": "wzCjaPa4As7shMIP2Mei0AI"
}
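The text part is a single newline character, the last chunk of generated content.
finishReason is "STOP", indicating generation completion.
usageMetadata now includes detailed token counts for both the prompt and the candidates (candidatesTokenCount and candidatesTokensDetails).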
Usage and Interaction
This file exemplifies how an application or agent would interact with the Gemini streaming API:
Send a POST request with the user prompt and generation configuration.
Receive a streaming response with incremental data: events.
Parse each event JSON to extract partial generated text (candidates.content.parts.text).
Detect the final event by checking the finishReason or end of stream.
Aggregate the streamed parts to reconstruct the complete generated response.
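The steps above can be sketched in Go as a small reader loop. This is an illustration under stated assumptions, not the implementation behind the trace: it assumes each event's JSON fits on a single `data:` line (as in this file), reads only the first candidate, and uses hypothetical names (`chunk`, `collectStream`, `collectString`).

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk keeps only the fields needed to aggregate text and detect completion.
type chunk struct {
	Candidates []struct {
		Content struct {
			Parts []struct {
				Text string `json:"text"`
			} `json:"parts"`
		} `json:"content"`
		FinishReason string `json:"finishReason"`
	} `json:"candidates"`
}

// collectStream reads SSE lines from r, concatenates the text parts of the
// first candidate in each data: event, and returns the last finishReason seen.
// bufio.Scanner's default limit caps a single line at 64 KiB.
func collectStream(r io.Reader) (text, finishReason string, err error) {
	var sb strings.Builder
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separator lines and other SSE fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			return "", "", err
		}
		if len(c.Candidates) == 0 {
			continue
		}
		for _, p := range c.Candidates[0].Content.Parts {
			sb.WriteString(p.Text)
		}
		if c.Candidates[0].FinishReason != "" {
			finishReason = c.Candidates[0].FinishReason
		}
	}
	return sb.String(), finishReason, sc.Err()
}

// collectString is a convenience wrapper for in-memory streams.
func collectString(s string) (string, string, error) {
	return collectStream(strings.NewReader(s))
}

func main() {
	// Abbreviated versions of the two events in the trace.
	const sample = `data: {"candidates":[{"content":{"parts":[{"text":"Paris"}],"role":"model"}}]}

data: {"candidates":[{"content":{"parts":[{"text":"\n"}],"role":"model"},"finishReason":"STOP"}]}

`
	text, reason, err := collectString(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %s\n", text, reason) // prints "Paris\n" STOP
}
```

In a real client the same function could be given an HTTP response body, since it only requires an `io.Reader`.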
This interaction aligns with the broader LLM Integration and Agents topic, where agents or tools consume large language models in streaming mode for responsiveness and efficiency.
Implementation Details
The request JSON uses a nested structure with contents containing message parts and roles, consistent with chat-based LLM APIs.
The generation is configured for deterministic output (temperature 0).
The response uses SSE (text/event-stream) to push incremental partial completions, allowing clients to process tokens as they arrive.
Token usage metadata is included in each event, useful for tracking prompt and response costs.
The responseId allows correlation of events belonging to the same generation session.
The final event includes a finishReason field signaling the end of generation.
Interaction with Other System Components
This HTTP request/response trace could be generated and consumed by components responsible for LLM interaction within the system, such as those described in LLM Integration and Agents.
Agents or workflows managing LLM calls may use this streaming approach to reduce latency and improve user experience, related to Agent Workflow Management and Agent Execution Runner.
Token usage metadata supports observability and telemetry, tying into Telemetry and Observability.
The file format and streaming semantics align with REST API patterns discussed in REST API and Web Launchers.
Visual Representation of File Structure and Workflow
flowchart TD
A[Post Request to Gemini Stream API]
A --> B[Request JSON Payload]
B --> C{Contains}
C -->|User Prompt| D[contents array with user message]
C -->|Generation Config| E[temperature=0]
A --> F[Headers: Content-Type, User-Agent etc.]
F --> G[Send HTTP Request]
G --> H[Server Response: HTTP/2 200 OK]
H --> I[Content-Type: text/event-stream]
I --> J[Data Event 1: Partial Content]
J --> K["Paris" text chunk]
I --> L[Data Event 2: Final Content]
L --> M["\n" text chunk]
L --> N[finishReason: STOP]
J & L --> O[Token Usage Metadata]
J & L --> P[ResponseId and Model Version]
O & P --> Q[Client Aggregates Streamed Output]
Q --> R[Complete Model Response: "Paris"]
Summary of Key Elements
| Element | Description |
|---|---|
| POST URL | Gemini 2.0 Flash model streaming content generation endpoint |
| Request Payload | User prompt message and generation config (temperature) |
| Response Type | Server-Sent Events (text/event-stream) |
| Response Content | Incremental JSON data events with text chunks and metadata |
| Streaming Mechanism | Partial and final generation content streamed as SSE events |
| Token Usage Tracking | Included in each event for observability |
| Model Version | gemini-2.0-flash |
| Completion Indicator | finishReason: "STOP" in the final event |
This file is essential for understanding how streaming generation requests to the Gemini language model are formatted, sent, and processed, providing a concrete example for implementing streaming LLM interactions. It ties into broader system topics such as LLM Integration and Agents, REST API and Web Launchers, and Telemetry and Observability.