TestModel_GenerateStream_ok.httprr


Overview

This file records a single HTTP request and response exchanged with the Google Gemini API during a streaming content generation call. It contains one POST request to the streamGenerateContent endpoint of the gemini-2.0-flash model and the server-sent events (SSE) stream returned in response, in which the model delivers its answer incrementally.

The trace serves as a reference for the request format (headers and JSON payload) and for the streamed response format, including the partial content chunks and the final completion signal.


HTTP Request Details

Request Line and Endpoint

The request targets the Gemini model version gemini-2.0-flash with streaming enabled via Server-Sent Events (SSE), indicated by the query parameter alt=sse.
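As a rough illustration, the endpoint URL could be assembled in Go along the following lines. The host and API version passed in main are assumptions, since this page does not reproduce the full request line, and streamURL is a hypothetical helper rather than part of any library.

```go
package main

import (
	"fmt"
	"net/url"
)

// streamURL builds a streamGenerateContent URL for the given model.
// baseURL and apiVersion are supplied by the caller; the values used in
// main are assumptions and are not taken from the trace.
func streamURL(baseURL, apiVersion, model string) string {
	q := url.Values{}
	q.Set("alt", "sse") // request Server-Sent Events framing
	return fmt.Sprintf("%s/%s/models/%s:streamGenerateContent?%s",
		baseURL, apiVersion, url.PathEscape(model), q.Encode())
}

func main() {
	fmt.Println(streamURL("https://generativelanguage.googleapis.com", "v1beta", "gemini-2.0-flash"))
}
```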

Request Headers

The Content-Type header declares a JSON body, and the User-Agent header identifies Go's standard HTTP client.

Request Body (JSON Payload)

{
  "contents": [
    {
      "parts": [
        {
          "text": "What is the capital of France? One word."
        }
      ],
      "role": "user"
    }
  ],
  "generationConfig": {
    "temperature": 0
  }
}
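A minimal sketch of how this payload might be modeled and serialized in Go is shown below. The type and function names (Part, Content, GenerationConfig, GenerateRequest, encodeRequest) are illustrative assumptions; only the JSON field names come from the trace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Part is one piece of a message; only text parts appear in the trace.
type Part struct {
	Text string `json:"text"`
}

// Content is a single message with its author role ("user" or "model").
type Content struct {
	Parts []Part `json:"parts"`
	Role  string `json:"role"`
}

// GenerationConfig carries sampling options. Temperature has no omitempty
// tag, so a value of 0 is still written to the JSON output.
type GenerationConfig struct {
	Temperature float64 `json:"temperature"`
}

// GenerateRequest mirrors the request body shown above.
type GenerateRequest struct {
	Contents         []Content        `json:"contents"`
	GenerationConfig GenerationConfig `json:"generationConfig"`
}

// encodeRequest wraps a user prompt in a request with temperature 0
// and returns it as compact JSON.
func encodeRequest(prompt string) (string, error) {
	req := GenerateRequest{
		Contents: []Content{{Parts: []Part{{Text: prompt}}, Role: "user"}},
	}
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	body, err := encodeRequest("What is the capital of France? One word.")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```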

HTTP Response Details

Status and Headers

The response uses HTTP/2 and is a streaming response (text/event-stream) delivering incremental data chunks via SSE.

Response Payload (Streamed Data Events)

The response body consists of multiple data: lines, each containing a JSON object representing partial or final generation events.
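The framing can be sketched as follows: a hedged example that pulls the payload out of every data: line in an SSE body. It assumes, as in this trace, that each event carries exactly one data: line; the SSE format also allows multi-line events, which this sketch does not join. The names dataPayloads and payloadsOf are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// dataPayloads returns the payload of every "data:" line in an SSE body.
// Blank separator lines, comment lines, and other SSE fields are skipped.
func dataPayloads(r io.Reader) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // accept lines up to 1 MiB
	for sc.Scan() {
		payload, ok := strings.CutPrefix(sc.Text(), "data:")
		if !ok {
			continue
		}
		// A single space after the colon is part of the framing, not the payload.
		out = append(out, strings.TrimPrefix(payload, " "))
	}
	return out, sc.Err()
}

// payloadsOf is a convenience wrapper for bodies already held in memory.
func payloadsOf(body string) ([]string, error) {
	return dataPayloads(strings.NewReader(body))
}

func main() {
	got, err := payloadsOf("data: {\"a\":1}\n\n: keep-alive comment\ndata: {\"b\":2}\n\n")
	if err != nil {
		panic(err)
	}
	for i, p := range got {
		fmt.Printf("event %d: %s\n", i+1, p)
	}
}
```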

First Data Event

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Paris"
          }
        ],
        "role": "model"
      }
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 11,
    "totalTokenCount": 11,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 11
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash",
  "responseId": "wzCjaPa4As7shMIP2Mei0AI"
}

Second Data Event (Final)

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "\n"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 2,
    "totalTokenCount": 12,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 10
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 2
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash",
  "responseId": "wzCjaPa4As7shMIP2Mei0AI"
}
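One way to decode such an event in Go is sketched below. StreamEvent and decodeEvent are hypothetical names, the struct covers only a subset of the fields shown above, and the sample constant is the final event from the trace with the per-modality detail arrays omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StreamEvent holds a subset of the fields carried by each data event.
type StreamEvent struct {
	Candidates []struct {
		Content struct {
			Parts []struct {
				Text string `json:"text"`
			} `json:"parts"`
			Role string `json:"role"`
		} `json:"content"`
		FinishReason string `json:"finishReason"` // empty on non-final events
	} `json:"candidates"`
	UsageMetadata struct {
		PromptTokenCount     int `json:"promptTokenCount"`
		CandidatesTokenCount int `json:"candidatesTokenCount"`
		TotalTokenCount      int `json:"totalTokenCount"`
	} `json:"usageMetadata"`
	ModelVersion string `json:"modelVersion"`
	ResponseID   string `json:"responseId"`
}

// finalEvent is the second data event from the trace, abbreviated.
const finalEvent = `{"candidates":[{"content":{"parts":[{"text":"\n"}],"role":"model"},"finishReason":"STOP"}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":2,"totalTokenCount":12},"modelVersion":"gemini-2.0-flash","responseId":"wzCjaPa4As7shMIP2Mei0AI"}`

// decodeEvent parses one data payload into a StreamEvent.
func decodeEvent(payload string) (StreamEvent, error) {
	var ev StreamEvent
	err := json.Unmarshal([]byte(payload), &ev)
	return ev, err
}

func main() {
	ev, err := decodeEvent(finalEvent)
	if err != nil {
		panic(err)
	}
	u := ev.UsageMetadata
	fmt.Printf("finish=%s prompt=%d candidates=%d total=%d\n",
		ev.Candidates[0].FinishReason, u.PromptTokenCount, u.CandidatesTokenCount, u.TotalTokenCount)
}
```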

Usage and Interaction

This file exemplifies how an application or agent would interact with the Gemini streaming API:

  1. Send a POST request with the user prompt and generation configuration.

  2. Receive a streaming response with incremental data: events.

  3. Parse each event's JSON to extract the partial generated text (candidates[].content.parts[].text).

  4. Detect the final event by checking for a finishReason field or the end of the stream.

  5. Aggregate the streamed parts to reconstruct the complete generated response.
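The parsing and aggregation steps above could look roughly like this in Go. This is a sketch under stated assumptions: the POST itself is left out, traceBody stands in for the HTTP response body and holds the two events from the trace trimmed to the fields used here, and chunk, collect, and demo are hypothetical names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk holds only the fields needed to rebuild the generated text.
type chunk struct {
	Candidates []struct {
		Content struct {
			Parts []struct {
				Text string `json:"text"`
			} `json:"parts"`
		} `json:"content"`
		FinishReason string `json:"finishReason"`
	} `json:"candidates"`
}

// traceBody stands in for the response body of the streaming request.
const traceBody = `data: {"candidates":[{"content":{"parts":[{"text":"Paris"}],"role":"model"}}]}

data: {"candidates":[{"content":{"parts":[{"text":"\n"}],"role":"model"},"finishReason":"STOP"}]}

`

// collect reads an SSE body, concatenates every text part of the first
// candidate, and returns the text with the last non-empty finishReason.
func collect(body io.Reader) (text, finish string, err error) {
	var sb strings.Builder
	sc := bufio.NewScanner(body)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // accept lines up to 1 MiB
	for sc.Scan() {
		payload, ok := strings.CutPrefix(sc.Text(), "data:")
		if !ok {
			continue // blank separators and non-data fields
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			return "", "", err
		}
		if len(c.Candidates) == 0 {
			continue
		}
		for _, p := range c.Candidates[0].Content.Parts {
			sb.WriteString(p.Text)
		}
		if fr := c.Candidates[0].FinishReason; fr != "" {
			finish = fr
		}
	}
	return sb.String(), finish, sc.Err()
}

// demo runs collect over the recorded events.
func demo() (string, string, error) {
	return collect(strings.NewReader(traceBody))
}

func main() {
	text, finish, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %s\n", text, finish) // prints "Paris\n" STOP
}
```

In a real client, resp.Body from the HTTP response would be passed to collect in place of the in-memory reader, so text can be surfaced to the user as each chunk arrives.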

This interaction aligns with the broader LLM Integration and Agents topic, where agents or tools consume large language models in streaming mode for responsiveness and efficiency.


Implementation Details


Interaction with Other System Components


Visual Representation of File Structure and Workflow

flowchart TD
A[POST Request to Gemini Stream API]
A --> B[Request JSON Payload]
B --> C{Contains}
C -->|User Prompt| D[contents array with user message]
C -->|Generation Config| E[temperature=0]
A --> F[Headers: Content-Type, User-Agent etc.]
F --> G[Send HTTP Request]
G --> H[Server Response: HTTP/2 200 OK]
H --> I[Content-Type: text/event-stream]
I --> J[Data Event 1: Partial Content]
J --> K["'Paris' text chunk"]
I --> L[Data Event 2: Final Content]
L --> M["Newline text chunk"]
L --> N[finishReason: STOP]
J & L --> O[Token Usage Metadata]
J & L --> P[ResponseId and Model Version]
O & P --> Q[Client Aggregates Streamed Output]
Q --> R["Complete Model Response: 'Paris'"]

Summary of Key Elements

Element                 Description

POST URL                Gemini 2.0 flash model streaming content generation endpoint
Request Payload         User prompt message and generation config (temperature)
Response Type           Server-Sent Events (text/event-stream)
Response Content        Incremental JSON data events with text chunks and metadata
Streaming Mechanism     Partial and final generation content streamed as SSE events
Token Usage Tracking    Included in each event for observability
Model Version           gemini-2.0-flash
Completion Indicator    finishReason: STOP signals end of stream


This file shows concretely how a streaming generation request to the Gemini API is formatted, sent, and processed, and it can serve as a template when implementing streaming LLM interactions. It relates to the broader system topics LLM Integration and Agents, REST API and Web Launchers, and Telemetry and Observability.