schema.py

Overview

The schema.py file defines a data model for the output of a chunking process in the InfiniFlow system. It uses the Pydantic library to enforce validation and serialization rules on a single class, ChunkerFromUpstream. The model holds metadata about the chunking operation (its creation and elapsed times), the raw input blob, a name identifier, and result fields for several output formats (JSON, Markdown, plain text, and HTML).

This schema standardizes how chunking results are passed upstream or downstream within the system. Every component that produces or consumes chunked data relies on the same field names, types, and aliases, which keeps data interchange consistent and simplifies integration.
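To illustrate how a consumer might rely on this shared contract, here is a minimal sketch of a hypothetical helper, select_result, that reads output_format and returns the matching result field. The function name, the RESULT_FIELD mapping, and the assumption that a payload is a model dumped by field name (for example chunk.model_dump() in Pydantic v2) are illustrative choices, not part of schema.py.

```python
from typing import Any

# Field that carries the result for each allowed output_format value
# (field names taken from the Attributes list).
RESULT_FIELD: dict[str, str] = {
    "json": "json_result",
    "markdown": "markdown_result",
    "text": "text_result",
    "html": "html_result",
}


def select_result(payload: dict[str, Any]) -> Any:
    """Return the result that matches payload["output_format"] (hypothetical helper).

    Assumes `payload` is a ChunkerFromUpstream dumped by field name.
    """
    fmt = payload.get("output_format")
    if fmt is None:
        raise ValueError("output_format is not set")
    if fmt not in RESULT_FIELD:
        raise ValueError(f"unsupported output_format: {fmt!r}")
    result = payload.get(RESULT_FIELD[fmt])
    if result is None:
        raise ValueError(f"output_format is {fmt!r} but {RESULT_FIELD[fmt]} is missing")
    return result


payload = {
    "name": "example_chunk",
    "output_format": "markdown",
    "markdown_result": "# Title\n\nBody",
    "json_result": None,
}
print(select_result(payload))
```

Failing loudly when the declared format and the populated field disagree is one possible design; a lenient consumer could instead fall back to whichever result field is set.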


Classes

ChunkerFromUpstream

A Pydantic BaseModel subclass that models the result of a chunking operation, including its metadata and multiple output representations.

Attributes

Each attribute is listed with its type, alias, and default value, followed by a description.

created_time
    Type: float | None; Alias: _created_time; Default: None
    Timestamp recording when the chunk was created (optional).

elapsed_time
    Type: float | None; Alias: _elapsed_time; Default: None
    Time spent on the chunking operation, in seconds (optional).

name
    Type: str; Alias: N/A; Default: Required
    Identifier or label for the chunk.

blob
    Type: bytes; Alias: N/A; Default: Required
    Raw binary input that was chunked.

output_format
    Type: Literal["json", "markdown", "text", "html"] | None; Alias: N/A; Default: None
    Format of the chunked output: one of "json", "markdown", "text", "html", or None.

json_result
    Type: list[dict[str, Any]] | None; Alias: json; Default: None
    Chunked result in JSON form, as a list of dictionaries (optional).

markdown_result
    Type: str | None; Alias: markdown; Default: None
    Chunked result as a Markdown string (optional).

text_result
    Type: str | None; Alias: text; Default: None
    Chunked result as plain text (optional).

html_result
    Type: list[str] | None; Alias: html; Default: None
    Chunked result in HTML form, as a list of strings, each presumably an HTML snippet (optional).

Usage Example

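from schema import ChunkerFromUpstream

# Fields are passed by their Python names. Because several fields declare
# aliases, Pydantic v2 accepts this only if the model enables populate_by_name.
chunk = ChunkerFromUpstream(
    created_time=1685600000.0,  # Unix timestamp, in seconds
    elapsed_time=0.123,         # seconds spent chunking
    name="example_chunk",
    blob=b"raw data bytes",
    output_format="json",
    json_result=[{"id": 1, "content": "chunk content"}],
)

# Serialize to a JSON string (Pydantic v2; chunk.json() is the deprecated v1
# spelling). Pass by_alias=True to emit "_created_time", "json", and so on.
print(chunk.model_dump_json())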
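The usage example hard-codes created_time and elapsed_time. A producer would more likely measure them; the sketch below shows one way, using time.time() for a wall-clock timestamp and time.perf_counter() for the duration. The run_chunker and split_lines names are hypothetical, recording created_time once chunking finishes is an assumption, and so is emitting a dict keyed by the schema's aliases.

```python
import time
from typing import Any, Callable


def run_chunker(
    name: str,
    blob: bytes,
    chunk_fn: Callable[[bytes], list[dict[str, Any]]],
) -> dict[str, Any]:
    """Run chunk_fn on blob and build an alias-keyed payload (hypothetical)."""
    start = time.perf_counter()        # monotonic clock, suitable for durations
    chunks = chunk_fn(blob)
    elapsed_time = time.perf_counter() - start
    created_time = time.time()         # assumed: stamped when the chunks exist
    return {
        "_created_time": created_time,
        "_elapsed_time": elapsed_time,
        "name": name,
        "blob": blob,
        "output_format": "json",
        "json": chunks,
    }


def split_lines(blob: bytes) -> list[dict[str, Any]]:
    # Toy chunker for illustration: one chunk per non-empty line.
    lines = [ln for ln in blob.decode("utf-8").splitlines() if ln.strip()]
    return [{"id": i, "content": ln} for i, ln in enumerate(lines, start=1)]


payload = run_chunker("example_chunk", b"first line\nsecond line\n", split_lines)
print(payload["json"])
```

Because the keys are the aliases, such a payload could then be validated with ChunkerFromUpstream.model_validate(payload) under Pydantic v2.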

Mermaid Class Diagram

classDiagram
    class ChunkerFromUpstream {
        +float? created_time
        +float? elapsed_time
        +str name
        +bytes blob
        +Literal["json","markdown","text","html"]? output_format
        +list[dict[str, Any]]? json_result
        +str? markdown_result
        +str? text_result
        +list[str]? html_result
    }

This documentation describes the structure of schema.py and its role in the InfiniFlow project, to help developers, integrators, and maintainers use and extend the chunking data model.