init.py


Overview

This init.py file is part of the InfiniFlow project. It is a utility module whose helpers are used throughout the codebase: token counting and truncation, singleton enforcement, whitespace cleanup in strings, Markdown fence removal, safe float conversion, and reading maximum values from files. It also configures the cache directory that the tiktoken library uses for its encoding files.

The file defines no classes. It consists of standalone functions, one of which (singleton) is a decorator. It depends on the tiktoken library for token operations and imports get_project_base_directory() from the project's API utilities.
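
The cache setup can be sketched as follows. This is an illustration under assumptions, not the module's actual code: the stand-in get_project_base_directory() below is hypothetical, and using its return value as the cache location is an assumption. TIKTOKEN_CACHE_DIR itself is the environment variable tiktoken consults when it loads an encoding's vocabulary files.

```python
import os


def get_project_base_directory() -> str:
    # Hypothetical stand-in for the project's helper of the same name;
    # here it simply returns the current working directory.
    return os.path.abspath(os.getcwd())


# Assumption: the cache lives under the project base directory.
cache_dir = get_project_base_directory()
os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir

# The variable must be set before an encoding is first loaded,
# because tiktoken reads it at load time.
```

Setting the variable before the encoder is created means downloaded vocabulary files are reused across runs instead of being fetched again.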


Detailed Description of Functions and Decorators

1. singleton(cls, *args, **kw)

A decorator to enforce the singleton pattern on a class, ensuring only one instance of the class exists per process.
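
A minimal sketch of how such a decorator could work, assuming instances are cached in a dictionary keyed by the class and the current process ID (one way to get the "per process" behavior described above). The Config class is hypothetical and exists only for illustration.

```python
import os


def singleton(cls, *args, **kw):
    """Return a factory that creates `cls` at most once per process."""
    instances = {}  # cache shared by every call to the factory

    def _singleton():
        # Key on the class and the process ID, so a forked worker
        # (which has a different PID) builds its own instance.
        key = str(cls) + str(os.getpid())
        if key not in instances:
            instances[key] = cls(*args, **kw)
        return instances[key]

    return _singleton


@singleton
class Config:  # hypothetical class, for illustration only
    def __init__(self):
        self.values = {}


first = Config()
second = Config()
first.values["debug"] = True
```

Both calls return the same object, so state written through first is visible through second. Note that after decoration the name Config refers to the factory function, not the class, so isinstance checks against it no longer work.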


2. rmSpace(txt: str) -> str

Removes redundant spaces adjacent to certain characters in a string.
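
The exact set of characters is not listed here. The sketch below assumes one plausible rule: a run of spaces is kept only when it separates ASCII letters, digits, or a few punctuation marks, and is dropped next to other characters such as brackets or CJK text. The regular expressions are an assumption, not a confirmed copy of the implementation.

```python
import re


def rmSpace(txt: str) -> str:
    # Drop spaces that FOLLOW a character other than a letter, digit,
    # '.', ',', ')' or '>'.
    txt = re.sub(r"([^a-z0-9.,\)>]) +([^ ])", r"\1\2", txt, flags=re.IGNORECASE)
    # Drop spaces that PRECEDE a character other than a letter, digit,
    # '.', ',', '(' or '<'.
    return re.sub(r"([^ ]) +([^a-z0-9.,\(<])", r"\1\2", txt, flags=re.IGNORECASE)


bracketed = rmSpace("f ( x )")
english = rmSpace("hello world")
```

Under these rules the spaces inside the parentheses are removed, while the space between two English words is preserved.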


3. findMaxDt(fnm: str) -> str

Reads a file line by line and returns the maximum date-time string found, assuming the file contains date-time entries in string format.
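
A sketch of this scan, under stated assumptions: entries are compared as plain strings, which matches chronological order only when every line uses the same fixed-width format (for example YYYY-MM-DD HH:MM:SS). The epoch fallback value, the skipping of 'nan' lines, and the silent handling of unreadable files are all assumptions made for illustration.

```python
import os
import tempfile


def findMaxDt(fnm: str) -> str:
    m = "1970-01-01 00:00:00"  # assumed fallback when nothing larger is found
    try:
        with open(fnm, "r") as f:
            for line in f:
                line = line.rstrip("\n")
                if line == "nan":  # assumed: skip placeholder lines
                    continue
                if line > m:  # lexicographic string comparison
                    m = line
    except Exception:
        pass  # assumed: an unreadable file yields the fallback
    return m


# Write a small sample file and scan it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("2023-01-05 10:00:00\n2024-03-01 08:30:00\n2023-12-31 23:59:59\n")
    sample_path = tmp.name
latest = findMaxDt(sample_path)
os.remove(sample_path)
```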


4. findMaxTm(fnm: str) -> int

Reads a file line by line and returns the maximum integer value found, ignoring 'nan' lines.
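
The integer variant can be sketched the same way. The fallback of 0 and the decision to stop quietly on a missing file or an unparsable line are assumptions; only the skipping of 'nan' lines comes from the description above.

```python
import os
import tempfile


def findMaxTm(fnm: str) -> int:
    m = 0  # assumed fallback when no larger value is found
    try:
        with open(fnm, "r") as f:
            for line in f:
                line = line.rstrip("\n")
                if line == "nan":
                    continue
                m = max(m, int(line))
    except Exception:
        pass  # assumed: return the best value seen so far
    return m


# Write a small sample file and scan it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("3\nnan\n17\n5\n")
    sample_path = tmp.name
largest = findMaxTm(sample_path)
os.remove(sample_path)
```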


5. num_tokens_from_string(string: str) -> int

Calculates the number of tokens in a given string using the tiktoken tokenizer.


6. truncate(string: str, max_len: int) -> str

Truncates a string to a maximum number of tokens, preserving token boundaries.
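
Both tokenizer helpers can be sketched against any object that exposes encode and decode, which is the interface of a tiktoken encoding. To stay runnable without downloading a vocabulary, the sketch swaps in a hypothetical ToyEncoder that treats each whitespace-separated word as one token. The real module is described as using tiktoken, and the encoding it loads is not stated here.

```python
class ToyEncoder:
    """Hypothetical stand-in for a tiktoken encoding: one token per word."""

    def __init__(self):
        self._ids = {}    # word -> token id
        self._words = []  # token id -> word

    def encode(self, text: str) -> list[int]:
        ids = []
        for word in text.split():
            if word not in self._ids:
                self._ids[word] = len(self._words)
                self._words.append(word)
            ids.append(self._ids[word])
        return ids

    def decode(self, ids: list[int]) -> str:
        return " ".join(self._words[i] for i in ids)


encoder = ToyEncoder()


def num_tokens_from_string(string: str) -> int:
    # The token count is the length of the encoded sequence.
    return len(encoder.encode(string))


def truncate(string: str, max_len: int) -> str:
    # Encode, keep the first max_len tokens, then decode back to text,
    # so the cut always falls on a token boundary.
    return encoder.decode(encoder.encode(string)[:max_len])


count = num_tokens_from_string("the quick brown fox")
shortened = truncate("the quick brown fox", 2)
```

Truncating in token space rather than character space is what keeps the result within a model's token budget. With a real subword tokenizer, the token count will generally differ from the word count used in this toy version.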


7. clean_markdown_block(text: str) -> str

Unwraps text enclosed in a fenced code block tagged with the markdown language: it removes the opening and closing fences and strips surrounding whitespace.
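
One way to implement this with regular expressions is sketched below. The patterns are assumptions chosen to match the description. The fence is built with string multiplication only so that the example does not contain a literal fence of its own.

```python
import re

FENCE = "`" * 3  # three backticks


def clean_markdown_block(text: str) -> str:
    # Remove a leading fence tagged "markdown", plus whitespace around it.
    text = re.sub(r"^\s*" + FENCE + r"markdown\s*\n?", "", text)
    # Remove a trailing closing fence.
    text = re.sub(r"\n?\s*" + FENCE + r"\s*$", "", text)
    return text.strip()


wrapped = FENCE + "markdown\n# Title\nBody\n" + FENCE
unwrapped = clean_markdown_block(wrapped)
```

Text that is not wrapped in such a fence passes through unchanged apart from the final strip.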


8. get_float(v) -> float

Safely converts a value to a float, returning negative infinity if the value is None or the conversion fails.
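
A minimal sketch of the described behavior. Catching every exception, rather than only ValueError and TypeError, is an assumption.

```python
def get_float(v) -> float:
    if v is None:
        return float("-inf")
    try:
        return float(v)
    except Exception:
        # Assumed: any conversion failure maps to negative infinity.
        return float("-inf")


scores = ["0.7", None, "0.2", "n/a"]
best = max(scores, key=get_float)
```

Negative infinity compares below every finite number, so missing or malformed values sort last when ranking by score and never win a max.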


Important Implementation Details


Interaction with Other Parts of the System


Mermaid Flowchart

Because this file defines no classes, a flowchart of the module's initialization steps and function groups is used in place of a class diagram.

flowchart TD
    A[Module Initialization]
    A --> B[Set TIKTOKEN_CACHE_DIR env var]
    B --> C[Initialize tiktoken encoder]

    subgraph Decorator
        D["singleton(cls, *args, **kw)"]
    end

    subgraph TextUtils["Text Utilities"]
        E["rmSpace(txt)"]
        F["clean_markdown_block(text)"]
        G["get_float(v)"]
    end

    subgraph FileUtils["File Utilities"]
        H["findMaxDt(fnm)"]
        I["findMaxTm(fnm)"]
    end

    subgraph Tokenizer["Tokenizer Functions"]
        J["num_tokens_from_string(string)"]
        K["truncate(string, max_len)"]
    end

    A -->|Defines| D
    A -->|Defines| E
    A -->|Defines| F
    A -->|Defines| G
    A -->|Defines| H
    A -->|Defines| I
    C -->|Provides encoder| J
    C -->|Provides encoder| K

Summary

This init.py utility module supplies helpers for token counting and truncation, singleton enforcement, whitespace cleanup, Markdown fence removal, safe float conversion, and finding maximum values in files. It also sets the tiktoken cache directory and initializes the shared encoder when the module is loaded, so the rest of the InfiniFlow codebase can reuse these operations instead of reimplementing them.

