# audio.py

Overview

The audio.py file transcribes audio files into text tokens using a Speech-to-Text large language model (LLM). It primarily defines the chunk function, which takes an audio file as binary data, validates its file extension, saves it to a temporary file, and then calls an LLM-based transcription service to convert the audio into text. The resulting transcription is tokenized for downstream natural language processing (NLP) tasks such as semantic search or indexing.

This file acts as a bridge between raw audio input and text-based NLP components, integrating tightly with the LLM service layer for transcription and the RAG (Retrieval-Augmented Generation) NLP tokenization utilities.


Detailed Description

Function: chunk

chunk(filename, binary, tenant_id, lang, callback=None, **kwargs) -> list[dict]

Purpose

Processes an audio file by:

  1. Validating its file extension.

  2. Temporarily saving the audio binary content to disk.

  3. Invoking a Speech-to-Text LLM service to transcribe the audio.

  4. Tokenizing the transcription text for further analysis.

  5. Returning a list containing a dictionary with metadata and tokenized transcription.
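Steps 1 and 2 can be sketched with the standard library alone. This is a minimal sketch, not the module's actual code: the helper name save_audio_to_temp and the SUPPORTED_EXTENSIONS set are hypothetical, since this document does not list which extensions chunk accepts.

```python
import os
import tempfile

# Hypothetical allow-list; the real set of accepted extensions is not given here.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def save_audio_to_temp(filename: str, binary: bytes) -> str:
    """Validate the extension, write the bytes to a temp file, and return its path."""
    _, ext = os.path.splitext(filename)
    ext = ext.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio extension: {ext or '(none)'}")

    # delete=False keeps the file on disk after the handle closes, so a
    # transcription service can open it by path; the caller must remove it.
    with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
        tmp.write(binary)
        return tmp.name


path = save_audio_to_temp("sample_audio.WAV", b"RIFF0000WAVE")
try:
    with open(path, "rb") as f:
        saved = f.read()
finally:
    os.remove(path)  # always clean up the temporary file
```

Removing the file in a finally block means the temporary copy does not outlive the call, even when a later step raises.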

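Parameters

  filename (str): Name of the audio file. Its extension is used to validate the format.

  binary (bytes): Raw binary content of the audio file.

  tenant_id (str): Tenant identifier, passed to LLMBundle.

  lang (str): Language name (for example, "English"), passed to LLMBundle.

  callback (callable, optional): Function called with a progress value and a message to report progress and errors. Defaults to None.

  **kwargs: Additional keyword arguments.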

Returns

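On success, returns a list containing a single dictionary with metadata (such as title_tks and title_sm_tks) and the tokenized transcription. If an error occurs (for example, an unsupported extension or a transcription failure), reports the error via callback and returns an empty list.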
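The error path described above might look like the following. This is a sketch under assumptions: run_with_error_report and its transcribe argument are hypothetical stand-ins, the "text" key is invented for illustration, and signalling failure with a progress value of -1 is an assumed convention that this document does not specify.

```python
from typing import Callable, Optional


def run_with_error_report(
    transcribe: Callable[[], str],
    callback: Optional[Callable[[float, str], None]] = None,
) -> list[dict]:
    """Return [doc] on success; on any failure, notify callback and return []."""
    try:
        text = transcribe()
        return [{"text": text}]  # hypothetical key, for illustration only
    except Exception as exc:
        if callback is not None:
            callback(-1, str(exc))  # -1 as an error marker is an assumption
        return []


def failing_transcriber() -> str:
    raise RuntimeError("transcription service unavailable")


errors = []
failed = run_with_error_report(failing_transcriber, lambda p, m: errors.append((p, m)))
ok = run_with_error_report(lambda: "hello world")
```

Because failures surface only through the callback and an empty list, callers should check for an empty result rather than expect an exception.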

Usage Example

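# Assumes chunk is importable from audio.py; adjust the import path to your project layout.
from audio import chunk

def progress_callback(progress, msg):
    # Print the progress value as a percentage, along with the status message.
    print(f"Progress: {progress*100:.1f}%, Message: {msg}")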

with open("sample_audio.wav", "rb") as f:
    audio_bytes = f.read()

results = chunk(
    filename="sample_audio.wav",
    binary=audio_bytes,
    tenant_id="tenant_123",
    lang="English",
    callback=progress_callback
)

if results:
    doc = results[0]
    print("Filename tokens:", doc["title_tks"])
    print("Fine tokens:", doc["title_sm_tks"])
    # Access transcription tokens added by tokenize()
else:
    print("Failed to process audio.")

Implementation Details


Interaction with Other Components


Mermaid Class Diagram

classDiagram
    class LLMBundle {
        +__init__(tenant_id: str, llm_type: LLMType, lang: str)
        +transcription(audio_path: str) str
    }

    class rag_tokenizer {
        +tokenize(text: str) list
        +fine_grained_tokenize(tokens: list) list
    }

    class audio {
        +chunk(filename: str, binary: bytes, tenant_id: str, lang: str, callback: callable=None, **kwargs) list[dict]
    }

    audio ..> LLMBundle : uses
    audio ..> rag_tokenizer : uses
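For testing code that depends on chunk without a real transcription service, the two collaborators in the diagram can be replaced by stand-ins. The sketch below is hypothetical throughout: FakeLLMBundle and FakeTokenizer only mimic the method names shown in the diagram (the llm_type argument is omitted), the whitespace and underscore splitting is a placeholder for real tokenization, stripping the extension before tokenizing the title is an assumption, and the transcript_tks key is invented for illustration.

```python
import os


class FakeLLMBundle:
    """Stand-in with the diagram's transcription(audio_path) -> str method."""

    def __init__(self, tenant_id: str, lang: str):
        self.tenant_id = tenant_id
        self.lang = lang

    def transcription(self, audio_path: str) -> str:
        return "hello from the fake transcriber"  # canned text, no real model


class FakeTokenizer:
    """Stand-in with the diagram's tokenize and fine_grained_tokenize methods."""

    def tokenize(self, text: str) -> list:
        return text.lower().split()  # placeholder: split on whitespace

    def fine_grained_tokenize(self, tokens: list) -> list:
        # placeholder: further split each token on underscores
        return [part for tok in tokens for part in tok.split("_") if part]


def build_doc(filename: str, audio_path: str, llm, tokenizer) -> dict:
    """Assemble a document dict from title tokens and transcription tokens."""
    title = os.path.splitext(filename)[0]  # assumed: extension is dropped
    title_tks = tokenizer.tokenize(title)
    return {
        "title_tks": title_tks,
        "title_sm_tks": tokenizer.fine_grained_tokenize(title_tks),
        # "transcript_tks" is a hypothetical key name
        "transcript_tks": tokenizer.tokenize(llm.transcription(audio_path)),
    }


doc = build_doc(
    "sample_audio.wav",
    "/tmp/sample_audio.wav",  # never opened by the fake
    FakeLLMBundle("tenant_123", "English"),
    FakeTokenizer(),
)
```

Passing the collaborators in as arguments keeps the sketch independent of any real service, which is the main reason to structure a test double this way.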

Summary

audio.py is a utility module that transforms raw audio files into tokenized text using a Speech-to-Text LLM service. It wraps file handling, format validation, transcription, and tokenization in a single function, chunk, which can be integrated into larger data processing or NLP pipelines within the InfiniFlow system. The callback parameter lets callers observe progress and errors, while the tenant_id and lang parameters let the same function serve multiple tenants and languages.