azure_spn_conn.py

Overview

azure_spn_conn.py provides a singleton class, RAGFlowAzureSpnBlob, that manages connections to an Azure Data Lake Storage Gen2 container and authenticates with a Service Principal Name (SPN). The class covers the basic file operations: upload, download, deletion, existence checks, and generation of presigned URLs for temporary access. It retries failed operations and re-authenticates transparently, which gives the InfiniFlow RAG (Retrieval-Augmented Generation) system a single interface for Azure Data Lake file access.


Classes and Methods

RAGFlowAzureSpnBlob

A singleton class that manages an Azure Data Lake Storage connection, authenticates via SPN through the Azure SDK's ClientSecretCredential, and exposes methods for manipulating files in a specified container.
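
The singleton behaviour can be illustrated with a class decorator. This is a minimal sketch of one common approach, not necessarily how the module implements it; the singleton decorator and the StorageClient class are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, TypeVar

T = TypeVar("T")


def singleton(cls: Callable[..., T]) -> Callable[..., T]:
    """Class decorator that hands back the same instance on every call."""
    instances: Dict[Any, Any] = {}

    def get_instance(*args: Any, **kwargs: Any) -> T:
        # Build the instance on first use, then keep returning that one.
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


@singleton
class StorageClient:  # hypothetical stand-in for RAGFlowAzureSpnBlob
    def __init__(self) -> None:
        self.conn = object()  # placeholder for an expensive connection


first = StorageClient()
second = StorageClient()
print(first is second)
```

With this pattern every caller shares one connection object, so the cost of authenticating is paid once per process.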

The class reads its connection parameters from environment variables and falls back to settings in the rag.settings module when a variable is not set.
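
A lookup of that kind might look like the sketch below. The variable names (EXAMPLE_ACCOUNT_URL, EXAMPLE_CONTAINER_NAME), the FALLBACK_SETTINGS dictionary, and the resolve_setting helper are all made up for illustration; the real names live in the module and in rag.settings.

```python
import os
from typing import Mapping

# Hypothetical fallback values; in the module these would come from rag.settings.
FALLBACK_SETTINGS = {
    "account_url": "https://example.dfs.core.windows.net",
    "container_name": "example-container",
}


def resolve_setting(env_var: str, key: str,
                    fallback: Mapping[str, str] = FALLBACK_SETTINGS) -> str:
    """Return the environment variable if it is set, else the fallback value."""
    return os.getenv(env_var, fallback[key])


# Make the demo deterministic: one variable unset, one variable overridden.
os.environ.pop("EXAMPLE_ACCOUNT_URL", None)
os.environ["EXAMPLE_CONTAINER_NAME"] = "override-container"

account_url = resolve_setting("EXAMPLE_ACCOUNT_URL", "account_url")
container_name = resolve_setting("EXAMPLE_CONTAINER_NAME", "container_name")
print(account_url, container_name)
```

Here account_url comes from the fallback because its variable is unset, while container_name takes the value from the environment.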

Initialization

def __init__(self)

Private Methods

__open__

def __open__(self)

__close__

def __close__(self)
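
As a rough sketch of what opening and closing such a connection can involve, the code below builds a FileSystemClient from a ClientSecretCredential, the two Azure SDK types this document names. It assumes the azure-identity and azure-storage-file-datalake packages; the ConnectionHolder class, the open_file_system_client helper, and the choice to drop a stale client before reconnecting are assumptions for illustration, not the module's confirmed behaviour.

```python
from typing import Any, Optional


def open_file_system_client(account_url: str, container_name: str,
                            tenant_id: str, client_id: str, secret: str) -> Any:
    """Build a FileSystemClient authenticated with a service principal."""
    # Imported lazily so this sketch loads even without the Azure SDK installed.
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import FileSystemClient

    credential = ClientSecretCredential(
        tenant_id=tenant_id, client_id=client_id, client_secret=secret
    )
    return FileSystemClient(
        account_url=account_url,
        file_system_name=container_name,
        credential=credential,
    )


class ConnectionHolder:  # hypothetical, mirrors the __open__/__close__ pair
    def __init__(self, **params: str) -> None:
        self.params = params
        self.conn: Optional[Any] = None

    def open(self) -> None:
        # Assumption: discard any stale client before creating a fresh one.
        self.close()
        self.conn = open_file_system_client(**self.params)

    def close(self) -> None:
        self.conn = None
```

Keeping the client construction in one place means a reconnect after an authentication failure is a single call to open().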

Public Methods

health

def health(self) -> bool

Reports whether the storage connection is usable.

azure_blob = RAGFlowAzureSpnBlob()
is_healthy = azure_blob.health()
print(f"Connection healthy: {is_healthy}")

put

def put(self, bucket: str, fnm: str, binary: bytes) -> bool

Uploads binary as the file fnm.

data = b"Hello, Azure!"
azure_blob.put("mybucket", "folder1/hello.txt", data)

rm

def rm(self, bucket: str, fnm: str) -> None

Deletes the file fnm.

azure_blob.rm("mybucket", "folder1/oldfile.txt")

get

def get(self, bucket: str, fnm: str) -> bytes | None

Downloads the file fnm and returns its content, or None if nothing could be retrieved.

content = azure_blob.get("mybucket", "folder1/data.json")
if content is not None:
    print(content.decode("utf-8"))

obj_exist

def obj_exist(self, bucket: str, fnm: str) -> bool

Checks whether the file fnm exists.

exists = azure_blob.obj_exist("mybucket", "folder1/checkfile.txt")
print(f"File exists: {exists}")

get_presigned_url

def get_presigned_url(self, bucket: str, fnm: str, expires: int) -> str | None

Generates a presigned URL that grants temporary access to the file fnm; expires sets how long the URL stays valid.

url = azure_blob.get_presigned_url("mybucket", "folder1/report.pdf", expires=3600)
if url:
    print(f"Presigned URL: {url}")

Implementation Details and Algorithms
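
The retry and re-authentication behaviour described in the overview can be sketched as a small helper. This is an illustration under assumptions, not the module's actual code: the helper name call_with_reconnect, the attempt count, and the delay are all hypothetical, and the demo uses fake operations instead of Azure calls.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_reconnect(operation: Callable[[], T],
                        reconnect: Callable[[], None],
                        attempts: int = 3,
                        delay_seconds: float = 1.0) -> Optional[T]:
    """Run operation; on failure, reconnect, wait, and try again.

    Returns the operation's result, or None once every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            logging.exception("attempt %d of %d failed", attempt, attempts)
            reconnect()  # re-authenticate before the next try
            time.sleep(delay_seconds)
    return None


# Demo with a fake operation that fails twice and then succeeds.
calls = {"operation": 0, "reconnect": 0}


def flaky_download() -> bytes:
    calls["operation"] += 1
    if calls["operation"] < 3:
        raise ConnectionError("simulated transient failure")
    return b"payload"


def fake_reconnect() -> None:
    calls["reconnect"] += 1


result = call_with_reconnect(flaky_download, fake_reconnect, delay_seconds=0.0)
print(result, calls)
```

Returning None after the last failed attempt matches the optional return types listed for get and get_presigned_url, so callers check for None rather than catching exceptions.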


Interaction With Other System Components

Other parts of the InfiniFlow RAG system use this module as their storage backend whenever they need to read or write data in Azure storage.
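
One way calling code can stay independent of Azure is to depend only on the method signatures listed above. The sketch below is hypothetical: BlobStore, InMemoryBlobStore, and archive_document are illustrative names that do not appear in the module, and the in-memory class is a test double, not a description of RAGFlowAzureSpnBlob internals.

```python
from typing import Dict, Optional, Protocol


class BlobStore(Protocol):  # hypothetical name for the shared method surface
    def put(self, bucket: str, fnm: str, binary: bytes) -> Optional[bool]: ...
    def get(self, bucket: str, fnm: str) -> Optional[bytes]: ...
    def rm(self, bucket: str, fnm: str) -> None: ...
    def obj_exist(self, bucket: str, fnm: str) -> bool: ...


class InMemoryBlobStore:
    """Test double that keeps files in a dict instead of Azure."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def put(self, bucket: str, fnm: str, binary: bytes) -> Optional[bool]:
        self._files[f"{bucket}/{fnm}"] = binary
        return True

    def get(self, bucket: str, fnm: str) -> Optional[bytes]:
        return self._files.get(f"{bucket}/{fnm}")

    def rm(self, bucket: str, fnm: str) -> None:
        self._files.pop(f"{bucket}/{fnm}", None)

    def obj_exist(self, bucket: str, fnm: str) -> bool:
        return f"{bucket}/{fnm}" in self._files


def archive_document(store: BlobStore, fnm: str, data: bytes) -> bool:
    """Hypothetical caller: upload a file and confirm that it is stored."""
    store.put("mybucket", fnm, data)
    return store.obj_exist("mybucket", fnm)


store = InMemoryBlobStore()
archived = archive_document(store, "folder1/hello.txt", b"Hello, Azure!")
print(archived)
```

Code written against BlobStore can be exercised in unit tests with the in-memory double and run in production with the Azure-backed class, provided both expose the same methods.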


Mermaid Class Diagram

classDiagram
    class RAGFlowAzureSpnBlob {
        -conn: FileSystemClient | None
        -account_url: str
        -client_id: str
        -secret: str
        -tenant_id: str
        -container_name: str
        +__init__()
        -__open__()
        -__close__()
        +health() bool
        +put(bucket: str, fnm: str, binary: bytes) bool|None
        +rm(bucket: str, fnm: str) None
        +get(bucket: str, fnm: str) bytes|None
        +obj_exist(bucket: str, fnm: str) bool
        +get_presigned_url(bucket: str, fnm: str, expires: int) str|None
    }

Summary

azure_spn_conn.py provides a singleton Azure Data Lake Storage client that authenticates via SPN and is tailored to the InfiniFlow RAG ecosystem. It hides connection management, retries, and authentication behind a small set of methods for uploading, downloading, deleting, checking the existence of, and generating presigned URLs for files within a configured container.