storage_factory.py


Overview

The storage_factory.py file implements a factory pattern to instantiate storage client objects corresponding to different cloud or object storage services. Its primary purpose is to provide a unified, extensible interface to create instances of various storage backends used within the InfiniFlow system, such as MinIO, Azure Blob Storage (via Service Principal or SAS tokens), AWS S3, Alibaba Cloud OSS, and OpenDAL.

By leveraging an enumeration to represent supported storage types and a centralized factory class to manage instantiation, this module decouples storage client creation from the rest of the application. This design simplifies switching storage implementations via environment configuration without affecting other system components.


Detailed Explanation

Enum: Storage

An enumeration listing the supported storage backend types. Each member has an integer value for internal mapping.

Member      Value   Description
MINIO       1       MinIO object storage service
AZURE_SPN   2       Azure Blob Storage with Service Principal authentication
AZURE_SAS   3       Azure Blob Storage with SAS token authentication
AWS_S3      4       Amazon Web Services S3 storage
OSS         5       Alibaba Cloud Object Storage Service (OSS)
OPENDAL     6       OpenDAL abstraction layer for various storage backends

Usage example:

from storage_factory import Storage

storage_type = Storage.MINIO
print(storage_type.name)  # Output: MINIO
print(storage_type.value) # Output: 1
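
As a minimal sketch, the enumeration described in the table above could be declared with Python's standard enum module. The member names and values come from the table; the helper parse_storage is hypothetical and only illustrates how a configuration string might be turned into a member.

```python
from enum import Enum


class Storage(Enum):
    # Names and values mirror the table of supported backends.
    MINIO = 1
    AZURE_SPN = 2
    AZURE_SAS = 3
    AWS_S3 = 4
    OSS = 5
    OPENDAL = 6


def parse_storage(name: str) -> Storage:
    """Hypothetical helper: map a configuration string to a Storage member."""
    try:
        # Enum classes support lookup by member name via subscripting.
        return Storage[name.strip().upper()]
    except KeyError:
        valid = ", ".join(member.name for member in Storage)
        raise ValueError(
            f"Unknown storage type {name!r}; expected one of: {valid}"
        ) from None


print(parse_storage("minio"))  # Storage.MINIO
print(Storage(4).name)         # AWS_S3
```

Looking members up by name rather than by integer keeps configuration values readable, and an unknown name fails with a message that lists the valid options.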

Class: StorageFactory

A factory class responsible for creating instances of storage client classes based on the Storage enum type.

Attributes

storage_mapping (dict): maps each Storage member to the client class that implements it.

Storage Enum   Client Class          Import Path
MINIO          RAGFlowMinio          rag.utils.minio_conn
AZURE_SPN      RAGFlowAzureSpnBlob   rag.utils.azure_spn_conn
AZURE_SAS      RAGFlowAzureSasBlob   rag.utils.azure_sas_conn
AWS_S3         RAGFlowS3             rag.utils.s3_conn
OSS            RAGFlowOSS            rag.utils.oss_conn
OPENDAL        OpenDALStorage        rag.utils.opendal_conn

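Methods

create(storage: Storage): returns an instance of the client class that storage_mapping associates with the given Storage member.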

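The factory itself can be sketched as a dictionary lookup followed by instantiation. The attribute and method names (storage_mapping, create) follow the class diagram in this document. The two Fake classes are hypothetical stand-ins so the example runs without the rag.utils packages, and constructing the client with no arguments is an assumption.

```python
from enum import Enum


class Storage(Enum):
    # Only two members are shown to keep the sketch short.
    MINIO = 1
    OPENDAL = 6


# Hypothetical stand-ins for the real client classes
# (for example RAGFlowMinio and OpenDALStorage).
class FakeMinio:
    pass


class FakeOpenDAL:
    pass


class StorageFactory:
    # Maps each Storage member to the class that implements it.
    storage_mapping = {
        Storage.MINIO: FakeMinio,
        Storage.OPENDAL: FakeOpenDAL,
    }

    @classmethod
    def create(cls, storage: Storage):
        # Look up the class, then instantiate it with no arguments
        # (an assumption made for this sketch).
        return cls.storage_mapping[storage]()


client = StorageFactory.create(Storage.MINIO)
print(type(client).__name__)  # FakeMinio
```

With this shape, supporting a new backend means adding one enum member and one mapping entry; callers of create do not change.
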
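Module-Level Variables

STORAGE_IMPL_TYPE: the name of the storage backend selected through environment configuration.
STORAGE_IMPL: the storage client instance created for that backend.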

Example usage:

from storage_factory import STORAGE_IMPL, STORAGE_IMPL_TYPE

print(f"Using storage implementation: {STORAGE_IMPL_TYPE}")
# STORAGE_IMPL can be used directly to interact with the configured storage backend
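
The overview states that the backend is chosen through environment configuration. One way this selection might work is sketched below. The environment variable name "STORAGE_IMPL", the default "MINIO", and the helper storage_type_from_env are assumptions made for illustration and are not confirmed by this document.

```python
import os
from enum import Enum


class Storage(Enum):
    # Reduced enum so the sketch stays self-contained.
    MINIO = 1
    AWS_S3 = 4


def storage_type_from_env(environ=os.environ, variable="STORAGE_IMPL", default="MINIO"):
    """Hypothetical helper: return (type name, Storage member) from a mapping.

    The variable name and default value are assumptions.
    """
    name = environ.get(variable, default)
    if name not in Storage.__members__:
        valid = ", ".join(Storage.__members__)
        raise ValueError(f"{variable}={name!r} is not supported; use one of: {valid}")
    return name, Storage[name]


# A plain dict stands in for os.environ so the result is predictable.
STORAGE_IMPL_TYPE, member = storage_type_from_env({"STORAGE_IMPL": "AWS_S3"})
print(STORAGE_IMPL_TYPE, member)  # AWS_S3 Storage.AWS_S3
```

Validating the name before the lookup turns a misconfigured deployment into a clear error at import time instead of a bare KeyError.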

Implementation Details and Algorithms


Interaction with Other System Components


Example: Using storage_factory.py in an Application

from storage_factory import STORAGE_IMPL

# Example: Upload a file to the configured storage backend
file_path = 'data/example.txt'
destination_path = 'uploads/example.txt'

with open(file_path, 'rb') as f:
    file_data = f.read()

# Assumption: the configured client exposes an 'upload(path, data)' method;
# check the concrete client class for its actual method names.
STORAGE_IMPL.upload(destination_path, file_data)
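
Because application code depends only on the client's interface, it can be exercised without a real backend. The sketch below uses a hypothetical in-memory stand-in, InMemoryStorage, that follows the upload/download method names assumed in this document. Those names are assumptions and may not match the real client classes, and save_file is an illustrative helper.

```python
class InMemoryStorage:
    """Hypothetical test double exposing the assumed upload/download interface."""

    def __init__(self):
        self._objects = {}

    def upload(self, path: str, data: bytes) -> None:
        # Store a copy so later changes to the caller's buffer have no effect.
        self._objects[path] = bytes(data)

    def download(self, path: str) -> bytes:
        return self._objects[path]


def save_file(storage, destination_path: str, file_data: bytes) -> None:
    # Application code receives the storage object instead of importing a
    # specific backend, so any object with an 'upload' method works.
    storage.upload(destination_path, file_data)


storage = InMemoryStorage()
save_file(storage, "uploads/example.txt", b"hello")
print(storage.download("uploads/example.txt"))  # b'hello'
```

Passing the storage object as a parameter keeps the calling code independent of which backend the deployment configures.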

Mermaid Class Diagram

classDiagram
    class Storage {
        <<enumeration>>
        +MINIO = 1
        +AZURE_SPN = 2
        +AZURE_SAS = 3
        +AWS_S3 = 4
        +OSS = 5
        +OPENDAL = 6
    }

    class StorageFactory {
        +storage_mapping: dict
        +create(storage: Storage) object
    }

    StorageFactory --> Storage

    %% Storage client classes (simplified representation)
    class RAGFlowMinio {
        +upload(...)
        +download(...)
    }
    class RAGFlowAzureSpnBlob {
        +upload(...)
        +download(...)
    }
    class RAGFlowAzureSasBlob {
        +upload(...)
        +download(...)
    }
    class RAGFlowS3 {
        +upload(...)
        +download(...)
    }
    class RAGFlowOSS {
        +upload(...)
        +download(...)
    }
    class OpenDALStorage {
        +upload(...)
        +download(...)
    }

    StorageFactory ..> RAGFlowMinio : maps to
    StorageFactory ..> RAGFlowAzureSpnBlob : maps to
    StorageFactory ..> RAGFlowAzureSasBlob : maps to
    StorageFactory ..> RAGFlowS3 : maps to
    StorageFactory ..> RAGFlowOSS : maps to
    StorageFactory ..> OpenDALStorage : maps to

Summary

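The storage_factory.py module is the abstraction layer within InfiniFlow for managing multiple storage backends. It provides a Storage enumeration of the supported backends, a StorageFactory class that maps each enumeration member to its client class, and the module-level STORAGE_IMPL and STORAGE_IMPL_TYPE variables that expose the backend selected through environment configuration.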

This design promotes modularity, extensibility, and ease of deployment configuration for storage operations across different cloud providers and storage frameworks.