storage_factory.py
Overview
The storage_factory.py file implements a factory pattern to instantiate storage client objects corresponding to different cloud or object storage services. Its primary purpose is to provide a unified, extensible interface to create instances of various storage backends used within the InfiniFlow system, such as MinIO, Azure Blob Storage (via Service Principal or SAS tokens), AWS S3, Alibaba Cloud OSS, and OpenDAL.
By leveraging an enumeration to represent supported storage types and a centralized factory class to manage instantiation, this module decouples storage client creation from the rest of the application. This design simplifies switching storage implementations via environment configuration without affecting other system components.
Detailed Explanation
Enum: Storage
An enumeration listing the supported storage backend types. Each member has an integer value for internal mapping.
| Member | Value | Description |
|---|---|---|
| MINIO | 1 | MinIO object storage service |
| AZURE_SPN | 2 | Azure Blob Storage with Service Principal authentication |
| AZURE_SAS | 3 | Azure Blob Storage with SAS token authentication |
| AWS_S3 | 4 | Amazon Web Services S3 storage |
| OSS | 5 | Alibaba Cloud Object Storage Service (OSS) |
| OPENDAL | 6 | OpenDAL abstraction layer for various storage backends |
Usage example:
from storage_factory import Storage
storage_type = Storage.MINIO
print(storage_type.name) # Output: MINIO
print(storage_type.value) # Output: 1
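The enum described in the table above could look roughly like the following. This is a minimal sketch reconstructed from the member names and values documented for this module, not a copy of its source; the lookups by name and by value are standard `enum.Enum` behavior.

```python
from enum import Enum


class Storage(Enum):
    """Sketch of the documented storage types (values as documented)."""
    MINIO = 1
    AZURE_SPN = 2
    AZURE_SAS = 3
    AWS_S3 = 4
    OSS = 5
    OPENDAL = 6


# Look a member up by its name, e.g. a string read from configuration.
by_name = Storage["AWS_S3"]

# Look a member up by its integer value.
by_value = Storage(4)

# Both lookups return the same singleton member.
same_member = by_name is by_value

# Unknown names raise KeyError rather than returning None.
try:
    Storage["FTP"]
    unknown_name_error = None
except KeyError as exc:
    unknown_name_error = exc
```

Name-based lookup is the natural way to turn a configuration string into a member, which is why an unrecognized name surfaces as a `KeyError`.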
Class: StorageFactory
A factory class responsible for creating instances of storage client classes based on the Storage enum type.
Attributes
storage_mapping (dict): A class-level dictionary mapping each Storage enum member to its corresponding storage client class imported from various utility modules.
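| Storage Enum | Client Class | Import Path |
|---|---|---|
| Storage.MINIO | RAGFlowMinio | rag.utils.minio_conn |
| Storage.AZURE_SPN | RAGFlowAzureSpnBlob | rag.utils.azure_spn_conn |
| Storage.AZURE_SAS | RAGFlowAzureSasBlob | rag.utils.azure_sas_conn |
| Storage.AWS_S3 | RAGFlowS3 | rag.utils.s3_conn |
| Storage.OSS | RAGFlowOSS | rag.utils.oss_conn |
| Storage.OPENDAL | OpenDALStorage | rag.utils.opendal_conn |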
Methods
create(cls, storage: Storage) -> object
A class method that takes a Storage enum member and returns an instance of the corresponding storage client.
Parameters:
storage (Storage): The type of storage for which to create a client instance.
Returns:
An instance of the storage client class mapped to the specified storage.
Raises:
KeyError if the storage argument does not map to any known storage client.
Usage example:
from storage_factory import StorageFactory, Storage
# Create a MinIO client instance
minio_client = StorageFactory.create(Storage.MINIO)
# Use minio_client as needed...
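The behavior of `create` described above can be illustrated with a small self-contained sketch. The client classes here (`FakeMinioClient`, `FakeS3Client`) are hypothetical stand-ins, and the sketch assumes that client classes are constructed with no arguments; the real module may differ on both points.

```python
from enum import Enum


class Storage(Enum):
    MINIO = 1
    AWS_S3 = 4
    OSS = 5  # deliberately left out of the mapping below


class FakeMinioClient:
    """Hypothetical stand-in for the real MinIO client class."""


class FakeS3Client:
    """Hypothetical stand-in for the real S3 client class."""


class StorageFactory:
    # Class-level mapping from enum member to client class (not instance).
    storage_mapping = {
        Storage.MINIO: FakeMinioClient,
        Storage.AWS_S3: FakeS3Client,
    }

    @classmethod
    def create(cls, storage: Storage) -> object:
        # Look up the class, then call it to build a new instance.
        # A member missing from the mapping raises KeyError here.
        return cls.storage_mapping[storage]()


client = StorageFactory.create(Storage.MINIO)

try:
    StorageFactory.create(Storage.OSS)
    missing_error = None
except KeyError as exc:
    missing_error = exc
```

Storing classes rather than instances in the mapping means that no client is constructed until `create` is called for it.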
Module-Level Variables
STORAGE_IMPL_TYPE (str): Reads the environment variable STORAGE_IMPL to determine which storage backend to use at runtime. Defaults to 'MINIO' if the environment variable is not set.
STORAGE_IMPL: An instantiated storage client object created by the factory using the storage type specified by the environment variable.
Example usage:
import os
from storage_factory import STORAGE_IMPL, STORAGE_IMPL_TYPE
print(f"Using storage implementation: {STORAGE_IMPL_TYPE}")
# STORAGE_IMPL can be used directly to interact with the configured storage backend
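One plausible way to turn the environment variable into an enum member is sketched below. The helper `resolve_storage_type` is hypothetical and introduced only for illustration, and the name-based lookup `Storage[name]` is an assumption about how the string is converted; only the variable name and the 'MINIO' default come from the module's description.

```python
import os
from enum import Enum


class Storage(Enum):
    MINIO = 1
    AWS_S3 = 4


def resolve_storage_type(environ=os.environ) -> Storage:
    """Hypothetical helper: map the STORAGE_IMPL variable to a Storage member."""
    # Fall back to "MINIO" when the variable is unset, as documented.
    name = environ.get("STORAGE_IMPL", "MINIO")
    # Name-based lookup; an unrecognized name raises KeyError.
    return Storage[name]


# Passing a plain dict instead of os.environ keeps the example deterministic.
default_type = resolve_storage_type({})
s3_type = resolve_storage_type({"STORAGE_IMPL": "AWS_S3"})
```

Because STORAGE_IMPL_TYPE and STORAGE_IMPL are module-level variables, they are presumably evaluated once when the module is first imported, so the environment variable would need to be set before that import.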
Implementation Details and Algorithms
Factory Pattern: The StorageFactory class uses the factory design pattern to abstract the creation of storage client instances. New storage backends can be added by adding a member to the Storage enum and a matching entry to the storage_mapping dictionary.
Dynamic Configuration: The choice of storage backend is configured externally through the STORAGE_IMPL environment variable, allowing users to switch storage providers without changing code.
The factory method create uses a dictionary lookup keyed by Storage enum members, providing average-case O(1) access to the required class.
Each storage client class is imported from its respective utility module within the rag.utils package, encapsulating provider-specific connection and interaction logic.
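Adding a backend, as described above, comes down to two changes: a new enum member and a new mapping entry. The sketch below illustrates this with a hypothetical `GCS` member and `FakeGcsClient` class, neither of which is part of the documented module, and again assumes clients are built with no arguments.

```python
from enum import Enum


class Storage(Enum):
    MINIO = 1
    GCS = 7  # step 1: hypothetical new member, not in the documented enum


class FakeMinioClient:
    """Hypothetical stand-in for an existing client class."""


class FakeGcsClient:
    """Hypothetical client class for the new backend."""


class StorageFactory:
    storage_mapping = {
        Storage.MINIO: FakeMinioClient,
        Storage.GCS: FakeGcsClient,  # step 2: register the new class
    }

    @classmethod
    def create(cls, storage: Storage) -> object:
        return cls.storage_mapping[storage]()


# Consistency check: every enum member should have a mapping entry.
unmapped = [m for m in Storage if m not in StorageFactory.storage_mapping]

new_client = StorageFactory.create(Storage.GCS)
```

A check like `unmapped` can run in a unit test to catch a member that was added to the enum but never registered in the mapping.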
Interaction with Other System Components
Storage Client Utilities: This file depends on several storage client implementations located in the rag.utils package submodules (minio_conn, azure_sas_conn, azure_spn_conn, s3_conn, oss_conn, and opendal_conn). These clients encapsulate the logic for connecting to and interacting with their respective storage backends.
Application Configuration: The environment variable STORAGE_IMPL allows the application or deployment environment to dictate which storage backend is used globally.
Downstream Usage: Other parts of the InfiniFlow system import STORAGE_IMPL from this module to access the configured storage client transparently, facilitating storage operations like uploading, downloading, and managing objects.
Example: Using storage_factory.py in an Application
from storage_factory import STORAGE_IMPL
# Example: Upload a file to the configured storage backend
file_path = 'data/example.txt'
destination_path = 'uploads/example.txt'
with open(file_path, 'rb') as f:
    file_data = f.read()
# Assuming STORAGE_IMPL has an 'upload' method
STORAGE_IMPL.upload(destination_path, file_data)
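Since the example above only assumes an `upload` method, application code can be written against a small interface and exercised without a real backend. The sketch below is illustrative only: the `StorageClient` protocol, the `InMemoryStorage` class, and the `save_report` function are hypothetical names, and the real client classes may expose different method names and signatures.

```python
from typing import Dict, Protocol


class StorageClient(Protocol):
    """Hypothetical interface; the real clients' methods may differ."""

    def upload(self, path: str, data: bytes) -> None: ...

    def download(self, path: str) -> bytes: ...


class InMemoryStorage:
    """Hypothetical in-memory client for tests without a real backend."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, path: str, data: bytes) -> None:
        self._objects[path] = data

    def download(self, path: str) -> bytes:
        return self._objects[path]


def save_report(storage: StorageClient, name: str, body: bytes) -> str:
    """Application code depends only on the interface, not on a provider."""
    destination = f"uploads/{name}"
    storage.upload(destination, body)
    return destination


storage = InMemoryStorage()
stored_path = save_report(storage, "example.txt", b"hello")
round_trip = storage.download(stored_path)
```

Passing the client in as a parameter, rather than importing STORAGE_IMPL inside the function, makes it possible to substitute a test double like this one.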
Mermaid Class Diagram
classDiagram
class Storage {
<<enumeration>>
+MINIO = 1
+AZURE_SPN = 2
+AZURE_SAS = 3
+AWS_S3 = 4
+OSS = 5
+OPENDAL = 6
}
class StorageFactory {
+storage_mapping: dict
+create(storage: Storage) object
}
StorageFactory --> Storage
%% Storage client classes (simplified representation)
class RAGFlowMinio {
+upload(...)
+download(...)
}
class RAGFlowAzureSpnBlob {
+upload(...)
+download(...)
}
class RAGFlowAzureSasBlob {
+upload(...)
+download(...)
}
class RAGFlowS3 {
+upload(...)
+download(...)
}
class RAGFlowOSS {
+upload(...)
+download(...)
}
class OpenDALStorage {
+upload(...)
+download(...)
}
StorageFactory ..> RAGFlowMinio : maps to
StorageFactory ..> RAGFlowAzureSpnBlob : maps to
StorageFactory ..> RAGFlowAzureSasBlob : maps to
StorageFactory ..> RAGFlowS3 : maps to
StorageFactory ..> RAGFlowOSS : maps to
StorageFactory ..> OpenDALStorage : maps to
Summary
The storage_factory.py module is a crucial abstraction layer within InfiniFlow for managing multiple storage backends. It provides:
A clear enumeration of supported storage systems.
A flexible factory class to instantiate storage clients dynamically.
Integration with environment configuration for runtime storage selection.
Encapsulation of storage-specific connection details via separate utility modules.
This design promotes modularity, extensibility, and ease of deployment configuration for storage operations across different cloud providers and storage frameworks.