benchmark.py

Overview

benchmark.py is a utility module within the InfiniFlow project designed to perform benchmarking and evaluation of retrieval-based natural language processing models against standard datasets. It supports indexing and evaluating models on popular retrieval benchmarks such as MS MARCO v1.1, Trivia QA, and MIRACL, leveraging vector embeddings and similarity-based search.

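The core functionality involves indexing a dataset's documents together with their vector embeddings, running retrieval queries against the resulting index, and saving the evaluation results for analysis.

This file interacts with several components of the InfiniFlow system, including the knowledgebase, the embedding models, the document store, and the retrieval services.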


Classes and Methods

Class: Benchmark

Main class orchestrating the benchmarking process over different datasets.

__init__(self, kb_id)

Initializes a benchmark instance for a specific knowledgebase.

_get_retrieval(self, qrels) -> dict

Performs retrieval queries on the indexed documents and prepares a run dictionary for evaluation.
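To make the shape of a run dictionary concrete, here is a minimal sketch. It assumes the nested-mapping convention accepted by evaluation libraries such as ranx and pytrec_eval (query to document ID to score); the source does not state the exact layout. The build_run helper and the fake_search retriever are hypothetical stand-ins, not the module's actual retrieval call.

```python
from typing import Callable, Dict, List, Tuple

Qrels = Dict[str, Dict[str, int]]   # query -> {doc_id: relevance label}
Run = Dict[str, Dict[str, float]]   # query -> {doc_id: retrieval score}


def build_run(qrels: Qrels, search: Callable[[str], List[Tuple[str, float]]]) -> Run:
    """Query the index once per qrels entry and collect the scored hits."""
    run: Run = {}
    for query in qrels:
        run[query] = {doc_id: score for doc_id, score in search(query)}
    return run


def fake_search(query: str) -> List[Tuple[str, float]]:
    # Hypothetical retriever: always returns the same two (doc_id, score) pairs.
    return [("d1", 0.9), ("d7", 0.4)]


qrels = {"what is rag?": {"d1": 1, "d2": 0}}
run = build_run(qrels, fake_search)
```

Keeping qrels and run keyed by the same query identifiers lets an evaluator line up judged and retrieved documents per query.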

embedding(self, docs) -> (list, int)

Generates vector embeddings for a batch of documents and attaches them to the documents.
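The batch-embed-and-attach step can be sketched as follows. This is an illustration under assumptions: the "text" and "vector" field names, the embed_docs helper, and the toy_encode encoder are all hypothetical, and the returned integer is assumed to be the vector size (consistent with the parameter that init_index takes).

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, object]


def embed_docs(
    docs: List[Doc],
    encode: Callable[[List[str]], List[List[float]]],
) -> Tuple[List[Doc], int]:
    """Encode all document texts in one batch and attach each vector to its document."""
    texts = [str(d["text"]) for d in docs]
    vectors = encode(texts)
    if len(vectors) != len(docs):
        raise ValueError("encoder returned a different number of vectors than documents")
    vector_size = 0
    for doc, vec in zip(docs, vectors):
        vector_size = len(vec)
        doc["vector"] = vec  # assumed field name
    return docs, vector_size


def toy_encode(texts: List[str]) -> List[List[float]]:
    # Hypothetical encoder: a 3-dimensional vector built from simple text statistics.
    return [[float(len(t)), float(t.count(" ")), 1.0] for t in texts]


docs, size = embed_docs([{"text": "hello world"}, {"text": "vector search"}], toy_encode)
```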

init_index(self, vector_size: int)

Initializes a new vector index with the specified vector size, deleting any existing index with the same name.
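The delete-then-recreate behaviour can be illustrated with a small in-memory stand-in. InMemoryStore and its three methods are hypothetical and do not reflect the real document store API; the sketch only shows that re-initializing an index discards whatever it held before.

```python
class InMemoryStore:
    """Hypothetical stand-in for a document store that holds named indexes."""

    def __init__(self):
        self.indexes = {}  # index name -> {"vector_size": int, "docs": list}

    def index_exists(self, name):
        return name in self.indexes

    def delete_index(self, name):
        del self.indexes[name]

    def create_index(self, name, vector_size):
        self.indexes[name] = {"vector_size": vector_size, "docs": []}


def init_index(store, index_name, vector_size):
    # Drop any index with the same name so each benchmark run starts empty.
    if store.index_exists(index_name):
        store.delete_index(index_name)
    store.create_index(index_name, vector_size)


store = InMemoryStore()
init_index(store, "benchmark_demo", 4)
store.indexes["benchmark_demo"]["docs"].append({"id": "stale"})
init_index(store, "benchmark_demo", 8)  # the stale document is discarded
```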

ms_marco_index(self, file_path: str, index_name: str) -> (dict, dict)

Processes MS MARCO v1.1 dataset files to index documents and build qrels.
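A minimal sketch of building qrels from MS MARCO-style records follows. It assumes each record carries a "query" string and a "passages" mapping with parallel "is_selected" and "passage_text" lists, and that the two returned dictionaries are the qrels and a document-ID-to-text map. The build_qrels helper, the "doc-N" ID scheme, and the max_docs cap are illustrative choices, not the module's confirmed behaviour.

```python
from collections import defaultdict


def build_qrels(records, max_docs):
    """Turn MS MARCO-style records into (qrels, texts), stopping after max_docs passages."""
    qrels = defaultdict(dict)   # query -> {doc_id: relevance}
    texts = {}                  # doc_id -> passage text
    doc_count = 0
    for rec in records:
        passages = rec["passages"]
        for rel, text in zip(passages["is_selected"], passages["passage_text"]):
            if doc_count >= max_docs:
                return dict(qrels), texts
            doc_id = f"doc-{doc_count}"  # hypothetical ID scheme
            qrels[rec["query"]][doc_id] = int(rel)
            texts[doc_id] = text
            doc_count += 1
    return dict(qrels), texts


records = [
    {"query": "q1", "passages": {"is_selected": [1, 0], "passage_text": ["a", "b"]}},
    {"query": "q2", "passages": {"is_selected": [0, 1], "passage_text": ["c", "d"]}},
]
qrels, texts = build_qrels(records, max_docs=3)
```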

trivia_qa_index(self, file_path: str, index_name: str) -> (dict, dict)

Indexes the Trivia QA dataset in the same way as ms_marco_index, adapted to the Trivia QA schema.

miracl_index(self, file_path: str, corpus_path: str, index_name: str) -> (dict, dict)

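Indexes the multilingual MIRACL dataset. Unlike the other two indexing methods, it takes a separate corpus_path in addition to file_path.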

save_results(self, qrels: dict, run: dict, texts: dict, dataset: str, file_path: str) -> None

Saves evaluation results to markdown and JSON files for detailed analysis.
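One possible shape for such a report writer is sketched below. The output file names, the columns, and the per-query "hits" count are assumptions made for illustration; the source does not specify what the Markdown and JSON files contain. The sketch also returns the two paths for convenience, whereas the documented method returns None.

```python
import json
import os
import tempfile


def save_results(qrels, run, texts, dataset, out_dir):
    """Write one row per query to <dataset>.json and <dataset>.md in out_dir."""
    rows = []
    for query, judged in qrels.items():
        relevant = {d for d, rel in judged.items() if rel > 0}
        ranked = sorted(run.get(query, {}).items(), key=lambda kv: kv[1], reverse=True)
        rows.append({
            "query": query,
            "hits": sum(1 for d, _ in ranked if d in relevant),
            "relevant": len(relevant),
            "top_text": texts.get(ranked[0][0], "") if ranked else "",
        })
    json_path = os.path.join(out_dir, f"{dataset}.json")
    md_path = os.path.join(out_dir, f"{dataset}.md")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
    with open(md_path, "w", encoding="utf-8") as f:
        f.write(f"## {dataset}\n\n| query | hits | relevant |\n|---|---|---|\n")
        for r in rows:
            f.write(f"| {r['query']} | {r['hits']} | {r['relevant']} |\n")
    return md_path, json_path


out_dir = tempfile.mkdtemp()
md_path, json_path = save_results(
    {"q1": {"d1": 1, "d2": 0}},      # qrels
    {"q1": {"d1": 0.9, "d3": 0.2}},  # run
    {"d1": "passage one"},           # texts
    "demo",
    out_dir,
)
```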

__call__(self, dataset: str, file_path: str, miracl_corpus: str = '') -> None

Entry point method to run benchmarking for a specified dataset.
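The dispatch implied by this entry point can be sketched as a simple branch on the dataset name. The names "ms_marco_v1.1" and "miracl" come from the usage examples in this document; "trivia_qa" is an assumed name. The plan_call helper is hypothetical and only reports which indexing method would be invoked, without running it.

```python
def plan_call(dataset, file_path, miracl_corpus=""):
    """Return the indexing method name and the arguments it would receive."""
    if dataset == "ms_marco_v1.1":
        return ("ms_marco_index", file_path)
    if dataset == "trivia_qa":  # assumed dataset name
        return ("trivia_qa_index", file_path)
    if dataset == "miracl":
        if not miracl_corpus:
            raise ValueError("the miracl dataset requires a corpus path")
        return ("miracl_index", file_path, miracl_corpus)
    raise ValueError(f"unknown dataset: {dataset}")


plan = plan_call("miracl", "/path/to/miracl_dataset", "/path/to/miracl_corpus")
```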


Important Implementation Details


System Interaction


Usage Example

python benchmark.py 1000 my_kb_id ms_marco_v1.1 /path/to/ms_marco_dataset

This command benchmarks the MS MARCO v1.1 dataset, indexing up to 1000 documents, using the knowledgebase with ID my_kb_id. Results will be saved in the dataset directory.

For MIRACL, an additional corpus path is required:

python benchmark.py 500 my_kb_id miracl /path/to/miracl_dataset /path/to/miracl_corpus
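The positional arguments in the two commands above could be parsed as in the following sketch. The argument names (max_docs, kb_id, dataset, dataset_path, miracl_corpus_path) are assumptions inferred from the examples, not confirmed names from the script.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="benchmark.py")
    parser.add_argument("max_docs", type=int, help="maximum number of documents to index")
    parser.add_argument("kb_id", help="knowledgebase ID")
    parser.add_argument("dataset", help="dataset name, e.g. ms_marco_v1.1 or miracl")
    parser.add_argument("dataset_path", help="path to the dataset")
    # Optional fifth positional argument, only needed for MIRACL.
    parser.add_argument("miracl_corpus_path", nargs="?", default="")
    return parser


args = build_parser().parse_args(
    ["500", "my_kb_id", "miracl", "/path/to/miracl_dataset", "/path/to/miracl_corpus"]
)
```

Declaring the corpus path with nargs="?" keeps the four-argument form valid for datasets that do not need it.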

Mermaid Class Diagram

classDiagram
    class Benchmark {
        -kb_id: str
        -kb: Knowledgebase
        -similarity_threshold: float
        -vector_similarity_weight: float
        -embd_mdl: LLMBundle
        -tenant_id: str
        -index_name: str
        -initialized_index: bool
        +__init__(kb_id)
        -_get_retrieval(qrels) dict
        +embedding(docs) tuple
        +init_index(vector_size)
        +ms_marco_index(file_path, index_name) tuple
        +trivia_qa_index(file_path, index_name) tuple
        +miracl_index(file_path, corpus_path, index_name) tuple
        +save_results(qrels, run, texts, dataset, file_path)
        +__call__(dataset, file_path, miracl_corpus='')
    }

Summary

benchmark.py is a benchmarking tool within InfiniFlow that indexes and evaluates retrieval models on MS MARCO v1.1, Trivia QA, and MIRACL using vector embeddings. Each dataset has its own preprocessing and indexing method, while evaluation reporting is shared across all of them. The module depends on InfiniFlow’s knowledgebase, embedding models, document store, and retrieval services.