wikipedia.py
Overview
The wikipedia.py file provides an integration component for querying and retrieving information from the Wikipedia online encyclopedia. It is designed as a tool within the InfiniFlow system, enabling users or agents to search Wikipedia by specific keywords and obtain summarized content from the most relevant articles.
The component handles search queries, fetches multiple matching Wikipedia pages, extracts their titles, URLs, and summaries, and formats the results for downstream consumption. It also supports configuration parameters such as the number of results to retrieve (top_n) and language selection.
This module uses the third-party wikipedia Python library to interact with the Wikipedia API. It logs errors and retries failed requests after a delay, so that transient network or API failures do not immediately fail the tool.
Classes and Methods
WikipediaParam
class WikipediaParam(ToolParamBase):
This class defines the configuration parameters for the Wikipedia tool. It extends the base class ToolParamBase and encapsulates metadata and validation logic.
Attributes
meta: ToolMeta
A dictionary containing metadata about the tool, including its name, description, and parameter specification (e.g., the query parameter, which is required).
top_n: int (default: 10)
Specifies the maximum number of Wikipedia search results to retrieve.
language: str (default: "en")
The Wikipedia language edition to query (e.g., "en" for English). The value is validated against a supported list of language codes.
Methods
__init__(self)
Initializes the parameters with default values and metadata.
check(self)
Validates the parameters top_n and language. Raises exceptions if top_n is not a positive integer or if language is not in the supported languages list.
get_input_form(self) -> dict[str, dict]
Returns a form definition dict describing the input parameters expected by the tool for UI or API integration.
Example return: { "query": { "name": "Query", "type": "line" } }
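The validation rules described above can be illustrated with a minimal sketch. It is not the module's actual code: WikipediaParamSketch and SUPPORTED_LANGUAGES are hypothetical names, the language set is a small illustrative subset, and raising ValueError is an assumption (the source only says that exceptions are raised).

```python
from dataclasses import dataclass

# Hypothetical subset for illustration; the real module validates against
# its own, longer list of supported language codes.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "es", "ja", "zh"}


@dataclass
class WikipediaParamSketch:
    """Stand-in for WikipediaParam that carries only the two tunable fields."""

    top_n: int = 10
    language: str = "en"

    def check(self) -> None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(self.top_n, int) and not isinstance(self.top_n, bool)
        if not is_int or self.top_n <= 0:
            raise ValueError(f"top_n must be a positive integer, got {self.top_n!r}")
        if self.language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language code: {self.language!r}")

    def get_input_form(self) -> dict[str, dict]:
        # Same shape as the example return value documented above.
        return {"query": {"name": "Query", "type": "line"}}
```

Calling check() on a default instance passes silently, while WikipediaParamSketch(top_n=0).check() or WikipediaParamSketch(language="xx").check() raises.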
Wikipedia
class Wikipedia(ToolBase, ABC):
The main tool class that implements Wikipedia search and content retrieval functionality. It inherits from ToolBase and ABC (Abstract Base Class).
Class Attributes
component_name: str
Set to "Wikipedia", identifying this component in the system.
Methods
_invoke(self, **kwargs)
Core method called to execute the Wikipedia search.
Decorated with a timeout decorator to limit execution time (default 60 seconds or as set by environment variable COMPONENT_EXEC_TIMEOUT).
Parameters:
kwargs must include "query" (string): the search keyword or title to look up on Wikipedia.
Behavior:
If the "query" parameter is missing or empty, returns an empty string immediately.
Attempts up to max_retries + 1 times to:
  Set the Wikipedia language.
  Search Wikipedia for up to top_n pages matching the query.
  For each search result, attempt to fetch the full Wikipedia page.
  Extract and store the title, URL, and summary of each page using _retrieve_chunks.
On success, returns the formatted content from self.output("formalized_content").
On failure, logs the error, applies a delay, and retries.
If all retries fail, sets an error output and returns an error message string.
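The retry flow described for _invoke can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's code: search_with_retries, its parameters, and the injected search and fetch_page callables are hypothetical, skipping a page that fails to load is an assumption, and the sketch raises on final failure where the real tool is described as setting an error output and returning a message string. The demo uses in-memory fakes so that it runs without network access.

```python
import logging
import time
from types import SimpleNamespace


def search_with_retries(query, search, fetch_page, top_n=10,
                        max_retries=2, delay_after_error=0.5):
    """Return a list of {title, url, summary} dicts, retrying on failure."""
    if not query:
        return []  # mirrors the early return for a missing or empty query

    last_error = None
    for attempt in range(max_retries + 1):
        try:
            results = []
            for title in search(query, top_n):
                try:
                    page = fetch_page(title)
                except Exception:
                    # Assumption: one unloadable page is skipped, not fatal.
                    logging.exception("could not load page %r", title)
                    continue
                results.append(
                    {"title": page.title, "url": page.url, "summary": page.summary}
                )
            return results
        except Exception as exc:
            last_error = exc
            logging.exception("search attempt %d failed", attempt + 1)
            if attempt < max_retries:
                time.sleep(delay_after_error)  # back off before the next try
    raise RuntimeError(
        f"Wikipedia search failed after {max_retries + 1} attempts"
    ) from last_error


# Demo with fakes: the first search call fails, the second succeeds.
attempts = {"count": 0}


def flaky_search(query, limit):
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("simulated transient failure")
    return ["Alpha", "Beta", "Gamma"][:limit]


def fake_fetch(title):
    return SimpleNamespace(
        title=title, url=f"https://example.org/{title}", summary=f"About {title}"
    )


demo = search_with_retries("anything", flaky_search, fake_fetch,
                           top_n=2, max_retries=1, delay_after_error=0)
```

With the wikipedia package installed, the two callables could plausibly be backed by wikipedia.search(query, results=top_n) and wikipedia.page(title), after selecting the edition with wikipedia.set_lang(language).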
thoughts(self) -> str
Returns a formatted string describing the current search keywords and the intention to find the most relevant articles. Useful for debugging or logging.
Important Implementation Details
Retry Mechanism: The _invoke method includes retry logic based on a max_retries parameter (presumably defined in the base class or parameters), which helps handle transient network or API errors gracefully.
Timeout Handling: The use of the @timeout decorator ensures that the Wikipedia query does not hang indefinitely, enforcing responsiveness constraints.
Language Validation: The WikipediaParam class strictly validates the language code against a predefined list of supported Wikipedia languages, ensuring API requests are valid.
Chunk Retrieval: The _retrieve_chunks method (inherited or defined elsewhere) is used to process and store parts of the Wikipedia pages: title, URL, and summary. This modular approach allows flexible handling of retrieved data.
Error Logging and Delays: Errors during API calls are logged with stack traces, and a delay (delay_after_error) is enforced between retries to avoid rapid repeated failures.
Dependency: The module depends on the external wikipedia Python package, which wraps Wikipedia API calls.
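To make the timeout handling more concrete, here is one possible shape for such a decorator. It is a hedged sketch only: the real timeout helper lives in api.utils.api_utils and its implementation is not shown in this document, so the thread-based approach, the EXEC_TIMEOUT name, and the use of TimeoutError are all assumptions. Only the environment variable name and the 60-second default come from the description above.

```python
import functools
import os
import queue
import threading
import time

# Default of 60 seconds, overridable through the environment.
EXEC_TIMEOUT = float(os.environ.get("COMPONENT_EXEC_TIMEOUT", "60"))


def timeout(seconds):
    """Raise TimeoutError if the wrapped call takes longer than `seconds`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outcome = queue.Queue(maxsize=1)

            def target():
                try:
                    outcome.put((True, func(*args, **kwargs)))
                except Exception as exc:  # forwarded to the caller below
                    outcome.put((False, exc))

            # Daemon thread: an overrunning call cannot block interpreter
            # exit, but it is not cancelled and keeps running in background.
            threading.Thread(target=target, daemon=True).start()
            try:
                ok, value = outcome.get(timeout=seconds)
            except queue.Empty:
                raise TimeoutError(
                    f"{func.__name__} exceeded {seconds} seconds"
                ) from None
            if ok:
                return value
            raise value

        return wrapper

    return decorator


@timeout(EXEC_TIMEOUT)
def quick_lookup(query):
    return f"results for {query}"


@timeout(0.05)
def slow_lookup():
    time.sleep(0.5)
    return "finished too late"
```

Here quick_lookup returns normally, while slow_lookup raises TimeoutError because its body outlasts the 0.05-second limit. A thread-based timeout bounds how long the caller waits; it does not stop the underlying work.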
Interaction with Other System Components
Base Classes:
Inherits from ToolBase and uses ToolParamBase for parameters, indicating it fits into a larger agent/tool framework within InfiniFlow.
Timeout Utility:
Uses the timeout decorator from api.utils.api_utils to manage execution time limits.
Agent Tool Metadata:
Provides metadata via ToolMeta for integration in the agent's tool registry or UI.
Input/Output Handling:
Uses methods like set_output and output (from ToolBase) to communicate results or errors back to the invoking system.
Environment Configuration:
Reads the component execution timeout from environment variables for flexible deployment configuration.
Usage Example
# Note: a top-level "wikipedia" import can resolve to the third-party
# wikipedia package instead of this module; adjust the import to this
# module's package path in your checkout.
from wikipedia import Wikipedia, WikipediaParam

# Instantiate parameters and set custom values
params = WikipediaParam()
params.top_n = 5
params.language = "en"

# Instantiate the Wikipedia tool with parameters
wiki_tool = Wikipedia()
wiki_tool._param = params

# Invoke the search with a query
result = wiki_tool._invoke(query="Natural Language Processing")
print(result)  # Formatted summaries of up to 5 matching Wikipedia articles
Mermaid Class Diagram
classDiagram
class WikipediaParam {
+meta: ToolMeta
+top_n: int
+language: str
+__init__()
+check()
+get_input_form() dict
}
class Wikipedia {
+component_name: str
+_invoke(**kwargs) str
+thoughts() str
}
ToolParamBase <|-- WikipediaParam
ToolBase <|-- Wikipedia
ABC <|-- Wikipedia
Wikipedia ..> WikipediaParam : configured by
Summary
The wikipedia.py file defines a Wikipedia search tool component for the InfiniFlow system. It supports configurable parameters, robust error handling, and language selection for querying Wikipedia articles by keyword. The tool fetches multiple matching pages, extracts key information, and returns summarized content suitable for use by agents or downstream applications. It integrates tightly with the InfiniFlow agent framework and external Wikipedia API via the wikipedia Python package.