tavily.py
Overview
The tavily.py module provides integration with Tavily, a specialized search and extraction engine optimized for large language models (LLMs). It defines components for performing:
Search queries via the TavilySearch tool, facilitating focused, efficient retrieval of relevant web content.
Web page content extraction via the TavilyExtract tool, returning the content of one or more URLs as Markdown or plain text.
These tools encapsulate parameter validation, API interaction, error handling with retry logic, and result formatting, serving as reusable components within the InfiniFlow agent system.
Classes and Functions
1. TavilySearchParam
Description
Encapsulates configuration parameters for the Tavily search functionality. It inherits from ToolParamBase and defines metadata for the component, including parameter types, defaults, and validation rules.
Properties
| Property | Type | Description | Default |
|---|---|---|---|
| `meta` | `ToolMeta` | Metadata describing the tool and its parameters. | See below |
| `api_key` | `str` | API key for authenticating requests to the Tavily API. | |
| `search_depth` | `str` | Level of search detail: `"basic"` or `"advanced"`. | |
| `max_results` | `int` | Maximum number of search results to return (1-20). | |
| `days` | `int` | Number of days to look back for search results (must be greater than 1). | |
| `include_answer` | `bool` | Whether to include direct answers (if available) in results. | |
| `include_raw_content` | `bool` | Whether to include the raw content of results. | |
| `include_images` | `bool` | Whether to include images in search results. | |
| `include_image_descriptions` | `bool` | Whether to include descriptions for images in results. | |
Metadata meta (Partial)
```python
{
    "name": "tavily_search",
    "description": "Tavily is a search engine optimized for LLMs ...",
    "parameters": {
        "query": {
            "type": "string",
            "description": "Search keywords...",
            "default": "{sys.query}",
            "required": True
        },
        "topic": {
            "type": "string",
            "description": "Category of search ('general' or 'news')",
            "enum": ["general", "news"],
            "default": "general",
            "required": False
        },
        "include_domains": {
            "type": "array",
            "description": "Domains to include in results",
            "default": [],
            "items": {"type": "string"}
        },
        "exclude_domains": {
            "type": "array",
            "description": "Domains to exclude",
            "default": [],
            "items": {"type": "string"}
        }
    }
}
```
Methods
`check() -> None`: Validates parameter values (for example, enum values such as `search_depth` and the numeric constraints on `max_results` and `days`) and raises an error if any value is invalid.
`get_input_form() -> dict[str, dict]`: Returns a minimal input-form dictionary describing the expected user inputs, for UI generation or validation.
Usage Example
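```python
# Assumed module path; adjust to where tavily.py lives.
from agent.tools.tavily import TavilySearchParam

params = TavilySearchParam()
params.api_key = "my_api_key"
params.search_depth = "advanced"
params.max_results = 10
params.query = "machine learning"
params.check()  # Raises an error if any value is invalid
```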
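The rules that `check()` enforces are described only in prose. The following is a hypothetical standalone re-creation of them, not the module's code. It assumes `search_depth` must be `"basic"` or `"advanced"`, `topic` must be `"general"` or `"news"`, `max_results` must be an integer from 1 to 20, and `days` must be a positive integer. The Properties table says `days` must be greater than 1, so tighten that bound if the real check is stricter. The names `validate_search_params` and `_is_int` are illustrative, and the real `check()` may raise different exception types or messages.

```python
ALLOWED_DEPTHS = ("basic", "advanced")
ALLOWED_TOPICS = ("general", "news")


def _is_int(value: object) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def validate_search_params(search_depth: str, topic: str, max_results: int, days: int) -> None:
    """Raise ValueError if a value breaks the documented constraints (hypothetical helper)."""
    if search_depth not in ALLOWED_DEPTHS:
        raise ValueError(f"search_depth must be one of {ALLOWED_DEPTHS}, got {search_depth!r}")
    if topic not in ALLOWED_TOPICS:
        raise ValueError(f"topic must be one of {ALLOWED_TOPICS}, got {topic!r}")
    if not _is_int(max_results) or not 1 <= max_results <= 20:
        raise ValueError(f"max_results must be an integer from 1 to 20, got {max_results!r}")
    if not _is_int(days) or days < 1:
        raise ValueError(f"days must be a positive integer, got {days!r}")
```

For example, `validate_search_params("advanced", "news", 10, 7)` returns silently, while `validate_search_params("deep", "general", 10, 7)` raises `ValueError`.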
2. TavilySearch
Description
A tool class that implements the search functionality against the Tavily API. It inherits from `ToolBase` and `ABC` (Abstract Base Class) and provides a protected `_invoke` method that executes the search workflow with timeout and retry handling.
Properties
`component_name = "TavilySearch"`: Identifies the component in the system.
`self.tavily_client`: Instance of `TavilyClient` used to communicate with the Tavily API.
Methods
`_invoke(**kwargs) -> str`: Executes the search query using parameters passed via `kwargs`, falling back to defaults from `self._param`.
Parameters (kwargs):
`query` (str): Search keywords (required).
`topic` (str): Search category, `"general"` or `"news"`.
`search_depth` (str): `"basic"` or `"advanced"`.
`max_results` (int): Maximum number of results.
`days` (int): Number of days to look back.
`include_answer`, `include_raw_content`, `include_images`, `include_image_descriptions` (bool): Flags controlling result content.
`include_domains`, `exclude_domains` (list[str]): Domains to include or exclude.
Return Value:
A formatted string of the search results, or an error message.
Behavior:
Validates that `query` is present.
Initializes `TavilyClient`.
Fetches search results, retrying on exceptions.
Processes results into chunks with titles, URLs, content, and scores.
Sets the JSON output and formalized content for downstream consumption.
`thoughts() -> str`: Returns a short string describing the current search intent, for logging or debugging.
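The first steps of the `_invoke` workflow are filling in missing arguments from the parameter object and turning raw results into chunks. The following is a hypothetical, simplified sketch of those steps. `FakeParams`, `FakeClient`, `fill_defaults`, `to_chunks`, and `run_search` are illustrative names. The fake client stands in for `TavilyClient.search`, and its values are made up. The assumption that each result carries `title`, `url`, `content`, and `score` keys comes from the chunk fields listed in the Behavior steps, not from the Tavily API reference. Retries and output handling are left out.

```python
from typing import Any

# Fields that fall back to the parameter object when missing from kwargs.
DEFAULTED_FIELDS = (
    "topic", "search_depth", "max_results", "days",
    "include_answer", "include_raw_content", "include_images",
    "include_image_descriptions", "include_domains", "exclude_domains",
)


class FakeParams:
    """Hypothetical stand-in for TavilySearchParam with illustrative values."""
    topic = "general"
    search_depth = "basic"
    max_results = 5
    days = 7
    include_answer = False
    include_raw_content = False
    include_images = False
    include_image_descriptions = False
    include_domains: list[str] = []
    exclude_domains: list[str] = []


class FakeClient:
    """Stand-in for TavilyClient: returns canned results instead of calling the API."""
    def search(self, **kwargs: Any) -> dict[str, Any]:
        canned = [{"title": "Example", "url": "https://example.com", "content": "Snippet", "score": 0.9}]
        return {"results": canned[: kwargs["max_results"]]}


def fill_defaults(kwargs: dict[str, Any], params: object) -> dict[str, Any]:
    """Return a copy of kwargs where every missing field is taken from params."""
    merged = dict(kwargs)
    for field in DEFAULTED_FIELDS:
        merged.setdefault(field, getattr(params, field))
    return merged


def to_chunks(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the title, URL, content, and score of each result."""
    return [{"title": r["title"], "url": r["url"], "content": r["content"], "score": r["score"]}
            for r in results]


def run_search(client: FakeClient, params: object, **kwargs: Any) -> list[dict[str, Any]]:
    if not kwargs.get("query"):
        # No query: nothing to search, so this sketch returns no chunks.
        return []
    response = client.search(**fill_defaults(kwargs, params))
    return to_chunks(response["results"])
```

Arguments passed explicitly win over the parameter object because `setdefault` only fills keys that are absent.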
Implementation Details
Uses a `@timeout` decorator to limit execution time (default 12 seconds).
Retries on exceptions, with a delay between retries.
Uses the `_retrieve_chunks` method (likely inherited from `ToolBase`) to process and store results.
Logs exceptions using the standard `logging` module.
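The `@timeout` decorator comes from `api.utils.api_utils`, and its implementation is not documented here. The following illustrates only the general idea with a hypothetical `with_timeout` decorator built on `concurrent.futures`. It runs the wrapped function in a worker thread and raises `TimeoutError` if no result arrives in time. This may not match how `api.utils.api_utils.timeout` actually works.

```python
import concurrent.futures
import functools
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def with_timeout(seconds: float) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Hypothetical decorator: raise TimeoutError if the call takes longer than `seconds`."""
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> T:
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = executor.submit(func, *args, **kwargs)
            try:
                return future.result(timeout=seconds)
            except concurrent.futures.TimeoutError:
                raise TimeoutError(f"{func.__name__} exceeded {seconds} s") from None
            finally:
                # Do not block on a timed-out worker; it finishes in the background.
                executor.shutdown(wait=False)
        return wrapper
    return decorator


@with_timeout(0.5)
def fast() -> str:
    return "done"


@with_timeout(0.1)
def slow() -> str:
    time.sleep(1)
    return "too late"
```

A thread-based timeout stops the caller from waiting but cannot kill the worker thread, which keeps running until the wrapped call returns.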
Usage Example
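```python
# Assumed module path; adjust to where tavily.py lives.
from agent.tools.tavily import TavilySearch, TavilySearchParam

# Simplified: assumes TavilySearch() needs no constructor arguments (see ToolBase).
search_tool = TavilySearch()
search_tool._param = TavilySearchParam()
search_tool._param.api_key = "my_api_key"
result = search_tool._invoke(query="latest AI breakthroughs")
print(result)
```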
3. TavilyExtractParam
Description
Defines parameters for the Tavily content extraction component. Inherits from ToolParamBase, specifying metadata and validation for extraction options.
Properties
| Property | Type | Description | Default |
|---|---|---|---|
| `meta` | `ToolMeta` | Metadata specifying the name, description, and parameters of the extraction tool. | See below |
| `api_key` | `str` | API key for Tavily. | |
| `extract_depth` | `str` | Extraction detail level: `"basic"` or `"advanced"`. | |
| `urls` | `list` | List of URLs to extract content from. | |
| `format` | `str` | Format of extracted content: `"markdown"` or `"text"`. | |
| `include_images` | `bool` | Whether to include images in extracted content. | |
Metadata meta (Partial)
```python
{
    "name": "tavily_extract",
    "description": "Extract web page content from one or more specified URLs...",
    "parameters": {
        "urls": {
            "type": "array",
            "description": "URLs to extract content from",
            "items": {"type": "string"},
            "required": True
        },
        "extract_depth": {
            "type": "string",
            "description": "Depth of extraction: basic or advanced",
            "enum": ["basic", "advanced"],
            "default": "basic"
        },
        "format": {
            "type": "string",
            "description": "Output format of extracted content",
            "enum": ["markdown", "text"],
            "default": "markdown"
        }
    }
}
```
Methods
`check() -> None`: Validates that `extract_depth` and `format` are within their allowed values.
`get_input_form() -> dict[str, dict]`: Returns a simple dictionary describing the expected user input fields.
Usage Example
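```python
# Assumed module path; adjust to where tavily.py lives.
from agent.tools.tavily import TavilyExtractParam

extract_params = TavilyExtractParam()
extract_params.api_key = "my_api_key"
extract_params.urls = ["https://example.com"]
extract_params.extract_depth = "advanced"
extract_params.check()  # Raises an error if any value is invalid
```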
4. TavilyExtract
Description
A tool class that performs content extraction from URLs via the Tavily API. Inherits from ToolBase and ABC. It defines an _invoke method to execute the extraction process with retries and timeout.
Properties
`component_name = "TavilyExtract"`: Identifies the component in the system.
`self.tavily_client`: Instance of `TavilyClient` used for API communication.
Methods
`_invoke(**kwargs) -> str`: Performs extraction from the specified URLs.
Parameters (kwargs):
`urls` (list[str] or comma-separated string): URLs to extract.
`extract_depth` (str): `"basic"` or `"advanced"`.
`format` (str): `"markdown"` or `"text"`.
`include_images` (bool): Whether to include images.
Return Value:
A JSON string of the extracted results, or an error message.
Behavior:
Normalizes `urls` if provided as a comma-separated string.
Retries extraction on exceptions, logging each failure.
Sets the JSON output for downstream use.
`thoughts() -> str`: Returns a brief log string mentioning the URLs being processed.
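The following hypothetical helper, `normalize_urls`, illustrates the comma-separated `urls` normalization. It sketches the described behavior and is not the module's code. Stripping whitespace around each URL and dropping empty entries are assumptions that the description does not state.

```python
from typing import Union


def normalize_urls(urls: Union[str, list[str]]) -> list[str]:
    """Accept a list of URLs or a comma-separated string and return a list of URLs."""
    if isinstance(urls, str):
        # "https://a.com, https://b.com" -> ["https://a.com", "https://b.com"]
        return [u.strip() for u in urls.split(",") if u.strip()]
    # Already a list (or other iterable): copy it so callers can't mutate the input.
    return list(urls)
```

Accepting both shapes lets an LLM-generated tool call pass a single string while programmatic callers pass a list.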
Implementation Details
Uses the `@timeout` decorator with a longer limit (default 10 minutes).
Includes error handling and retry logic.
Uses the Tavily client's `extract` method.
Usage Example
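```python
# Assumed module path; adjust to where tavily.py lives.
from agent.tools.tavily import TavilyExtract, TavilyExtractParam

# Simplified: assumes TavilyExtract() needs no constructor arguments (see ToolBase).
extract_tool = TavilyExtract()
extract_tool._param = TavilyExtractParam()
extract_tool._param.api_key = "my_api_key"
extract_tool._param.urls = ["https://example.com/article"]
result = extract_tool._invoke()
print(result)
```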
Important Implementation Details and Algorithms
Timeout Decorator: Both `_invoke` methods use a `@timeout` decorator imported from `api.utils.api_utils` to enforce execution time limits, preventing long-running operations from blocking the system.
Retry Logic: On API failures or exceptions, both tools make up to `self._param.max_retries + 1` attempts (one initial call plus `max_retries` retries), waiting `self._param.delay_after_error` after a failure, to improve resilience against transient errors.
Parameter Validation: Each parameter class (`TavilySearchParam`, `TavilyExtractParam`) validates critical parameters, such as allowed enum values and positive integers, reducing runtime errors.
Result Processing: `TavilySearch` processes raw search results into "chunks" with titles, URLs, content, and relevance scores, likely for downstream LLM consumption or display.
Flexible Input Handling: The extract tool normalizes URLs passed as a comma-delimited string, so it accepts either a list or a string from different calling contexts.
Logging: Uses Python's standard `logging` module to log exceptions for operational transparency and debugging.
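The retry behavior amounts to up to `max_retries + 1` attempts in total, with a pause after each failed attempt. The hypothetical generic helper `call_with_retries` sketches that loop. Returning an error string after the last failure mirrors the "error message" return value described for both `_invoke` methods, but the exact message format here is made up. Unlike a loop that sleeps after every failure, this sketch skips the pause after the final attempt.

```python
import logging
import time
from typing import Callable, Optional


def call_with_retries(func: Callable[[], str], max_retries: int, delay_after_error: float) -> str:
    """Call func up to max_retries + 1 times; return its result or an error message."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            logging.exception("Attempt %d of %d failed", attempt + 1, max_retries + 1)
            if attempt < max_retries:
                # Wait before the next attempt; no wait after the final one.
                time.sleep(delay_after_error)
    return f"Error: {last_error}"
```

With `max_retries=2`, a function that fails twice and then succeeds is called exactly three times and its result is returned.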
Interaction with Other System Components
TavilyClient: Core client for the Tavily API, imported from the `tavily` package (Tavily's Python SDK, published on PyPI as `tavily-python`).
Base Classes: `ToolParamBase`, `ToolBase`, and `ToolMeta` are imported from `agent.tools.base` and provide foundational behavior for parameter management, the tool execution lifecycle, and metadata handling.
Timeout Utility: The `timeout` decorator, imported from `api.utils.api_utils`, enforces execution limits.
Output Handling: Methods such as `set_output` and `output` are inherited from `ToolBase` and pass results and errors to downstream components or user interfaces.
Environment Variables: Timeout values are configurable via the `COMPONENT_EXEC_TIMEOUT` environment variable, enabling system-wide tuning.
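Environment variables are always strings, so a value read from `COMPONENT_EXEC_TIMEOUT` must be converted before it can serve as a number of seconds. The hypothetical helper `component_timeout` sketches that lookup with a per-component fallback, using the defaults stated earlier: 12 seconds for search and 10 minutes for extraction. Whether `api.utils.api_utils.timeout` performs this conversion itself, and how it treats malformed values, is not covered here. Falling back to the default on a bad value is an assumption of this sketch.

```python
import os


def component_timeout(default_seconds: float) -> float:
    """Read COMPONENT_EXEC_TIMEOUT (seconds) from the environment, else use the default."""
    raw = os.environ.get("COMPONENT_EXEC_TIMEOUT")
    if raw is None or not raw.strip():
        return float(default_seconds)
    try:
        return float(raw)
    except ValueError:
        # Ignore malformed values instead of failing when the module is imported.
        return float(default_seconds)


SEARCH_TIMEOUT = component_timeout(12)        # TavilySearch default
EXTRACT_TIMEOUT = component_timeout(10 * 60)  # TavilyExtract default
```

Because one variable overrides both defaults, setting it shortens extraction and lengthens search at the same time. That trade-off is part of what "system-wide tuning" implies.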
Together, these classes serve as modular components within the InfiniFlow agent framework, enabling seamless search and extraction workflows powered by Tavily.
Visual Diagram
```mermaid
classDiagram
    class TavilySearchParam {
        +meta: ToolMeta
        +api_key: str
        +search_depth: str
        +max_results: int
        +days: int
        +include_answer: bool
        +include_raw_content: bool
        +include_images: bool
        +include_image_descriptions: bool
        +check()
        +get_input_form() dict
    }
    class TavilySearch {
        +component_name: str = "TavilySearch"
        -tavily_client: TavilyClient
        +_invoke(**kwargs) str
        +thoughts() str
    }
    class TavilyExtractParam {
        +meta: ToolMeta
        +api_key: str
        +extract_depth: str
        +urls: list
        +format: str
        +include_images: bool
        +check()
        +get_input_form() dict
    }
    class TavilyExtract {
        +component_name: str = "TavilyExtract"
        -tavily_client: TavilyClient
        +_invoke(**kwargs) str
        +thoughts() str
    }
    ToolParamBase <|-- TavilySearchParam
    ToolParamBase <|-- TavilyExtractParam
    ToolBase <|-- TavilySearch
    ToolBase <|-- TavilyExtract
    TavilySearch --> TavilySearchParam : _param
    TavilyExtract --> TavilyExtractParam : _param
```
Summary
The tavily.py file defines two main tool components for the InfiniFlow agent:
TavilySearchParam / TavilySearch: Implements a search tool leveraging Tavily’s search API, with configurable parameters and robust retry and timeout handling.
TavilyExtractParam / TavilyExtract: Implements a content extraction tool that retrieves page content from URLs, supporting multiple extraction depths and output formats (Markdown or plain text).
Both components integrate tightly with the InfiniFlow tool framework and the Tavily client, providing reliable, parameterized access to powerful web search and extraction capabilities optimized for large language models.