tushare.py
Overview
tushare.py implements a component for integrating with the TuShare API, a financial data service providing news and market information. This component is part of the InfiniFlow system, designed to fetch and filter financial news articles from various sources within a specified date range. It processes the fetched data and returns a filtered result based on a keyword search.
The file defines two main classes:
TuShareParam: Holds configuration parameters for the TuShare API requests.
TuShare: The main component class that performs the API request, filters the news content, and returns the results in a structured format.
This component is likely used within a larger data processing or AI pipeline to enrich input data with relevant financial news fetched dynamically from TuShare.
Classes and Methods
Class: TuShareParam
class TuShareParam(ComponentParamBase):
Description
Defines the parameters for configuring the TuShare component. It extends from ComponentParamBase, inheriting parameter validation features.
Properties
| Property | Type | Default Value | Description |
|---|---|---|---|
| token | string | | API authentication token for TuShare. |
| src | string | | Source of the news data; must be one of the allowed sources. |
| start_date | string | | Start datetime for news filtering, in the format shown in the usage example (e.g. 2024-05-01 00:00:00). |
| end_date | string | Current time (default) | End datetime for news filtering; defaults to the current local time. |
| keyword | string | | Keyword to filter news content; case-insensitive search. |
Methods
__init__(self): Initializes the parameters with default values.
check(self): Validates the src parameter against a list of allowed news sources: ["sina", "wallstreetcn", "10jqka", "eastmoney", "yuncaijing", "fenghuang", "jinrongjie"]. Raises an error if the source is invalid.
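The source check can be sketched in isolation as follows. This is a minimal illustration, not the component's actual code: the helper name validate_src and the choice of ValueError are assumptions, and only the list of allowed sources is taken from the description above.

```python
# Allowed news sources, as listed in the description of check().
ALLOWED_SOURCES = [
    "sina", "wallstreetcn", "10jqka", "eastmoney",
    "yuncaijing", "fenghuang", "jinrongjie",
]


def validate_src(src: str) -> None:
    """Raise ValueError if src is not an allowed news source.

    Hypothetical helper: the real check() presumably relies on the
    validation features inherited from ComponentParamBase.
    """
    if src not in ALLOWED_SOURCES:
        raise ValueError(
            f"Unsupported news source {src!r}; expected one of {ALLOWED_SOURCES}"
        )


validate_src("sina")  # an allowed source passes silently

try:
    validate_src("bloomberg")  # not in the list, so this is rejected
except ValueError as exc:
    print(f"rejected: {exc}")
```

Validating the source before any request is sent keeps configuration mistakes from turning into opaque API errors later in the pipeline.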
Usage Example
params = TuShareParam()
params.token = "your_real_token"
params.src = "sina"
params.start_date = "2024-05-01 00:00:00"
params.end_date = "2024-05-10 23:59:59"
params.keyword = "stock"
params.check() # Validates parameters
Class: TuShare
class TuShare(ComponentBase, ABC):
Description
The main component class that fetches financial news data from the TuShare API, filters the results based on a keyword, and outputs the filtered news in markdown table format.
It inherits from ComponentBase and ABC (Abstract Base Class), indicating it is designed to integrate with a component-based pipeline framework.
Class Attributes
component_name: Class-level identifier string, "TuShare".
Methods
_run(self, history, **kwargs)
Core method executed when the component runs.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| history | Any | Historical context or data passed from the pipeline. |
| **kwargs | dict | Additional keyword arguments (not used here). |
Returns:
A pandas DataFrame containing the filtered news content in markdown table format, or an output string wrapped by TuShare.be_output() signaling errors or empty results.
Functionality:
Retrieves input data via self.get_input(); expects a dictionary with a "content" key.
Joins the input content list into a single comma-separated string for processing.
Prepares a POST request payload for the TuShare API:
api_name: "news"
token: from self._param.token
params: includes src, start_date, and end_date from the parameters.
Sends the request to http://api.tushare.pro.
On success (response['code'] == 0), converts the returned data into a pandas DataFrame.
Filters the DataFrame rows where the "content" column contains the specified keyword (case-insensitive).
Converts the filtered DataFrame to markdown table format.
Returns the filtered content wrapped in a DataFrame, or an error message if the API call or processing fails.
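The request payload described in these steps might be assembled roughly as below. This is a sketch under stated assumptions: build_news_payload is a hypothetical helper, and the default end_date format is inferred from the usage examples in this document. No network call is made here.

```python
import json
from datetime import datetime

TUSHARE_URL = "http://api.tushare.pro"  # endpoint named in the text


def build_news_payload(token, src, start_date, end_date=None):
    """Build the request body for the "news" API call.

    Hypothetical helper; the field layout follows the steps listed above.
    """
    if end_date is None:
        # Assumed format, matching the usage examples in this document.
        end_date = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return {
        "api_name": "news",
        "token": token,
        "params": {
            "src": src,
            "start_date": start_date,
            "end_date": end_date,
        },
    }


payload = build_news_payload(
    "your_token", "sina", "2024-05-01 00:00:00", "2024-05-10 23:59:59"
)
body = json.dumps(payload)  # JSON-encoded body for the POST request
print(body)

# Sending it would look like (requires the requests package and a valid token):
#   response = requests.post(TUSHARE_URL, data=body).json()
```

Keeping payload construction separate from the HTTP call makes the request shape easy to inspect and test without contacting the API.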
Usage Example
ts_component = TuShare()
ts_component._param.token = "your_token"
ts_component._param.src = "sina"
ts_component._param.start_date = "2024-05-01 00:00:00"
ts_component._param.end_date = "2024-05-10 23:59:59"
ts_component._param.keyword = "earnings"
# Assume input content is provided via the pipeline framework
output_df = ts_component._run(history=None)
print(output_df)
Implementation Details and Algorithms
API Integration: The component calls the TuShare API via an HTTP POST request carrying JSON-encoded parameters. It expects a JSON response containing code, msg, and data.
Data Processing: The response's data['items'] (list of news records) and data['fields'] (column names) are converted into a pandas DataFrame for easy manipulation.
Filtering: The DataFrame is filtered by checking whether the content field contains the keyword (case-insensitive). This simple substring search uses pandas.Series.str.contains().
Output Formatting: Filtered data is converted to markdown format using DataFrame.to_markdown(). This format is suitable for rich text display in markdown-aware environments.
Error Handling: The method catches all exceptions during the request or data processing and returns a formatted error string without raising.
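The data processing, filtering, and error-handling behaviour can be illustrated with a small offline sketch. The function names filter_news and run_safely, the sample response, and the error string format are all assumptions made for illustration. The regex=False and na=False arguments are also choices made for this sketch, so that the keyword is matched literally and rows with missing content are skipped; the component itself may call str.contains() differently.

```python
import pandas as pd


def filter_news(response: dict, keyword: str) -> pd.DataFrame:
    """Turn a TuShare-style response into a DataFrame and filter by keyword."""
    if response.get("code") != 0:
        raise RuntimeError(response.get("msg", "TuShare request failed"))
    data = response["data"]
    # data["items"] holds the rows, data["fields"] the column names.
    df = pd.DataFrame(data["items"], columns=data["fields"])
    # Case-insensitive substring match on the "content" column.
    mask = df["content"].str.contains(keyword, case=False, regex=False, na=False)
    return df[mask]


def run_safely(response: dict, keyword: str):
    """Return the filtered frame, or an error string instead of raising."""
    try:
        return filter_news(response, keyword)
    except Exception as exc:  # catch-all, mirroring the behaviour described
        return f"ERROR: {exc}"


# Made-up sample response, for illustration only.
sample = {
    "code": 0,
    "msg": "",
    "data": {
        "fields": ["datetime", "content"],
        "items": [
            ["2024-05-01 09:30:00", "Company A reports strong Earnings"],
            ["2024-05-01 10:00:00", "Market opens flat"],
        ],
    },
}

result = run_safely(sample, "earnings")
print(result)

# result.to_markdown() would render the markdown table; it needs the
# optional tabulate package, so it is not called here.
```

Returning an error string instead of raising keeps a failed request from halting the surrounding pipeline, at the cost of the caller having to check the return type.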
Interaction with Other System Components
Base Classes: TuShare and TuShareParam inherit from ComponentBase and ComponentParamBase respectively, likely defined in the agent.component.base module. These base classes provide foundational functionality such as parameter management, input/output handling, and integration with the InfiniFlow pipeline.
Input Handling: Uses self.get_input() to receive data from upstream components in the pipeline.
Output Handling: Uses the TuShare.be_output() method (inherited or defined in the base class) to standardize output message formatting.
External Service: Interacts with the external TuShare API at http://api.tushare.pro to retrieve live financial news data.
Data Dependencies: Uses pandas for data manipulation, requests for HTTP communication, and json for encoding the request payload.
Visual Diagram
classDiagram
class TuShareParam {
+token: str
+src: str
+start_date: str
+end_date: str
+keyword: str
+__init__()
+check()
}
class TuShare {
+component_name: str = "TuShare"
+_run(history, **kwargs) pandas.DataFrame or output string
}
ComponentParamBase <|-- TuShareParam
ComponentBase <|-- TuShare
TuShare ..> TuShareParam : uses
TuShare --> requests : HTTP POST
TuShare --> pandas.DataFrame : processes data
Summary
This file provides a modular, parameter-driven component that fetches financial news from the TuShare API, filters it by keyword, and returns structured results. It is designed to fit into a larger pipeline system (InfiniFlow), making it reusable and configurable for different data sources, time ranges, and search keywords.
The design emphasizes error resilience, simplicity in API interaction, and leveraging pandas for data management. It abstracts away API specifics behind a component interface, facilitating integration into automated workflows requiring timely financial news data.