synonym.py
Overview
synonym.py provides functionality for managing and retrieving synonyms for given words or phrases. It primarily focuses on loading a synonym dictionary from a JSON resource and augmenting it with real-time updates fetched from a Redis cache if available. Additionally, it integrates the WordNet lexical database from the NLTK library for English word synonym lookups.
The main component, the Dealer class, encapsulates the synonym lookup logic and dictionary management, including loading synonyms from a static JSON file (synonym.json) and optionally refreshing data from a Redis store to support real-time synonym updates.
This file is typically used in applications or services that require synonym expansion or normalization, such as search engines, natural language processing pipelines, or query understanding modules within the InfiniFlow project.
Classes and Methods
Class: Dealer
The Dealer class is responsible for loading synonym mappings, managing updates, and providing synonym lookup functionality.
Initialization
Dealer(redis=None)
Parameters
redis (optional): A Redis client instance for real-time synonym dictionary updates. If not provided, the class operates with the static dictionary loaded from the local JSON file only.
Attributes
lookup_num (int): Tracks the number of synonym lookups performed since the last dictionary reload to control refresh frequency.
load_tm (float): Timestamp of the last dictionary load from Redis.
dictionary (dict): The synonym dictionary loaded from the JSON file or Redis.
redis: The Redis connection instance.
Behavior
Attempts to load the synonym dictionary from the file path {project_base_directory}/rag/res/synonym.json.
Logs warnings if the synonym file is missing or empty.
Warns if no Redis connection is given, disabling real-time synonym updates.
Calls load() to potentially load dictionary updates from Redis.
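The file-loading step with its warning fallback can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name load_static_dictionary is hypothetical, the sample entry is made up, and a temporary file stands in for rag/res/synonym.json.

```python
import json
import logging
import os
import tempfile


def load_static_dictionary(path):
    """Return the synonym mapping stored at `path`, or {} if it cannot be read.

    Hypothetical helper for illustration; not part of synonym.py.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            dictionary = json.load(f)
    except (OSError, ValueError):
        # Missing or malformed file: warn and fall back to an empty mapping
        # instead of raising, so the caller keeps working.
        logging.warning("Missing or unreadable synonym file: %s", path)
        return {}
    if not dictionary:
        logging.warning("Synonym file is empty: %s", path)
    return dictionary


# A temporary file stands in for the real synonym.json (assumption).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "synonym.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump({"nlp": ["natural language processing"]}, f)
    loaded = load_static_dictionary(sample_path)
    missing = load_static_dictionary(os.path.join(tmp, "absent.json"))
```

Returning an empty dictionary on failure means later lookups simply find no synonyms rather than raising.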
Method: load()
Dealer.load()
Purpose
Refreshes the synonym dictionary from Redis if certain conditions are met.
Behavior
Does nothing if a Redis connection is not provided.
Only attempts a reload if lookup_num is greater than or equal to 100, avoiding excessive reloads.
Enforces a minimum interval of 1 hour between reloads (time.time() - load_tm >= 3600).
On reload, resets lookup_num to zero and updates load_tm.
Fetches the Redis key "kevin_synonyms" and attempts to parse it as JSON.
Updates the dictionary attribute with the new data.
Logs errors if JSON parsing or Redis fetching fails.
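The two reload gates described above (lookup count and elapsed time) can be sketched as below. This is an illustrative sketch under stated assumptions, not the Dealer implementation: FakeRedis and ThrottledReloader are hypothetical names, the starting values of lookup_num and load_tm are assumptions, and the now parameter exists only so the gates can be exercised without waiting an hour.

```python
import json
import logging
import time


class FakeRedis:
    """Hypothetical in-memory stand-in that exposes only `get`."""

    def __init__(self, data):
        self.data = data

    def get(self, key):
        return self.data.get(key)


class ThrottledReloader:
    MIN_LOOKUPS = 100    # lookups required since the last reload
    MIN_INTERVAL = 3600  # seconds required since the last reload

    def __init__(self, redis=None, dictionary=None):
        self.redis = redis
        self.dictionary = dictionary or {}
        self.lookup_num = 0           # assumed starting value
        self.load_tm = time.time()    # assumed starting value

    def load(self, now=None):
        """Reload from Redis if both gates pass; return True on success."""
        now = time.time() if now is None else now
        if self.redis is None:
            return False
        if self.lookup_num < self.MIN_LOOKUPS:
            return False
        if now - self.load_tm < self.MIN_INTERVAL:
            return False
        # Both gates passed: reset the counters before fetching.
        self.lookup_num = 0
        self.load_tm = now
        try:
            raw = self.redis.get("kevin_synonyms")
            if not raw:
                return False
            self.dictionary = json.loads(raw)
            return True
        except Exception as exc:
            logging.error("Failed to load synonyms: %s", exc)
            return False


store = FakeRedis({"kevin_synonyms": json.dumps({"ml": ["machine learning"]})})
reloader = ThrottledReloader(redis=store)
start = reloader.load_tm

skipped_few_lookups = reloader.load(now=start + 7200)  # too few lookups
reloader.lookup_num = 100
skipped_too_soon = reloader.load(now=start + 60)       # under one hour
reloaded = reloader.load(now=start + 7200)             # both gates pass
```

Checking the cheap in-memory conditions first means Redis is not contacted at all on most calls.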
Usage Example
dealer = Dealer(redis=redis_client)
dealer.load()  # Refreshes the dictionary if the reload conditions are met
Method: lookup(tk, topn=8)
Dealer.lookup(tk: str, topn: int = 8) -> list
Parameters
tk (str): The token (word or phrase) to look up synonyms for.
topn (int, optional): The maximum number of synonyms to return (default 8).
Returns
A list of synonym strings for the given token, up to topn entries.
Behavior
If tk consists solely of lowercase English letters ([a-z]+), it uses NLTK WordNet to find synonyms:
Retrieves synsets for the token.
Extracts lemma names from synsets.
Removes the original token from the results.
Returns a unique list of synonyms.
For other tokens (phrases or tokens containing special characters):
Increments the lookup count (lookup_num).
Calls load() to refresh the dictionary from Redis if needed.
Normalizes the token (lowercases it and replaces runs of spaces or tabs with a single space).
Looks up the token in the dictionary loaded from JSON or Redis.
If the dictionary entry is a string, wraps it in a list.
Returns up to topn synonyms.
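The routing check and the dictionary branch can be sketched as follows. This is a simplified illustration under stated assumptions: is_wordnet_token and lookup_in_dictionary are hypothetical helper names, the sample entries are made up, and the WordNet branch and the Redis refresh are left out.

```python
import re


def is_wordnet_token(tk):
    """True when the token is made up only of lowercase ASCII letters."""
    return re.fullmatch(r"[a-z]+", tk) is not None


def lookup_in_dictionary(dictionary, tk, topn=8):
    """Normalize `tk`, look it up, and return at most `topn` synonyms."""
    # Lowercase, then collapse runs of spaces/tabs into a single space.
    key = re.sub(r"[ \t]+", " ", tk.lower())
    res = dictionary.get(key, [])
    if isinstance(res, str):
        # A single-string entry is wrapped so callers always get a list.
        res = [res]
    return res[:topn]


# Made-up entries for illustration only; not taken from synonym.json.
sample = {
    "data science": ["data analytics", "data mining", "analytics"],
    "c++": "cpp",
}

routes = [is_wordnet_token(t) for t in ("bank", "data science", "Bank")]
phrase = lookup_in_dictionary(sample, "Data \t Science", topn=2)
single = lookup_in_dictionary(sample, "C++")
unknown = lookup_in_dictionary(sample, "unknown term")
```

Because the key is normalized before the lookup, inputs that differ only in case or spacing resolve to the same entry.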
Usage Example
dealer = Dealer()
synonyms = dealer.lookup("happy", topn=5)
print(synonyms)  # e.g., ['glad', 'felicitous', 'well-chosen'] (the token itself is excluded)
Implementation Details and Algorithms
Synonym Dictionary Loading: The class loads a static synonym dictionary from a JSON file located relative to the project base directory. This is a fallback or default synonym resource.
Redis Integration: If a Redis client is provided, the class periodically refreshes the synonym dictionary from the Redis key "kevin_synonyms" to support real-time updates without restarting the application.
Lookup Counting: To avoid frequent reloads from Redis, the class counts lookups and only attempts to reload after 100 lookups and if at least one hour has passed since the last reload.
WordNet Integration: For simple English lowercase tokens, the class uses NLTK's WordNet lexical database to find synonyms, providing a broader and linguistically richer synonym set.
Normalization: Tokens are normalized by lowercasing and reducing whitespace before dictionary lookup to improve match accuracy.
Error Handling: The class logs warnings and errors for missing files or Redis issues but does not crash the application, ensuring robustness.
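The post-processing on the WordNet path (remove the original token, return unique entries) can be sketched without NLTK by working on a plain list of lemma names. This is an assumption-laden illustration: unique_synonyms is a hypothetical helper, the input list is hand-written rather than real WordNet output, and replacing underscores with spaces reflects how WordNet writes multi-word lemmas rather than anything stated about this module. With NLTK installed, such names would typically come from wordnet.synsets(token) and each synset's lemma_names().

```python
def unique_synonyms(token, lemma_names):
    """Drop the token itself and duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for name in lemma_names:
        # WordNet joins multi-word lemmas with underscores.
        cleaned = name.replace("_", " ")
        if cleaned and cleaned != token and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Hand-written names standing in for WordNet output (not real query results).
names = ["happy", "glad", "felicitous", "glad", "well-chosen", "good_fortune"]
synonyms = unique_synonyms("happy", names)
```

Keeping first-seen order makes the result deterministic, which a plain set-based deduplication would not guarantee.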
Interaction with Other Parts of the System
File Dependency: Uses the utility function get_project_base_directory() from api.utils.file_utils to determine the base path for loading the synonym JSON file.
Redis Dependency: Optionally interacts with a Redis server for dynamic synonym dictionary updates.
NLTK WordNet: Relies on the wordnet corpus from NLTK, which must be installed and downloaded in the runtime environment.
Logging: Uses Python's standard logging module for warning and error messages.
This file is likely used by modules responsible for query expansion, natural language understanding, or search indexing within the InfiniFlow project.
Example Usage
from synonym import Dealer
import redis
# Initialize Redis client (if real-time update needed)
redis_client = redis.StrictRedis(host='localhost', port=6379, db=0)
# Create a Dealer instance with Redis support
dealer = Dealer(redis=redis_client)
# Lookup synonyms for a single word
syns = dealer.lookup("bank")
print(f"Synonyms for 'bank': {syns}")
# Lookup synonyms for a phrase
syns_phrase = dealer.lookup("data science")
print(f"Synonyms for 'data science': {syns_phrase}")
Visual Diagram
classDiagram
class Dealer {
-lookup_num: int
-load_tm: float
-dictionary: dict
-redis
+__init__(redis=None)
+load()
+lookup(tk: str, topn: int=8) list
}
Summary
The synonym.py file implements a robust synonym management system combining static JSON-based synonyms, dynamic updates via Redis, and linguistic expansions using WordNet. It is designed to be resilient, lightweight, and easily integrated into larger NLP or search systems within the InfiniFlow ecosystem.