surname.py
Overview
surname.py is a utility module designed to identify whether a given string corresponds to a recognized Chinese surname. The module encapsulates a comprehensive set of traditional Chinese surnames, including both single-character and compound (multi-character) family names, reflecting historical and modern usage.
The core functionality is provided by a single function isit(n), which checks membership of an input string within this predefined surname set. This module is intended for use in applications that require validation or recognition of Chinese surnames, such as natural language processing, user data validation, or cultural data analysis.
Contents
Global Variable:
m: a set containing the module's recognized Chinese surnames (single-character and compound).
Function:
isit(n): checks whether the input string n is a known Chinese surname.
Detailed Explanation
Variable: m
Type: set[str]
Description:
m contains a collection of Chinese surnames, including:
The 100 most common single-character surnames.
Hundreds of less common single-character surnames.
Traditional compound surnames consisting of two or more characters (e.g., "欧阳", "司马", "上官").
Implementation Detail:
Using a set provides O(1) average-case membership tests, so isit() stays efficient even with a large number of surnames.
Example content snippet:
m = set([ "赵", "钱", "孙", "李", ..., "欧阳", "司马", "上官", "夏侯", ... ])
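Because m is built with set([...]) from a long, hand-maintained list, any duplicate entries in that list are collapsed silently. A minimal maintenance sketch for spotting them, assuming the raw list is available as a Python list; the name raw_surnames, the helper find_duplicates, and the sample entries are illustrative, not taken from surname.py:

```python
from collections import Counter

# Illustrative excerpt only (hypothetical); the real list in surname.py is much longer.
raw_surnames = ["赵", "钱", "孙", "李", "欧阳", "司马", "李", "上官", "夏侯"]

def find_duplicates(entries):
    """Hypothetical helper: return entries that appear more than once, in first-seen order."""
    counts = Counter(entries)
    return [name for name, count in counts.items() if count > 1]

# Duplicates collapse here, so the set can be smaller than the list.
m = set(raw_surnames)

print(find_duplicates(raw_surnames))  # ['李']
print(len(raw_surnames), len(m))      # 9 8
```

Compound surnames such as "欧阳" are stored as whole strings, so they count as single entries in the set.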
Function: isit(n)
def isit(n):
    return n.strip() in m
Purpose:
Determines whether the input string n matches a known Chinese surname from the set m.
Parameters:
n (str): A string representing a potential Chinese surname. Leading and trailing whitespace is stripped before the check.
Returns:
bool: True if n (after stripping) is in the surname set m, otherwise False.
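The stripping step uses Python's str.strip() with no argument, which removes every leading and trailing character for which str.isspace() is true. A small sketch with an illustrative two-entry set (not the real m) shows what that does and does not cover:

```python
# Illustrative subset only; surname.py's m is much larger.
m = {"李", "欧阳"}

def isit(n):
    return n.strip() in m

# str.strip() removes spaces, tabs, newlines, and also the
# full-width ideographic space U+3000 common in CJK text.
print(isit("\t李\n"))            # True
print(isit("\u3000欧阳\u3000"))  # True

# Whitespace inside the string is not removed, so this is not an exact match.
print(isit("欧 阳"))             # False
```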
Usage Example:
>>> isit("李")
True
>>> isit(" 欧阳 ")
True
>>> isit("张三")  # a full name; only the surname 张 by itself would match
False
>>> isit("Smith")
False
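The usage examples above can be reproduced with a minimal standalone sketch. It assumes a small illustrative subset of m; the actual module contains hundreds of entries:

```python
# Illustrative subset of the surname set; surname.py itself contains far more entries.
m = set(["赵", "钱", "孙", "李", "张", "欧阳", "司马", "上官", "夏侯"])

def isit(n):
    """Return True if n, after stripping surrounding whitespace, is a known surname."""
    return n.strip() in m

if __name__ == "__main__":
    for candidate in ["李", " 欧阳 ", "张三", "Smith"]:
        print(repr(candidate), isit(candidate))
```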
Notes:
The function only matches exact surname strings; it does not parse or extract surnames from longer names or phrases.
Stripping leading and trailing whitespace ensures that accidental surrounding spaces do not affect the check; whitespace inside the string is not removed.
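Because isit() only tests exact strings, extracting a surname from a full name needs an extra step. The sketch below is a hypothetical helper, split_surname, which is not part of surname.py. It assumes the surname set is passed in and tries the longest prefix first, so a compound surname such as 欧阳 wins over a single-character surname sharing its first character:

```python
# Illustrative subset; in surname.py the set m is much larger.
m = {"李", "张", "欧", "欧阳", "司马", "上官"}

def split_surname(full_name, surnames=m):
    """Hypothetical helper: return (surname, given_name), or None if no known surname prefix.

    Tries the longest candidate prefix first, so a compound surname is preferred
    over a single-character surname that happens to share its first character.
    """
    name = full_name.strip()
    longest = max(len(s) for s in surnames)
    # Leave at least one character for the given name.
    for length in range(min(longest, len(name) - 1), 0, -1):
        prefix = name[:length]
        if prefix in surnames:
            return prefix, name[length:]
    return None

print(split_surname("欧阳修"))  # ('欧阳', '修')
print(split_surname("张三"))    # ('张', '三')
print(split_surname("Smith"))   # None
```

Longest-prefix matching is a heuristic: a name that begins with a compound surname could in principle be a single-character surname followed by a given name, and the surname set alone cannot resolve that ambiguity.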
Implementation Details
The module relies on a statically defined set m, which includes an extensive list of Chinese surnames gathered from historical and contemporary sources.
Using a set for m ensures fast lookups.
The function isit is minimal and efficient, performing only a strip and a membership check.
No external dependencies or complex algorithms are used.
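Since isit() calls n.strip() directly, a non-string argument such as None raises AttributeError rather than returning False. If callers may pass arbitrary values, a thin wrapper can make the behavior explicit. is_surname_safe is a hypothetical name, and the set below is an illustrative subset:

```python
# Illustrative subset only.
m = {"赵", "钱", "欧阳"}

def isit(n):
    return n.strip() in m

def is_surname_safe(value):
    """Hypothetical wrapper: return False for non-string input instead of raising."""
    if not isinstance(value, str):
        return False
    return isit(value)

try:
    isit(None)
except AttributeError as exc:
    print("isit(None) raised:", exc)

print(is_surname_safe(None))    # False
print(is_surname_safe(" 赵 "))  # True
```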
Integration and Interaction
Intended Use Case:
This module is suitable as a standalone surname validator or as a component in larger systems that process Chinese personal names, such as:
Identity verification systems.
Chinese NLP pipelines (e.g., named entity recognition).
Databases requiring surname validation.
User input sanitization for Chinese names.
Interaction with Other Modules:
The module is independent, and its public interface is isit(). Other parts of an application can import this function to validate surnames before further processing or storage.
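As an illustration of this integration pattern, a caller might check the surname field of a record before saving it. The record layout and the validate_record function are hypothetical; in a real application isit would be imported from surname.py rather than redefined, and the set here is an illustrative subset:

```python
# Illustrative subset only; import isit from surname.py in real code.
m = {"李", "王", "欧阳", "上官"}

def isit(n):
    return n.strip() in m

def validate_record(record):
    """Hypothetical pre-storage check: validate and normalize the 'surname' field."""
    surname = record.get("surname", "")
    if not isit(surname):
        raise ValueError(f"unrecognized surname: {surname!r}")
    # Store the stripped form so later exact lookups match.
    return {**record, "surname": surname.strip()}

print(validate_record({"surname": " 上官 ", "given_name": "婉儿"}))
```

Storing the stripped value keeps the database consistent with what isit() actually compared.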
Diagram: isit() Control Flow
flowchart TD
A[Input String n] --> B[Strip whitespace]
B --> C{Is n in set m?}
C -->|Yes| D[Return True]
C -->|No| E[Return False]
style B fill:#f9f,stroke:#333,stroke-width:1px
style C fill:#bbf,stroke:#333,stroke-width:1px
style D fill:#afa,stroke:#333,stroke-width:1px
style E fill:#faa,stroke:#333,stroke-width:1px
Summary
surname.py provides a fast, simple, and effective way to verify if a string is a recognized Chinese surname. With its extensive and carefully curated surname dataset, it supports both common and rare family names, including compound surnames. Its minimalistic design allows easy integration into larger software systems requiring Chinese surname validation.
End of documentation for surname.py