# wcwidth.py
## Overview
The `wcwidth.py` module provides utility functions to measure the display width of Unicode characters and strings in terminal environments. This is critical for applications that require precise alignment of text output, such as command-line interfaces, terminal user interfaces, and text editors.
The module exposes two main functions:
- `wcwidth(c: str) -> int`: computes the column width of a single Unicode character.
- `wcswidth(s: str) -> int`: computes the total column width of a Unicode string.
Widths are returned according to terminal display conventions:
- `-1` if the character/string contains non-printable (control) characters.
- `0` for zero-width characters (e.g., combining marks).
- `1` for most printable characters.
- `2` for East Asian wide/fullwidth characters.
The module uses Unicode properties and caching to efficiently determine these widths.
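To see which raw Unicode properties these width classes come from, the snippet below queries Python's standard `unicodedata` module directly for one sample character per class. It does not call the module's own functions; the `samples` list and the `props` name are illustrative only.

```python
import unicodedata

# One sample per width class: ASCII letter, BEL control, combining acute accent,
# HIRAGANA LETTER A, and FULLWIDTH LATIN CAPITAL LETTER A.
samples = ["a", "\x07", "\u0301", "\u3042", "\uff21"]

# Map each character to (general category, East Asian Width property).
props = {ch: (unicodedata.category(ch), unicodedata.east_asian_width(ch)) for ch in samples}

for ch, (category, eaw) in props.items():
    print(f"U+{ord(ch):04X}  category={category:<2}  east_asian_width={eaw}")
```

Under the conventions above, category `Cc` maps to -1, `Mn`/`Me` to 0, East Asian Width `W` or `F` to 2, and other printable characters to 1. For example, U+FF21 reports `F` and is therefore measured as two columns.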
## Detailed Explanation
### Function: `wcwidth`
```python
from functools import lru_cache


@lru_cache(100)
def wcwidth(c: str) -> int:
    """Determine how many columns are needed to display a character in a terminal.
    Returns -1 if the character is not printable.
    Returns 0, 1 or 2 for other characters.
    """
```
**Purpose:** Calculates the number of terminal column cells required to display the character `c`.
**Parameters:**
- `c` (`str`): A single Unicode character (a string of length 1).
**Returns:**
- `int`: the display width of `c` in columns:
  - `-1` if the character is non-printable (a control character).
  - `0` for zero-width characters (e.g., combining marks).
  - `1` for regular-width characters.
  - `2` for East Asian fullwidth or wide characters.
**Implementation details:**
- Uses `ord(c)` to get the Unicode code point.
- Fast path: ASCII printable characters (code points 0x20 to 0x7E) return width 1.
- Special zero-width characters are checked explicitly by code point ranges.
- Uses `unicodedata.category` to identify control characters (`Cc`) and combining marks (`Me`, `Mn`).
- Uses `unicodedata.east_asian_width` to identify East Asian fullwidth (`F`) and wide (`W`) characters, which take two columns.
- Caches results for performance with `functools.lru_cache` (cache size 100).
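The steps above can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the module's actual source: in particular, `_ZERO_WIDTH_RANGES` is a hypothetical table, because this document does not list the exact code point ranges the module checks.

```python
import unicodedata
from functools import lru_cache

# HYPOTHETICAL zero-width table: the real module's exact ranges are not given here.
_ZERO_WIDTH_RANGES = (
    (0x200B, 0x200F),  # ZERO WIDTH SPACE .. RIGHT-TO-LEFT MARK
    (0x2060, 0x2063),  # WORD JOINER .. INVISIBLE SEPARATOR
    (0xFEFF, 0xFEFF),  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
)


@lru_cache(100)
def wcwidth(c: str) -> int:
    """Sketch of the per-character width rules described above."""
    cp = ord(c)

    # Fast path: printable ASCII is always one column.
    if 0x20 <= cp <= 0x7E:
        return 1

    # Explicitly listed zero-width formatting characters.
    if any(lo <= cp <= hi for lo, hi in _ZERO_WIDTH_RANGES):
        return 0

    category = unicodedata.category(c)
    if category == "Cc":          # control characters are not printable
        return -1
    if category in ("Me", "Mn"):  # combining marks take no column of their own
        return 0

    # Fullwidth and wide East Asian characters occupy two columns.
    if unicodedata.east_asian_width(c) in ("F", "W"):
        return 2
    return 1
```

The order of the checks matters: the zero-width table is consulted before `unicodedata.category`, so U+200B returns 0 even though its category (`Cf`) would otherwise fall through to width 1.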
**Usage example:**
```python
print(wcwidth('a'))       # Output: 1
print(wcwidth('あ'))      # Output: 2 (Hiragana character)
print(wcwidth('\u0301'))  # Output: 0 (Combining acute accent)
print(wcwidth('\x07'))    # Output: -1 (Bell control character)
```
### Function: `wcswidth`
```python
def wcswidth(s: str) -> int:
    """Determine how many columns are needed to display a string in a terminal.
    Returns -1 if the string contains non-printable characters.
    """
```
**Purpose:** Computes the total terminal column width required to display the entire string `s`.
**Parameters:**
- `s` (`str`): A Unicode string.
**Returns:**
- `int`: the total display width in columns.
  - Returns `-1` if any character in the string is non-printable (width `-1`).
  - Otherwise, returns the sum of the widths of all characters.
**Implementation details:**
- Normalizes the string using Unicode Normalization Form C (NFC) so that combined characters are composed.
- Iterates over each character in the normalized string and calls `wcwidth` on it.
- If any character has width `-1`, returns `-1`.
- Otherwise, accumulates the widths and returns the total.
**Usage example:**
```python
print(wcswidth("hello"))           # Output: 5
print(wcswidth("コンニチハ"))       # Output: 10 (Each character width 2)
print(wcswidth("a\u0301"))         # Output: 1 ('a' + combining acute accent)
print(wcswidth("hello\x07world"))  # Output: -1 (Bell character inside string)
```
## Important Implementation Details and Algorithms
- **Caching:** The `@lru_cache(100)` decorator on `wcwidth` speeds up measuring strings with repeated characters by avoiding redundant Unicode property lookups for recently seen characters.
- **Unicode property checks:** The module relies on Python's built-in `unicodedata` module for:
  - Character category (`unicodedata.category`), e.g., control characters (`Cc`) and combining marks (`Me`, `Mn`).
  - East Asian Width (`unicodedata.east_asian_width`), identifying wide/fullwidth characters that need two columns.
- **Special zero-width characters:** Explicit code point checks handle certain zero-width formatting characters, e.g., the zero-width space (U+200B) and other formatting codes.
- **Normalization:** The `wcswidth` function normalizes strings to NFC to keep width calculations consistent, especially for composed characters.
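The effect of the bounded cache can be observed with `cache_info()`, which every `functools.lru_cache` wrapper provides. The helper `cached_cols` below is hypothetical and performs only the East Asian Width lookup, to keep the focus on caching behavior rather than on the full width rules.

```python
import unicodedata
from functools import lru_cache


@lru_cache(100)  # same bound as the decorator on wcwidth
def cached_cols(c: str) -> int:
    # Hypothetical helper: only the East Asian Width part of the width rules.
    return 2 if unicodedata.east_asian_width(c) in ("F", "W") else 1


# Repeated characters hit the cache after their first lookup.
total = sum(cached_cols(c) for c in "ababab")
repeat_info = cached_cols.cache_info()
print(total, repeat_info)  # 6 CacheInfo(hits=4, misses=2, maxsize=100, currsize=2)

# More than 100 distinct characters: least recently used entries are evicted.
cached_cols.cache_clear()
for cp in range(0x4E00, 0x4E00 + 150):  # 150 distinct CJK ideographs
    cached_cols(chr(cp))
print(cached_cols.cache_info().currsize)  # 100
```

Because the bound is 100 entries, text that cycles through many distinct characters (large CJK documents, for instance) may see a lower hit rate than text drawn from a small alphabet.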
## Interaction with Other System Components
This module is a utility component typically used by terminal-based applications, such as:
- Text editors that need to align columns or cursor positions.
- Command-line interfaces (CLIs) that display tabular or formatted data.
- Terminal UI frameworks for layout calculation.
It is standalone: apart from its cache, it neither depends on nor modifies global state.
It can be imported as a helper module wherever precise control over text display width is required.
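As an example of the table-alignment use case, the sketch below pads cells by display width rather than by `len()`. It assumes a condensed stand-in, `display_width`, for `wcswidth` (NFC plus the category and East Asian Width checks, without the module's explicit zero-width table); `pad_cell` and the sample rows are hypothetical.

```python
import unicodedata


def display_width(s: str) -> int:
    # Condensed stand-in for wcswidth(): NFC, then the per-character rules.
    total = 0
    for c in unicodedata.normalize("NFC", s):
        category = unicodedata.category(c)
        if category == "Cc":
            return -1
        if category in ("Me", "Mn"):
            continue  # zero-width combining mark
        total += 2 if unicodedata.east_asian_width(c) in ("F", "W") else 1
    return total


def pad_cell(s: str, width: int) -> str:
    # Hypothetical helper: right-pad with spaces up to `width` terminal columns.
    return s + " " * max(0, width - display_width(s))


rows = [("City", "Country"), ("東京", "日本"), ("Paris", "France")]
col_width = max(display_width(city) for city, _ in rows)
table = [pad_cell(city, col_width) + " | " + country for city, country in rows]
for line in table:
    print(line)
```

By contrast, `"東京".ljust(5)` counts code points, appends three spaces, and produces a cell seven columns wide, which breaks the column alignment.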
## Mermaid Diagram
Below is a flowchart representing the main functions and their relationships in `wcwidth.py`.
```mermaid
flowchart TD
    A[Start: Input character or string]

    subgraph char_width [Single Character Width]
        direction TB
        B["wcwidth(c)"]
        B --> C{Is c ASCII printable?}
        C -- Yes --> D[Return 1]
        C -- No --> E{Is c zero-width special char?}
        E -- Yes --> F[Return 0]
        E -- No --> G{Is c control character?}
        G -- Yes --> H[Return -1]
        G -- No --> I{Is c combining mark?}
        I -- Yes --> F
        I -- No --> J{Is c East Asian Wide/Fullwidth?}
        J -- Yes --> K[Return 2]
        J -- No --> D
    end

    subgraph str_width [String Width Calculation]
        direction TB
        L["wcswidth(s)"]
        L --> M[Normalize s with NFC]
        M --> N[For each character c in s]
        N --> B
        B --> O{"wcwidth(c) >= 0?"}
        O -- No --> P[Return -1]
        O -- Yes --> Q[Accumulate total width]
        Q --> R{More characters?}
        R -- Yes --> N
        R -- No --> S[Return total width]
    end

    A -- character --> B
    A -- string --> L
```
## Summary
The `wcwidth.py` module provides efficient, Unicode-aware functions to measure the number of terminal columns required to display characters and strings. By leveraging Unicode properties and caching, it delivers accurate width measurements essential for terminal text layout and alignment tasks. Its simple interface and reliance on Python standard libraries make it easy to integrate into terminal-based applications requiring precise text formatting.