expression.py
Overview
The [expression.py](/projects/286/67471) module provides functionality to parse, compile, and evaluate logical match expressions following a custom grammar. These expressions are primarily used in command-line options such as `-k` and `-m` in testing frameworks (e.g., pytest) to filter tests or items based on complex boolean conditions.
The module transforms textual match expressions into a Python Abstract Syntax Tree (AST), compiles them into executable code objects, and evaluates them against user-provided matcher functions. This enables flexible, dynamic filtering logic combining identifiers, boolean operators (`and`, `or`, `not`), and optional keyword arguments.
Detailed Description
Grammar and Semantics
The match expressions are composed of identifiers with optional keyword arguments, combined using boolean operators:

    expression: expr? EOF
    expr: and_expr ('or' and_expr)*
    and_expr: not_expr ('and' not_expr)*
    not_expr: 'not' not_expr | '(' expr ')' | ident kwargs?
    ident: (\w|:|\+|-|\.|\[|\]|\\|/)+
    kwargs: ('(' name '=' value ( ', ' name '=' value )* ')')
    name: a valid ident, but not a reserved keyword
    value: (unescaped) string literal | (-)?[0-9]+ | 'False' | 'True' | 'None'

- Empty expressions evaluate to `False`.
- Identifiers (`ident`) are evaluated via a user-provided matcher function.
- Boolean operators have standard semantics.
- Identifiers can be called with keyword arguments, which are passed to the matcher function.
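
To make the `ident` rule concrete, the following minimal sketch checks candidate strings against the character class from the grammar. The helper name `is_ident` and the sample strings are illustrative assumptions, not part of the module. Note that the reserved words `and`, `or`, and `not` also match this pattern; the grammar treats them as operators rather than identifiers.

```python
import re

# Character class copied from the `ident` rule in the grammar above.
IDENT_RE = re.compile(r"(\w|:|\+|-|\.|\[|\]|\\|/)+")


def is_ident(text: str) -> bool:
    """Return True if the whole string matches the `ident` rule (hypothetical helper)."""
    return IDENT_RE.fullmatch(text) is not None


# Illustrative samples: node-id style names pass, spaces and "=" do not.
samples = [
    "test_login",
    "tests/unit/test_api.py::TestUser",
    "param[a-1]",
    "has space",
    "a=b",
]
results = {sample: is_ident(sample) for sample in samples}
```

The wide character class is what allows path-like and parametrized names to be used directly in an expression without quoting.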
Classes and Functions
Enum: TokenType
Defines token types used by the lexical scanner.
| Member | Description |
|---|---|
| `LPAREN` | Left parenthesis `(` |
| `RPAREN` | Right parenthesis `)` |
| `OR` | Logical OR operator |
| `AND` | Logical AND operator |
| `NOT` | Logical NOT operator |
| `IDENT` | Identifier token |
| `EOF` | End of input |
| `EQUAL` | Equals sign `=` |
| `STRING` | String literal |
| `COMMA` | Comma `,` |
Dataclass: Token
Represents a lexical token with type, value, and position.
Attributes:
- `type` (`TokenType`): token category
- `value` (`str`): token text
- `pos` (`int`): zero-based position in the input string
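
As a rough illustration of how these two types fit together, here is a minimal sketch. The member names and the three attributes come from the descriptions above; the enum's string values and the `frozen=True` setting are assumptions made for the example.

```python
import dataclasses
import enum


class TokenType(enum.Enum):
    # Member names follow the table above; the string values are illustrative.
    LPAREN = "left parenthesis"
    RPAREN = "right parenthesis"
    OR = "or"
    AND = "and"
    NOT = "not"
    IDENT = "identifier"
    EOF = "end of input"
    EQUAL = "="
    STRING = "string literal"
    COMMA = ","


@dataclasses.dataclass(frozen=True)  # immutability is an assumption of this sketch
class Token:
    type: TokenType
    value: str
    pos: int  # zero-based offset into the input string


# The identifier "foo" found at the very start of the input.
tok = Token(TokenType.IDENT, "foo", 0)
```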
Exception: ParseError
Raised when invalid syntax is encountered during parsing.
Constructor:
- `__init__(column: int, message: str)`: `column` is the 1-based index of the error location.
String representation: Shows error location and description.
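
A minimal sketch of such an exception is shown below. The constructor signature follows the description above; the exact wording produced by `__str__` is an assumption, as is the conversion from a zero-based token position to a 1-based column.

```python
class ParseError(Exception):
    """Sketch of a parse error that carries a 1-based column."""

    def __init__(self, column: int, message: str) -> None:
        self.column = column
        self.message = message

    def __str__(self) -> str:
        # Assumed format; the text above only says that the location and
        # the description are shown.
        return f"at column {self.column}: {self.message}"


# Token positions are zero-based, so a reported column would be pos + 1.
token_pos = 4
error = ParseError(token_pos + 1, "unexpected character")
text = str(error)
```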
Class: Scanner
Lexical analyzer that tokenizes the input expression string.
Initialization:
- `Scanner(input: str)`: Takes the input string and prepares an iterator over tokens.

Methods:
- `lex(input: str) -> Iterator[Token]`: Generator producing tokens from the input string. It:
  - Skips whitespace.
  - Recognizes parentheses, operators, identifiers, and strings (single or double quoted).
  - Raises `ParseError` on invalid tokens or malformed strings.
- `accept(type: TokenType, *, reject: bool = False) -> Token | None`: If the current token matches the specified type, consumes and returns it; otherwise returns `None`, or raises if `reject=True`.
- `reject(expected: Sequence[TokenType]) -> NoReturn`: Raises `ParseError` indicating the expected token types versus the encountered token.
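
The tokenizing behavior described above can be sketched as a standalone generator. This is a simplified approximation, not the module's code: tokens are plain `(type name, text, position)` tuples instead of `Token` objects, `ParseError` is a bare stand-in, and the error messages are assumptions.

```python
import re
from typing import Iterator, Tuple

IDENT_RE = re.compile(r"(\w|:|\+|-|\.|\[|\]|\\|/)+")
KEYWORDS = {"or": "OR", "and": "AND", "not": "NOT"}
PUNCTUATION = {"(": "LPAREN", ")": "RPAREN", "=": "EQUAL", ",": "COMMA"}


class ParseError(Exception):
    """Bare stand-in for the module's ParseError."""


def lex(text: str) -> Iterator[Tuple[str, str, int]]:
    """Yield (token type name, token text, zero-based position) triples."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in " \t":
            pos += 1  # whitespace separates tokens and is otherwise ignored
        elif ch in PUNCTUATION:
            yield PUNCTUATION[ch], ch, pos
            pos += 1
        elif ch in "'\"":
            end = text.find(ch, pos + 1)  # look for the matching closing quote
            if end == -1:
                raise ParseError(f"at column {pos + 1}: closing quote {ch!r} is missing")
            literal = text[pos : end + 1]
            if "\\" in literal:
                raise ParseError(f"at column {pos + 1}: backslash escapes are not supported")
            yield "STRING", literal, pos
            pos = end + 1
        else:
            match = IDENT_RE.match(text, pos)
            if match is None:
                raise ParseError(f"at column {pos + 1}: unexpected character {ch!r}")
            word = match.group(0)
            # Reserved words become operator tokens; everything else is IDENT.
            yield KEYWORDS.get(word, "IDENT"), word, pos
            pos += len(word)
    yield "EOF", "", pos


tokens = list(lex("foo and not bar(baz=1)"))
```

In this sketch the value `1` comes out as an `IDENT` token, because digits are part of the identifier character class. Interpreting it as an integer is left to the keyword-argument parsing step.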
Parsing Functions
All parsing functions take a `Scanner` instance, consume tokens, and produce Python AST nodes (`ast.expr`) representing the logical structure.
- `expression(s: Scanner) -> ast.Expression`: Parses a full expression followed by EOF and returns an AST `Expression`. Returns a `False` constant if the input is empty.
- `expr(s: Scanner) -> ast.expr`: Parses expressions connected by `or`.
- `and_expr(s: Scanner) -> ast.expr`: Parses expressions connected by `and`.
- `not_expr(s: Scanner) -> ast.expr`: Parses `not` expressions, parenthesized expressions, or identifiers with optional keyword arguments.
- `single_kwarg(s: Scanner) -> ast.keyword`: Parses a single keyword argument of the form `name=value`, checking for valid Python identifiers and reserved keywords.
- `all_kwargs(s: Scanner) -> list[ast.keyword]`: Parses one or more comma-separated keyword arguments.
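
The way these functions nest can be illustrated with a reduced recursive descent parser. The sketch below is an approximation under simplifying assumptions: it uses a whitespace-splitting tokenizer in place of `Scanner`, omits keyword arguments, raises `SyntaxError` instead of `ParseError`, and groups the functions in a hypothetical `Parser` class.

```python
import ast
from typing import List


def tokenize(text: str) -> List[str]:
    # Simplified stand-in for Scanner: pad parentheses, then split on whitespace.
    return text.replace("(", " ( ").replace(")", " ) ").split() + ["<EOF>"]


class Parser:
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.index = 0

    def accept(self, expected: str) -> bool:
        """Consume the current token if it equals `expected`."""
        if self.tokens[self.index] == expected:
            self.index += 1
            return True
        return False

    def expression(self) -> ast.Expression:
        if self.accept("<EOF>"):
            body: ast.expr = ast.Constant(False)  # empty input evaluates to False
        else:
            body = self.expr()
            if not self.accept("<EOF>"):
                raise SyntaxError(f"unexpected token {self.tokens[self.index]!r}")
        return ast.fix_missing_locations(ast.Expression(body))

    def expr(self) -> ast.expr:
        node = self.and_expr()
        while self.accept("or"):  # lowest precedence
            node = ast.BoolOp(ast.Or(), [node, self.and_expr()])
        return node

    def and_expr(self) -> ast.expr:
        node = self.not_expr()
        while self.accept("and"):
            node = ast.BoolOp(ast.And(), [node, self.not_expr()])
        return node

    def not_expr(self) -> ast.expr:
        if self.accept("not"):  # highest precedence
            return ast.UnaryOp(ast.Not(), self.not_expr())
        if self.accept("("):
            node = self.expr()
            if not self.accept(")"):
                raise SyntaxError("expected ')'")
            return node
        token = self.tokens[self.index]
        if token in {"and", "or", ")", "<EOF>"}:
            raise SyntaxError(f"expected identifier, got {token!r}")
        self.index += 1
        # The "$" prefix keeps identifiers apart from Python keywords and builtins.
        return ast.Name("$" + token, ast.Load())


tree = Parser("a or b and not c").expression()
code = compile(tree, "<match expression>", "eval")
# A plain dict stands in for the matcher here; keys carry the "$" prefix.
names = {"$a": True, "$b": False, "$c": True}
result = eval(code, {"__builtins__": {}}, names)  # a or (b and (not c)) -> True
```

Because each level only calls the next tighter-binding level, operator precedence falls out of the call structure rather than from an explicit precedence table.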
Protocol: MatcherCall
Describes the callable matcher interface used during evaluation.

    def __call__(self, name: str, /, **kwargs: str | int | bool | None) -> bool: ...

- Called with an identifier name and optional keyword arguments.
- Returns a boolean indicating whether the identifier matches.
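
As an illustration of what a conforming matcher might look like, the sketch below declares the protocol and a hypothetical `MarkMatcher` class. The class, its `marks` dictionary, and the matching rule are invented for the example; only the `__call__` signature comes from the description above.

```python
from typing import Dict, Protocol, Union


class MatcherCall(Protocol):
    def __call__(self, name: str, /, **kwargs: Union[str, int, bool, None]) -> bool: ...


class MarkMatcher:
    """Hypothetical matcher backed by a dict of mark names and their arguments.

    An identifier matches if a mark of that name exists and every keyword
    argument given in the expression equals the value stored for that mark.
    """

    def __init__(self, marks: Dict[str, Dict[str, object]]) -> None:
        self.marks = marks

    def __call__(self, name: str, /, **kwargs: Union[str, int, bool, None]) -> bool:
        if name not in self.marks:
            return False
        stored = self.marks[name]
        return all(stored.get(key) == value for key, value in kwargs.items())


# Structural typing: MarkMatcher satisfies MatcherCall without inheriting from it.
matcher: MatcherCall = MarkMatcher({"slow": {}, "db": {"engine": "postgres"}})
plain = matcher("slow")
with_kwargs = matcher("db", engine="postgres")
```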
Dataclass: MatcherNameAdapter
Wraps a matcher callable for a single identifier.
Attributes:
- `matcher: MatcherCall`
- `name: str`

Methods:
- `__bool__()`: evaluates `matcher(name)`.
- `__call__(**kwargs)`: evaluates `matcher(name, **kwargs)`.
Class: MatcherAdapter
Adapts a matcher callable into a mapping interface expected by Python's `eval()` locals.
Constructor:
- `MatcherAdapter(matcher: MatcherCall)`

Methods:
- `__getitem__(key: str) -> MatcherNameAdapter`: Strips the internal identifier prefix (`$`) and returns a `MatcherNameAdapter` for that identifier.
- `__iter__()` and `__len__()` raise `NotImplementedError` since iteration is not supported.
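
The two adapters can be sketched together to show how name lookup during `eval()` ends up calling the matcher. This is a simplified approximation: the class names `NameAdapter` and `Adapter`, the hand-built AST, and the recording matcher are assumptions made for the example.

```python
import ast
from collections.abc import Mapping


class NameAdapter:
    """Defers the matcher call until the name is used as a bool or called."""

    def __init__(self, matcher, name: str) -> None:
        self.matcher = matcher
        self.name = name

    def __bool__(self) -> bool:
        return self.matcher(self.name)

    def __call__(self, **kwargs) -> bool:
        return self.matcher(self.name, **kwargs)


class Adapter(Mapping):
    """Mapping passed to eval() as locals; every lookup succeeds."""

    def __init__(self, matcher) -> None:
        self.matcher = matcher

    def __getitem__(self, key: str) -> NameAdapter:
        return NameAdapter(self.matcher, key[len("$"):])  # strip the "$" prefix

    def __iter__(self):
        raise NotImplementedError()

    def __len__(self):
        raise NotImplementedError()


calls = []


def matcher(name, /, **kwargs):
    calls.append((name, kwargs))  # record how the matcher was invoked
    return name == "foo"


# Hand-built AST equivalent to the expression: foo and not bar(baz=1)
bar_call = ast.Call(
    ast.Name("$bar", ast.Load()), [], [ast.keyword("baz", ast.Constant(1))]
)
tree = ast.Expression(
    ast.BoolOp(
        ast.And(),
        [ast.Name("$foo", ast.Load()), ast.UnaryOp(ast.Not(), bar_call)],
    )
)
code = compile(ast.fix_missing_locations(tree), "<match expression>", "eval")
# bool() matters: `and`/`or` can return a NameAdapter rather than a bool.
result = bool(eval(code, {"__builtins__": {}}, Adapter(matcher)))
```

A bare identifier is only converted to a boolean when a boolean operator tests it, which is why `__bool__` and `__call__` are both needed on the per-name adapter.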
Class: Expression
Represents a compiled match expression.
Attributes:
- `code: types.CodeType`: compiled code object.
Methods:
`@classmethod compile(cls, input: str) -> Expression`

Parses and compiles an input match expression string into an `Expression` instance.

**Parameters:**
- `input`: The match expression string.

**Returns:**
An `Expression` instance wrapping the compiled code.

**Example:**

    expr = Expression.compile("foo and not bar(baz=1)")

`evaluate(self, matcher: MatcherCall) -> bool`

Evaluates the compiled expression against a matcher function.

**Parameters:**
- `matcher`: Callable that takes an identifier and optional kwargs, returning a boolean.

**Returns:**
Boolean result of the expression evaluation.

**Example:**

    def my_matcher(name, **kwargs):
        if name == "foo":
            return True
        if name == "bar" and kwargs.get("baz") == 1:
            return False
        return False

    expr = Expression.compile("foo and not bar(baz=1)")
    result = expr.evaluate(my_matcher)  # True: "foo" matches and "bar(baz=1)" does not
Important Implementation Details
- The parser uses a recursive descent approach with operator precedence:
  - `not` has the highest precedence.
  - `and` is next.
  - `or` has the lowest precedence.
- Identifiers are internally prefixed with `$` to avoid conflicts with Python keywords or builtins when converting to Python AST `Name` nodes.
- Keyword argument values support string literals (no escapes), integers, booleans (`True`, `False`), and `None`.
- Evaluation uses Python's `eval()` in a restricted environment with no builtins, using a custom `MatcherAdapter` to resolve identifiers.
- Escaping strings with backslashes is explicitly disallowed to keep parsing simpler.
- The module raises detailed `ParseError` exceptions with precise column information to aid debugging.
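
The rules for keyword argument values listed above can be sketched as a small conversion function. The function name `parse_value`, its use of `ValueError`, and the error messages are assumptions for the example; the accepted forms follow the `value` rule of the grammar.

```python
import re

CONSTANTS = {"True": True, "False": False, "None": None}
INTEGER_RE = re.compile(r"-?[0-9]+")  # the (-)?[0-9]+ alternative of `value`


def parse_value(raw: str):
    """Convert the raw text of a kwarg value into a Python object (hypothetical helper)."""
    if raw[:1] in ("'", '"'):
        if len(raw) < 2 or raw[-1] != raw[0]:
            raise ValueError(f"closing quote is missing in {raw!r}")
        if "\\" in raw:
            # Backslash escapes are rejected rather than interpreted.
            raise ValueError("escaping with backslash is not supported")
        return raw[1:-1]
    if INTEGER_RE.fullmatch(raw):
        return int(raw)
    if raw in CONSTANTS:
        return CONSTANTS[raw]
    raise ValueError(f"unexpected value {raw!r}")


values = [parse_value(text) for text in ("'a b'", "-12", "True", "None")]
```

Rejecting backslashes outright means a string literal always ends at the first matching quote, so the lexer never has to track escape state.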
Interactions With Other System Components
- Used primarily by command-line interfaces or test frameworks to interpret filter expressions for test selection or other conditional matching.
- The `matcher` callable is implemented elsewhere in the system, encapsulating the logic that decides whether a given identifier (potentially with kwargs) matches certain criteria.
- This module acts as a parsing and evaluation engine, isolated from the matcher logic, enabling flexible integration with any matching backend.
Visual Diagram
classDiagram
class Scanner {
+__init__(input: str)
+lex(input: str) Iterator~Token~
+accept(type: TokenType, reject: bool = False) Token|None
+reject(expected: Sequence~TokenType~)
}
class Token {
+type: TokenType
+value: str
+pos: int
}
class ParseError {
+__init__(column: int, message: str)
+__str__()
}
class Expression {
+code: CodeType
+compile(input: str) Expression
+evaluate(matcher: MatcherCall) bool
}
class MatcherAdapter {
+__init__(matcher: MatcherCall)
+__getitem__(key: str) MatcherNameAdapter
}
class MatcherNameAdapter {
+__init__(matcher: MatcherCall, name: str)
+__bool__() bool
+__call__(**kwargs) bool
}
Scanner --> Token
Expression --> MatcherAdapter
MatcherAdapter --> MatcherNameAdapter
MatcherNameAdapter ..> MatcherCall : uses
Scanner ..> ParseError : raises
Expression ..> ParseError : raises
Summary
The [expression.py](/projects/286/67471) module is a self-contained parser and evaluator for user-defined boolean match expressions. It converts textual input into executable code, enabling complex filtering logic for testing frameworks or similar applications. The design cleanly separates parsing, tokenization, AST construction, and evaluation, facilitating extensibility and integration with various matcher implementations.