bertscore.py
Overview
The `bertscore.py` file provides utility functions to summarize code snippets using an AI chat model (via the `ollama` interface), generate pseudo-reference summaries from multiple code units, and compute semantic similarity scores between candidate and reference texts using the BERTScore metric.
This file is intended to assist in evaluating or generating concise summaries of code by leveraging large language model (LLM) capabilities and BERT-based semantic similarity scoring. It is especially useful in contexts such as code summarization, automated documentation, or quality assessment of generated code explanations.
Detailed Documentation
Imports and Dependencies
re— for regular expression processing of model-generated text.typing.List,typing.Dict— for type hinting.bert_score— library to compute BERTScore, a semantic similarity metric.ollama— a custom interface to interact with an LLM chat model.config— configuration module providing constantsSUMMARY_MODELandBERTSCORE_LANG.
Functions
summarize_code_unit_ollama
def summarize_code_unit_ollama(code_snippet: str, model: str = SUMMARY_MODEL) -> str:
**Purpose:** Generates a concise one-sentence summary of a given code snippet using the Ollama chat model.
**Parameters:**
code_snippet(str): The source code text to summarize.model(str, optional): The Ollama model identifier to use for summarization. Defaults toSUMMARY_MODELfrom config.
**Returns:** `str` — A one-sentence summary describing what the code snippet does and any important behavior. If summarization fails, returns a fallback string indicating failure along with the first line of the snippet.
**Implementation Details:**
Limits the input snippet to 4000 characters (to fit model input constraints).
Constructs a prompt asking the model to summarize the code snippet in one sentence.
Calls
ollama.chat()with the prompt.Cleans the model output by removing any content inside
<think></think>tags.Handles exceptions gracefully, logging a warning and returning a fallback summary.
**Usage Example:**
code = "def add(a, b):\n return a + b"
summary = summarize_code_unit_ollama(code)
print(summary)
# Output: "Adds two numbers and returns the result."
generate_pseudo_reference_from_code_units
def generate_pseudo_reference_from_code_units(code_units: List[str], model: str = SUMMARY_MODEL) -> str:
**Purpose:** Creates a combined pseudo-reference summary by summarizing multiple code units individually and concatenating their summaries.
**Parameters:**
code_units(List[str]): A list of code snippet strings to summarize.model(str, optional): The Ollama model identifier to use for summarization. Defaults toSUMMARY_MODEL.
**Returns:** `str` — A string containing the one-sentence summaries of each code unit, separated by two newlines.
**Implementation Details:**
Iterates over each code unit in
code_units.Calls
summarize_code_unit_ollamato obtain each summary.Joins all summaries with double newline separators.
**Usage Example:**
units = [
"def add(a, b): return a + b",
"def subtract(a, b): return a - b"
]
reference = generate_pseudo_reference_from_code_units(units)
print(reference)
# Output:
# "Adds two numbers and returns the result.
#
# Subtracts second number from the first and returns the difference."
compute_bertscore
def compute_bertscore(candidate: str, reference: str, lang: str = BERTSCORE_LANG) -> Dict[str, float]:
**Purpose:** Computes the BERTScore semantic similarity metrics (Precision, Recall, F1) between a candidate text and a reference text.
**Parameters:**
candidate(str): The candidate/generated text to evaluate.reference(str): The reference/ground-truth text to compare against.lang(str, optional): Language code for BERTScore model selection (e.g.,"en"). Defaults toBERTSCORE_LANGfrom config.
**Returns:** `Dict[str, float]` — Dictionary containing three keys:
"BERTScore_Precision": Precision score as a float."BERTScore_Recall": Recall score as a float."BERTScore_F1": F1 score as a float.
**Implementation Details:**
Calls
bert_score.score()with single-element lists containing candidate and reference.Enables rescaling with baseline to normalize scores.
Extracts scalar floats from tensor outputs.
**Usage Example:**
candidate = "Adds two numbers."
reference = "Adds two numbers and returns the sum."
scores = compute_bertscore(candidate, reference)
print(scores)
# Output: {'BERTScore_Precision': 0.95, 'BERTScore_Recall': 0.92, 'BERTScore_F1': 0.935}
Important Implementation Details and Algorithms
Handling Large Inputs: The
summarize_code_unit_ollamafunction truncates input code snippets to 4000 characters to avoid exceeding model or API limits.Prompt Engineering: The prompt explicitly instructs the model to produce a concise one-sentence summary focusing on functionality and important behavior.
Output Cleaning: The function removes any
<think>...</think>tags from the model output, which may be artifacts or internal reasoning markers.Fallback Mechanism: If the model call fails, the function returns a safe fallback that includes the first line of the snippet, helping downstream tasks avoid blank summaries.
BERTScore Usage: Uses the
bert_scorelibrary'sscore()method with single-item batches and rescales scores with baseline for reliable semantic similarity evaluation.
Interaction with Other System Components
ollamaModule: Provides the interface to the Ollama chat model for summarization tasks.bert_scoreLibrary: Computes semantic similarity metrics.configModule: Supplies configurable parameters like the model to use for summarization (SUMMARY_MODEL) and the language for BERTScore (BERTSCORE_LANG).This file likely integrates into a larger evaluation or documentation generation pipeline where code snippets are summarized and then evaluated for quality or similarity.
Visual Diagram
classDiagram
class bertscore.py {
+summarize_code_unit_ollama(code_snippet: str, model: str) str
+generate_pseudo_reference_from_code_units(code_units: List[str], model: str) str
+compute_bertscore(candidate: str, reference: str, lang: str) Dict[str, float]
}
bertscore.py ..> ollama : uses
bertscore.py ..> bert_score : uses
bertscore.py ..> config : reads SUMMARY_MODEL, BERTSCORE_LANG
Summary
The `bertscore.py` file is a focused utility module for summarizing code snippets via an LLM chat interface and assessing the semantic similarity of texts using BERTScore. It encapsulates key operations needed in automated code summarization workflows, providing robust handling of model interactions, prompt design, and scoring metrics.
This file acts as a bridge between raw code units and their textual semantic evaluation, enabling downstream applications such as code documentation generation, code review automation, or quality assurance of code explanations.