bertscore.py

Overview

The `bertscore.py` file provides utility functions to summarize code snippets using an AI chat model (via the `ollama` interface), generate pseudo-reference summaries from multiple code units, and compute semantic similarity scores between candidate and reference texts using the BERTScore metric.

This file is intended to assist in evaluating or generating concise summaries of code by leveraging large language model (LLM) capabilities and BERT-based semantic similarity scoring. It is especially useful in contexts such as code summarization, automated documentation, or quality assessment of generated code explanations.

Detailed Documentation

Imports and Dependencies

re — for regular expression processing of model-generated text.
typing.List, typing.Dict — for type hinting.
bert_score — library to compute BERTScore, a semantic similarity metric.
ollama — a custom interface to interact with an LLM chat model.
config — configuration module providing constants SUMMARY_MODEL and BERTSCORE_LANG.

Functions

`summarize_code_unit_ollama`

def summarize_code_unit_ollama(code_snippet: str, model: str = SUMMARY_MODEL) -> str:

**Purpose:** Generates a concise one-sentence summary of a given code snippet using the Ollama chat model.

**Parameters:**

code_snippet (str): The source code text to summarize.
model (str, optional): The Ollama model identifier to use for summarization. Defaults to SUMMARY_MODEL from config.

**Returns:** `str` — A one-sentence summary describing what the code snippet does and any important behavior. If summarization fails, returns a fallback string indicating failure along with the first line of the snippet.

**Implementation Details:**

Limits the input snippet to 4000 characters (to fit model input constraints).
Constructs a prompt asking the model to summarize the code snippet in one sentence.
Calls ollama.chat() with the prompt.
Cleans the model output by removing any content inside <think></think> tags.
Handles exceptions gracefully, logging a warning and returning a fallback summary.

**Usage Example:**

code = "def add(a, b):\n    return a + b"
summary = summarize_code_unit_ollama(code)
print(summary)
# Output: "Adds two numbers and returns the result."

`generate_pseudo_reference_from_code_units`

def generate_pseudo_reference_from_code_units(code_units: List[str], model: str = SUMMARY_MODEL) -> str:

**Purpose:** Creates a combined pseudo-reference summary by summarizing multiple code units individually and concatenating their summaries.

**Parameters:**

code_units (List[str]): A list of code snippet strings to summarize.
model (str, optional): The Ollama model identifier to use for summarization. Defaults to SUMMARY_MODEL.

**Returns:** `str` — A string containing the one-sentence summaries of each code unit, separated by two newlines.

**Implementation Details:**

Iterates over each code unit in code_units.
Calls summarize_code_unit_ollama to obtain each summary.
Joins all summaries with double newline separators.

**Usage Example:**

units = [
    "def add(a, b): return a + b",
    "def subtract(a, b): return a - b"
]
reference = generate_pseudo_reference_from_code_units(units)
print(reference)
# Output:
# "Adds two numbers and returns the result.
#
# Subtracts second number from the first and returns the difference."

`compute_bertscore`

def compute_bertscore(candidate: str, reference: str, lang: str = BERTSCORE_LANG) -> Dict[str, float]:

**Purpose:** Computes the BERTScore semantic similarity metrics (Precision, Recall, F1) between a candidate text and a reference text.

**Parameters:**

candidate (str): The candidate/generated text to evaluate.
reference (str): The reference/ground-truth text to compare against.
lang (str, optional): Language code for BERTScore model selection (e.g., "en"). Defaults to BERTSCORE_LANG from config.

**Returns:** `Dict[str, float]` — Dictionary containing three keys:

"BERTScore_Precision": Precision score as a float.
"BERTScore_Recall": Recall score as a float.
"BERTScore_F1": F1 score as a float.

**Implementation Details:**

Calls bert_score.score() with single-element lists containing candidate and reference.
Enables rescaling with baseline to normalize scores.
Extracts scalar floats from tensor outputs.

**Usage Example:**

candidate = "Adds two numbers."
reference = "Adds two numbers and returns the sum."
scores = compute_bertscore(candidate, reference)
print(scores)
# Output: {'BERTScore_Precision': 0.95, 'BERTScore_Recall': 0.92, 'BERTScore_F1': 0.935}

Important Implementation Details and Algorithms

Handling Large Inputs: The summarize_code_unit_ollama function truncates input code snippets to 4000 characters to avoid exceeding model or API limits.
Prompt Engineering: The prompt explicitly instructs the model to produce a concise one-sentence summary focusing on functionality and important behavior.
Output Cleaning: The function removes any <think>...</think> tags from the model output, which may be artifacts or internal reasoning markers.
Fallback Mechanism: If the model call fails, the function returns a safe fallback that includes the first line of the snippet, helping downstream tasks avoid blank summaries.
BERTScore Usage: Uses the bert_score library's score() method with single-item batches and rescales scores with baseline for reliable semantic similarity evaluation.

Interaction with Other System Components

ollama Module: Provides the interface to the Ollama chat model for summarization tasks.
bert_score Library: Computes semantic similarity metrics.
config Module: Supplies configurable parameters like the model to use for summarization (SUMMARY_MODEL) and the language for BERTScore (BERTSCORE_LANG).
This file likely integrates into a larger evaluation or documentation generation pipeline where code snippets are summarized and then evaluated for quality or similarity.

Visual Diagram

classDiagram
    class bertscore.py {
        +summarize_code_unit_ollama(code_snippet: str, model: str) str
        +generate_pseudo_reference_from_code_units(code_units: List[str], model: str) str
        +compute_bertscore(candidate: str, reference: str, lang: str) Dict[str, float]
    }

    bertscore.py ..> ollama : uses
    bertscore.py ..> bert_score : uses
    bertscore.py ..> config : reads SUMMARY_MODEL, BERTSCORE_LANG

Summary

The `bertscore.py` file is a focused utility module for summarizing code snippets via an LLM chat interface and assessing the semantic similarity of texts using BERTScore. It encapsulates key operations needed in automated code summarization workflows, providing robust handling of model interactions, prompt design, and scoring metrics.

This file acts as a bridge between raw code units and their textual semantic evaluation, enabling downstream applications such as code documentation generation, code review automation, or quality assurance of code explanations.