plotting.py

Overview

The `plotting.py` file provides visualization utilities designed to analyze and display relationships between **code embeddings** and **documentation embeddings** in a software project. Its primary focus is on semantic coverage and similarity metrics between code and associated documentation units.

There are two main visualization functions:

- `plotly_radar_and_bar`: Generates a combined radar (polar) chart and bar chart. The radar chart summarizes the coverage, relevance, and novelty metrics; the bar chart breaks down code-unit coverage by similarity threshold.
- `plot_semantic_scatter`: Creates a 2D scatter plot of the semantic relationships between code and documentation units. It uses dimensionality reduction (t-SNE or UMAP) to project the high-dimensional embeddings into 2D, then highlights closely related pairs and the cluster of covered code units.

Both functions use Plotly for interactive visualization and can save the plots as HTML files.

Detailed Documentation

Imports and Constants

External Libraries:
- numpy for array operations.
- plotly.graph_objects and plotly.subplots for interactive plotting.
- scipy.spatial.ConvexHull to compute convex hulls around clusters.
- sklearn.manifold.TSNE for dimensionality reduction.
- umap (installed as the umap-learn package) for UMAP, an alternative dimensionality reduction method.
Internal Modules:
- config providing threshold and parameter constants (SIM_THRESHOLD, PARTIAL_THRESHOLD, TSNE_PERPLEXITY).
- metrics providing a cosine_similarity function to measure similarity between embeddings.
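
The source does not show `metrics.cosine_similarity` itself. A minimal sketch of what such a function usually computes, assuming it takes two 1-D vectors and returns a float; the zero-vector behavior below is an assumption, not documented behavior:

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two 1-D embedding vectors.

    Hypothetical stand-in for metrics.cosine_similarity; the real
    function's signature and edge-case handling may differ.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: a zero vector is treated as dissimilar to everything.
        return 0.0
    return float(np.dot(a, b) / denom)


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.7071
```

Values range from -1 (opposite directions) to 1 (same direction), which is why the coverage thresholds are compared directly against this score.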

Function: `plotly_radar_and_bar`

```python
def plotly_radar_and_bar(results_ces: Dict, code_embeddings: List[np.ndarray], doc_embeddings: List[np.ndarray],
                         out_path: Optional[str] = None)
```

Purpose

Visualizes summary metrics and coverage distribution of code units against documentation using:

- A radar chart of the CES metrics (DirectCoverage, Relevance, Novelty).
- A bar chart counting code units categorized as Covered, Partial, or Missing based on similarity thresholds.

Parameters

results_ces (Dict): A dictionary containing CES metrics with keys "DirectCoverage", "Relevance", and "Novelty", each mapped to float values in [0, 1].
code_embeddings (List[np.ndarray]): List of embeddings representing code units.
doc_embeddings (List[np.ndarray]): List of embeddings representing documentation units.
out_path (Optional[str]): If provided, the plot is saved as an HTML file at this path; otherwise, it is shown interactively.
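
Because `results_ces` must carry three specific keys with values in [0, 1], a caller may want to check it before plotting. The helper below, `validate_ces_results`, is hypothetical and not part of `plotting.py`:

```python
from typing import Dict

# Keys the radar chart expects, as listed in the parameter description.
REQUIRED_CES_KEYS = ("DirectCoverage", "Relevance", "Novelty")


def validate_ces_results(results_ces: Dict[str, float]) -> None:
    """Raise ValueError if results_ces lacks a key or has a value outside [0, 1].

    Hypothetical helper; plotting.py is not documented to perform this check.
    """
    missing = [k for k in REQUIRED_CES_KEYS if k not in results_ces]
    if missing:
        raise ValueError(f"missing CES keys: {missing}")
    for key in REQUIRED_CES_KEYS:
        value = float(results_ces[key])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key}={value} is outside [0, 1]")


# Passes silently for a well-formed dictionary.
validate_ces_results({"DirectCoverage": 0.75, "Relevance": 0.65, "Novelty": 0.40})
```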

Returns

None. It either shows the plot or saves it to a file.

Usage Example

```python
results = {"DirectCoverage": 0.75, "Relevance": 0.65, "Novelty": 0.40}
plotly_radar_and_bar(results, code_embeddings, doc_embeddings, out_path="coverage.html")
```

Implementation Details

Radar Chart:
- Plots the three CES metrics on a polar plot.
- The first category is repeated at the end so the radar outline closes.
- The radial axis range is fixed to [0, 1].
Coverage Calculation:
- For each code embedding, computes the maximum cosine similarity to any doc embedding.
- Categorizes each code unit as:
  - Covered: similarity ≥ SIM_THRESHOLD.
  - Partial: similarity ≥ PARTIAL_THRESHOLD but < SIM_THRESHOLD.
  - Missing: similarity < PARTIAL_THRESHOLD.
Bar Chart:
- Shows counts of Covered, Partial, and Missing code units with color coding (green, orange, red).
Plot Layout:
- Arranges radar and bar charts side-by-side.
- Shows the figure interactively, or saves it as HTML when out_path is provided.
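
The coverage categorization and the radar closing step can be sketched with NumPy alone. This is an illustration under assumptions, not the module's code: the threshold values are placeholders for the config constants, embeddings are assumed to be non-zero, and `coverage_counts` and `close_radar` are hypothetical names.

```python
import numpy as np

# Placeholder values; the real thresholds come from the config module.
SIM_THRESHOLD = 0.8
PARTIAL_THRESHOLD = 0.6


def coverage_counts(code_embeddings, doc_embeddings,
                    sim_threshold=SIM_THRESHOLD,
                    partial_threshold=PARTIAL_THRESHOLD):
    """Count code units as Covered / Partial / Missing by their best doc match."""
    code = np.asarray(code_embeddings, dtype=float)
    docs = np.asarray(doc_embeddings, dtype=float)
    # Row-normalize (assumes no zero vectors) so one matrix product
    # yields every pairwise cosine similarity.
    code = code / np.linalg.norm(code, axis=1, keepdims=True)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    best = (code @ docs.T).max(axis=1)  # max similarity per code unit
    covered = int(np.sum(best >= sim_threshold))
    partial = int(np.sum((best >= partial_threshold) & (best < sim_threshold)))
    missing = int(np.sum(best < partial_threshold))
    return {"Covered": covered, "Partial": partial, "Missing": missing}


def close_radar(categories, values):
    """Repeat the first category and value so the radar polygon closes."""
    cats, vals = list(categories), list(values)
    return cats + cats[:1], vals + vals[:1]


print(coverage_counts([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]], [[1.0, 0.0]]))
# {'Covered': 1, 'Partial': 1, 'Missing': 1}
```

The closed lists returned by `close_radar` can be passed as `theta` and `r` to `plotly.graph_objects.Scatterpolar`.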

Function: `plot_semantic_scatter`

```python
def plot_semantic_scatter(code_embeddings: List[np.ndarray],
                          doc_embeddings: List[np.ndarray],
                          code_units: List[str],
                          doc_units: List[str],
                          out_path: Optional[str] = None,
                          use_umap: bool = False)
```

Purpose

Creates an interactive 2D scatter plot visualizing semantic relationships between code and documentation units in a shared embedding space.

Parameters

code_embeddings (List[np.ndarray]): Embeddings for code units.
doc_embeddings (List[np.ndarray]): Embeddings for documentation units.
code_units (List[str]): String labels or identifiers for code units (used for hover text).
doc_units (List[str]): String labels or identifiers for documentation units (used for hover text).
out_path (Optional[str]): Optional path to save the plot as an HTML file.
use_umap (bool): If True, uses UMAP for dimensionality reduction; otherwise uses t-SNE.
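
The four list parameters form positional pairs: `code_units` labels `code_embeddings`, and `doc_units` labels `doc_embeddings`. A sketch of the combine-and-split step described under the implementation details, with a length check added; `stack_embeddings` and `split_coords` are hypothetical helpers, and the positional matching of labels is an assumption:

```python
from typing import List, Tuple

import numpy as np


def stack_embeddings(code_embeddings: List[np.ndarray],
                     doc_embeddings: List[np.ndarray],
                     code_units: List[str],
                     doc_units: List[str]) -> Tuple[np.ndarray, int]:
    """Stack code rows, then doc rows, into one (n_code + n_doc, dim) array.

    Hypothetical helper. The length checks assume hover labels are
    matched to embeddings by position.
    """
    if len(code_embeddings) != len(code_units):
        raise ValueError("code_units must have one label per code embedding")
    if len(doc_embeddings) != len(doc_units):
        raise ValueError("doc_units must have one label per doc embedding")
    rows = [np.asarray(e, dtype=float) for e in [*code_embeddings, *doc_embeddings]]
    return np.vstack(rows), len(code_embeddings)


def split_coords(coords_2d: np.ndarray, n_code: int) -> Tuple[np.ndarray, np.ndarray]:
    """Undo the stacking after reduction: the first n_code rows are code units."""
    return coords_2d[:n_code], coords_2d[n_code:]
```

Keeping code rows first means a single integer is enough to separate the two groups after t-SNE or UMAP returns the 2D coordinates.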

Returns

None. Displays or saves the visualization.

Usage Example

```python
plot_semantic_scatter(code_embeddings, doc_embeddings, code_names, doc_names, out_path="semantic_map.html")
```

Implementation Details

Dimensionality Reduction:
- Combines code and doc embeddings into one array.
- Applies either UMAP or t-SNE to project embeddings into 2D.
- t-SNE perplexity is adjusted according to the number of embeddings and the TSNE_PERPLEXITY config value.
Scatter Plot:
- Code units plotted as blue markers; documentation units as red markers.
- Hover text shows truncated unit content or name for context.
Similarity Lines:
- For each code unit, finds the doc unit with max cosine similarity.
- If similarity ≥ SIM_THRESHOLD, draws a green line connecting the two points.
- Line width and opacity scale with similarity strength.
Convex Hull:
- Computes convex hull around covered code points (those above SIM_THRESHOLD).
- Hull is drawn as a green dashed polygon with translucent fill to highlight coverage cluster.
Plot Layout:
- Sets axis titles and the figure size.
- Shows the plot interactively, or saves it as HTML when out_path is provided.
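
A NumPy sketch of the perplexity cap and the similarity-line selection. The capping rule and the linear width/opacity mapping are assumptions: the source only says that perplexity depends on data size and TSNE_PERPLEXITY, and that width and opacity scale with similarity. The constant values are placeholders. Some cap is needed because t-SNE requires a perplexity smaller than the number of points, and recent scikit-learn versions reject a `TSNE` perplexity that is not.

```python
import numpy as np

SIM_THRESHOLD = 0.8     # placeholder; the real value comes from config
TSNE_PERPLEXITY = 30.0  # placeholder; the real value comes from config


def effective_perplexity(n_samples: int, configured: float = TSNE_PERPLEXITY) -> float:
    """Keep perplexity below the sample count (assumed rule, not the module's)."""
    return float(max(1.0, min(configured, n_samples - 1)))


def similarity_lines(code_embeddings, doc_embeddings, sim_threshold=SIM_THRESHOLD):
    """Return (code_idx, doc_idx, sim, width, opacity) for each line to draw."""
    code = np.asarray(code_embeddings, dtype=float)
    docs = np.asarray(doc_embeddings, dtype=float)
    # Assumes non-zero embeddings; normalized rows give cosine via dot product.
    code = code / np.linalg.norm(code, axis=1, keepdims=True)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = code @ docs.T
    lines = []
    for i, row in enumerate(sims):
        j = int(np.argmax(row))  # best-matching doc unit for code unit i
        s = float(row[j])
        if s >= sim_threshold:
            # Assumed scaling: map [threshold, 1] onto width 1-4, opacity 0.3-1.
            t = (s - sim_threshold) / max(1.0 - sim_threshold, 1e-9)
            t = min(max(t, 0.0), 1.0)
            lines.append((i, j, s, 1.0 + 3.0 * t, 0.3 + 0.7 * t))
    return lines
```

The code indices in the returned tuples are exactly the covered points whose reduced coordinates would feed the convex hull.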

Key Algorithms and Concepts

Cosine Similarity for Coverage:
- Used to measure semantic similarity between vector representations of code and documentation units.
Dimensionality Reduction:
- t-SNE and UMAP reduce high-dimensional embeddings to 2D for visualization.
- t-SNE perplexity parameter is adaptive to dataset size.
- UMAP is usually faster on larger datasets and tends to preserve more global structure than t-SNE.
Convex Hull:
- Encapsulates the cluster of code units considered covered by documentation.
- Visualizes the semantic "coverage area" on the scatter plot.
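
To make the convex-hull step concrete, the sketch below uses Andrew's monotone chain algorithm in plain Python. This is a swapped-in technique for illustration; `plotting.py` uses `scipy.spatial.ConvexHull`, which is built on Qhull. `closed_polygon` repeats the first vertex so a filled trace draws a closed outline. Both function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); positive means a counterclockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts  # too few points for a polygon
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Each chain's last point is the other chain's first, so drop it.
    return lower[:-1] + upper[:-1]


def closed_polygon(hull: List[Point]) -> Tuple[List[float], List[float]]:
    """x and y lists with the first vertex repeated to close the outline."""
    xs = [p[0] for p in hull] + [hull[0][0]]
    ys = [p[1] for p in hull] + [hull[0][1]]
    return xs, ys


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```

With fewer than three distinct, non-collinear covered points there is no polygon to draw. The sketch then returns a degenerate vertex list, whereas `ConvexHull` typically raises an error, so calling code needs to guard this case.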

Interactions with Other Modules

config module:
- Provides constants affecting thresholds and parameters:
  - SIM_THRESHOLD: similarity cutoff to consider a code unit covered.
  - PARTIAL_THRESHOLD: lower similarity cutoff for partial coverage.
  - TSNE_PERPLEXITY: controls perplexity parameter for t-SNE.
metrics module:
- Provides cosine_similarity function to compute similarity scores between embeddings.
umap package:
- Used optionally for dimensionality reduction instead of t-SNE.
Plotly library:
- Used extensively for creating interactive visualizations.

Through these dependencies, the plotting functions fit into a larger analysis pipeline that generates embeddings, computes metrics, and visualizes the results.

Mermaid Diagram: Function Flowchart for `plotting.py`

```mermaid
flowchart TD
    A[plotly_radar_and_bar] --> B[Calculate coverage counts]
    B --> C[Create radar chart for CES metrics]
    B --> D[Create bar chart for coverage counts]
    C & D --> E[Display or save plot]

    F[plot_semantic_scatter] --> G[Combine embeddings]
    G --> H{Use UMAP?}
    H -- Yes --> I[Apply UMAP]
    H -- No --> J[Apply t-SNE]
    I & J --> K[Split reduced coords]
    K --> L[Plot code units scatter]
    K --> M[Plot doc units scatter]
    L & M --> N[Draw similarity lines]
    N --> O[Compute convex hull on covered code units]
    O --> P[Add hull polygon]
    P --> Q[Display or save plot]
```

Summary

`plotting.py` is a specialized visualization utility for semantic coverage and relationships between code and documentation embeddings. By combining cosine similarity with dimensionality reduction, its interactive plots show which code units the documentation covers fully, which only partially, and which not at all, helping developers and analysts improve documentation coverage and relevance.

**End of documentation for `plotting.py`.**
