config.py

Overview

The `config.py` file is the centralized configuration module for a code analysis and summarization system. It defines constants that control embedding generation, code summarization, similarity thresholds, batch sizes, t-SNE visualization, and the mapping from file extensions to language names. Other modules import these values so that model names, thresholds, and processing parameters stay consistent across the system.

This file defines no functions or classes. It only sets parameters that affect the runtime behavior and performance of the components that handle code embeddings, summarization, and evaluation.


Configuration Constants and Their Usage

EMBED_MODEL

EMBED_MODEL = "unclemusclez/jina-embeddings-v2-base-code"

SUMMARY_MODEL

SUMMARY_MODEL = "hf.co/ertghiu256/qwen3-4b-code-reasoning-gguf:Q4_K_M"

SIM_THRESHOLD

SIM_THRESHOLD = 0.7

PARTIAL_THRESHOLD

PARTIAL_THRESHOLD = 0.5
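
The source does not say how the two thresholds are combined. One plausible reading, shown here only as a sketch, is a three-way bucketing: scores at or above `SIM_THRESHOLD` count as matches, scores at or above `PARTIAL_THRESHOLD` count as partial matches, and anything lower counts as no match. The function name `classify_similarity` and the three labels are hypothetical and do not come from the project.

```python
# Values copied from config.py so the sketch runs on its own.
SIM_THRESHOLD = 0.7
PARTIAL_THRESHOLD = 0.5


def classify_similarity(score: float) -> str:
    """Hypothetical helper: bucket a similarity score using both thresholds."""
    if score >= SIM_THRESHOLD:
        return "match"
    if score >= PARTIAL_THRESHOLD:
        return "partial"
    return "no_match"
```

Checking the higher threshold first keeps the buckets mutually exclusive, which only works as long as `PARTIAL_THRESHOLD` stays below `SIM_THRESHOLD`.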

BERTSCORE_LANG

BERTSCORE_LANG = "en"

EMBED_BATCH

EMBED_BATCH = 16

SUMMARIZE_BATCH

SUMMARIZE_BATCH = 1
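
As a rough illustration of what the two batch sizes imply, the sketch below splits a list of inputs into fixed-size chunks. The `batched` helper and the `snippets` list are assumptions made for this example; the project may batch its inputs differently.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Values copied from config.py so the sketch runs on its own.
EMBED_BATCH = 16
SUMMARIZE_BATCH = 1


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder inputs standing in for real code snippets.
snippets = [f"snippet_{i}" for i in range(40)]

embed_batches = list(batched(snippets, EMBED_BATCH))
summary_batches = list(batched(snippets, SUMMARIZE_BATCH))
```

With these values, embedding requests would carry up to 16 snippets each, while summarization would handle one snippet per request.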

TSNE_PERPLEXITY

TSNE_PERPLEXITY = 30
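
A fixed perplexity of 30 can be too large for small inputs: t-SNE implementations such as scikit-learn's `TSNE` require the perplexity to be smaller than the number of samples. The sketch below shows one way a caller might guard against that. The helper `effective_perplexity` is hypothetical, and whether the project clamps the value at all is not stated in the source.

```python
# Value copied from config.py so the sketch runs on its own.
TSNE_PERPLEXITY = 30


def effective_perplexity(n_samples: int, configured: float = TSNE_PERPLEXITY) -> float:
    """Hypothetical helper: keep the perplexity strictly below the sample count."""
    if n_samples < 2:
        raise ValueError("t-SNE needs at least two samples")
    return min(configured, n_samples - 1)
```

The result could then be passed as the `perplexity` argument when constructing the t-SNE object.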

EXT_LANG_MAP

EXT_LANG_MAP = {
    ".py": "python", ".js": "javascript", ".ts": "typescript", ".jsx": "javascript",
    ".tsx": "typescript", ".java": "java", ".rs": "rust", ".cs": "c_sharp",
    ".cpp": "cpp", ".c": "c", ".h": "c", ".html": "html", ".css": "css",
    ".go": "go", ".php": "php",
}

Implementation Details

All settings are plain module-level constants: two model identifier strings, two floating-point thresholds, a language code, three integers, and one dictionary that maps file extensions to language names.


Interaction With Other Parts of the System

Because the settings live in one module, other modules import the constants they need instead of hardcoding values, so changing a model or a threshold means editing only `config.py`.


Example Usage

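import math
import os

from config import SIM_THRESHOLD, EXT_LANG_MAP

def compute_cosine_similarity(a, b):
    # Plain-Python cosine similarity; a stand-in for whatever helper the project provides.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

def is_similar(embedding1, embedding2):
    return compute_cosine_similarity(embedding1, embedding2) >= SIM_THRESHOLD

def get_language_from_extension(filename):
    # Lowercase the extension so "MAIN.PY" still maps to "python".
    ext = os.path.splitext(filename)[1].lower()
    return EXT_LANG_MAP.get(ext, "unknown")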

Diagram: Flowchart of Configuration Relationships

flowchart TD
    A[config.py] --> B[Embedding Components]
    A --> C[Summarization Modules]
    A --> D[Similarity & Evaluation]
    A --> E[Visualization Modules]
    A --> F[File Parsing & Language Detection]

    B --> B1[Uses EMBED_MODEL]
    B --> B2[Uses EMBED_BATCH]

    C --> C1[Uses SUMMARY_MODEL]
    C --> C2[Uses SUMMARIZE_BATCH]

    D --> D1[Uses SIM_THRESHOLD]
    D --> D2[Uses PARTIAL_THRESHOLD]
    D --> D3[Uses BERTSCORE_LANG]

    E --> E1[Uses TSNE_PERPLEXITY]

    F --> F1[Uses EXT_LANG_MAP]

Summary

The `config.py` file is the single source of configuration for the code analysis system. It defines the constants for embedding generation, code summarization, similarity evaluation, batch processing, visualization, and file language mapping. Keeping them in one place makes parameters consistent across modules and keeps hardcoded values out of the implementation logic.