schools.py

Overview

The schools.py module manages and queries a dataset of educational institutions, primarily universities and colleges. It loads school data from CSV and JSON resources, normalizes school names, assigns ranking information to schools, and provides utility functions to select a school record and to check whether a school belongs to a predefined set of "good" schools.

This module is designed to assist applications that require normalization, ranking, and filtering of school names, such as academic data processing, university ranking aggregations, or educational data analytics.


Detailed Explanation

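Global Variables

TBL: DataFrame

The table of school records. loadRank writes rank values into it, and select reads from it.

GOOD_SCH: set

The set of "good" schools that is_good checks names against.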


Functions

loadRank(fnm: str) -> None

Loads school ranking information from a CSV file and updates the global TBL DataFrame by assigning a rank to matching school entries.
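The rank-assignment step can be sketched as follows. This is a minimal illustration, not the module's implementation: a plain list of dicts stands in for the TBL DataFrame so the sketch needs no third-party packages, and the function name load_rank_sketch, the column name name_en, and the two-column CSV layout (name, rank) are all assumptions.

```python
import csv
import io

# Hypothetical stand-in for the module's TBL DataFrame: one dict per school.
TBL = [
    {"name_en": "Example University", "rank": None},
    {"name_en": "Sample College", "rank": None},
]


def load_rank_sketch(lines):
    """Assign a rank to every TBL row whose name appears in the CSV data.

    `lines` is any iterable of CSV text lines in the assumed layout
    `name,rank`, for example an open file object.
    """
    ranks = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, rank = row[0].strip().lower(), row[1].strip()
        if rank.isdigit():
            ranks[name] = int(rank)

    # Update only the rows that have a matching ranking entry.
    for entry in TBL:
        key = entry["name_en"].strip().lower()
        if key in ranks:
            entry["rank"] = ranks[key]


# In-memory CSV text keeps the example self-contained.
load_rank_sketch(io.StringIO("Example University,1\nUnknown School,2\n"))
```

To read from a file path such as fnm, the same function could be called with open(fnm, newline="", encoding="utf-8") in place of the in-memory text. Rows without a match keep their previous rank value.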


split(txt: str) -> list[str]

Splits a text string into tokens with special handling for English words that should remain together.
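One way such a tokenizer might behave is sketched below. The merging rule is an assumption made for illustration: a token is glued onto the previous one when the previous token ends with an ASCII letter and the new token starts with one. The name split_sketch is hypothetical.

```python
import re


def split_sketch(txt):
    """Split on whitespace, keeping runs of English words as one token."""
    tokens = []
    for tok in txt.split():
        # Assumed rule: merge when both sides of the gap are ASCII letters.
        if tokens and re.search(r"[a-zA-Z]$", tokens[-1]) and re.match(r"[a-zA-Z]", tok):
            tokens[-1] += " " + tok
        else:
            tokens.append(tok)
    return tokens


parts = split_sketch("清华大学 Tsinghua University 北京")
```

With this rule, a multi-word English name stays a single token while non-English tokens on either side remain separate.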


select(nm: str | list) -> dict | None

Selects a school record from TBL that matches the given school name or alias. Returns the record as a dict, or None if no record matches.
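A lookup of this kind might look like the sketch below. It makes several assumptions that the text above does not confirm: TBL is replaced by a list of dicts, the columns name_en and alias are hypothetical, aliases are stored as a single "+"-separated string, and a list argument is reduced to its first element.

```python
# Hypothetical stand-in for TBL; names and aliases are stored lowercased.
TBL = [
    {"name_en": "example university", "alias": "eu+example uni", "rank": 1},
    {"name_en": "sample college", "alias": "", "rank": None},
]


def select_sketch(nm):
    """Return the first record whose name or alias equals nm, else None."""
    if not nm:
        return None
    if isinstance(nm, list):
        nm = str(nm[0])  # assumption: only the first candidate is used
    key = nm.strip().lower()
    for row in TBL:
        aliases = set(row["alias"].split("+")) if row["alias"] else set()
        if key == row["name_en"] or key in aliases:
            return dict(row)  # return a copy so callers cannot mutate TBL
    return None


by_alias = select_sketch("EU")
by_list = select_sketch(["Sample College", "ignored"])
missing = select_sketch("Nowhere Institute")
```

Returning a copy rather than the stored row is a design choice of this sketch; it keeps the shared table safe from accidental edits by callers.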


is_good(nm: str) -> bool

Checks if a given school name belongs to the set of "good" schools.
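The membership check can be illustrated with a short sketch. The normalization shown here (lowercasing, dropping parenthesized text, removing spaces and common punctuation) is an assumption, as are the function name is_good_sketch and the sample contents of GOOD_SCH.

```python
import re

# Hypothetical contents: names stored in the same normalized form
# that is_good_sketch produces.
GOOD_SCH = {"exampleuniversity", "samplecollege"}


def is_good_sketch(nm):
    """Return True if the normalized name is in GOOD_SCH."""
    key = re.sub(r"\([^()]*\)", "", nm.lower())  # drop "(...)" qualifiers
    key = re.sub(r"[\s,.&'();]+", "", key)       # drop spaces and punctuation
    return key in GOOD_SCH


good = is_good_sketch("Example University (Main Campus)")
other = is_good_sketch("Other School")
```

Normalizing both the stored names and the query the same way lets spelling variants such as extra spaces or a campus qualifier map to the same set entry.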


Implementation Details and Algorithms


Interaction with Other Parts of the System


Mermaid Class Diagram

classDiagram
    class schools.py {
        +DataFrame TBL
        +set GOOD_SCH
        +loadRank(fnm: str) None
        +split(txt: str) list
        +select(nm: str | list) dict | None
        +is_good(nm: str) bool
    }

Summary

The schools.py file is a utility module for loading, normalizing, ranking, and querying school data. It offers simple interfaces for selecting schools by name and checking if a school is in a pre-defined "good" list. It standardizes school names and aliases to facilitate robust matching and filtering, making it valuable for educational data processing pipelines.