schools.csv


Overview

schools.csv is a large, structured dataset file containing detailed information about educational institutions, primarily universities and colleges, from various regions including China and many international locations. The file is formatted as a CSV (Comma-Separated Values) table, where each row represents a unique school entry with multiple attributes.

The dataset provides school entity data for systems that need standardized metadata about educational institutions, such as resume parsers, education verification systems, and academic analytics platforms.


Data Structure and Fields

Each row in the file corresponds to a single educational institution and contains the following columns (fields):

| Field Name | Description |
| --- | --- |
| id | Unique identifier for the school (integer). |
| type | Type code of the school entity (integer). Specific type meanings depend on external system definitions. |
| parent_id | Identifier of a parent institution, if applicable (integer). Used for hierarchical relationships. |
| name_cn | School name in Chinese (string). |
| name_en | School name in English (string). |
| alias | Alternative names or nicknames for the school, separated by :: if there are several (string). |
| is_abroad | Boolean flag (0 or 1) indicating whether the school is located abroad (outside of China). |
| is_world_known | Boolean flag (0 or 1) indicating whether the school is recognized worldwide. |
| school_type | Category of school, e.g., "综合类" (comprehensive), "高职类" (vocational), "医科类" (medical), "艺术类" (arts). |
| is_double_first | Boolean flag (0 or 1) indicating whether the school is part of the "Double First Class" initiative (prestigious Chinese universities). |
| education_type | Level or type of education offered, e.g., "本科" (undergraduate), "专科" (junior college / associate-level), "独立学院" (independent college). |
| province | Province or state where the school is located (string). |
| city | City where the school is located (string). |
| is_985 | Boolean flag (0 or 1) indicating whether the school belongs to China's "Project 985" (a group of elite universities). |
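
The field descriptions above suggest a loading step that keeps identifiers as integers, turns the 0/1 flags into booleans, and splits alias on ::. The following is a minimal sketch, assuming a comma-separated file whose header row matches the field names. The helper name load_schools, the alias_list column, and the two sample rows (including their id and flag values) are made up for illustration and are not taken from the real file.

```python
import io
import pandas as pd

FLAG_COLUMNS = ["is_abroad", "is_world_known", "is_double_first", "is_985"]

# Illustrative rows only; ids and flag values are invented for this sketch.
SAMPLE = """id,type,parent_id,name_cn,name_en,alias,is_abroad,is_world_known,school_type,is_double_first,education_type,province,city,is_985
1,1,,复旦大学,Fudan University,复旦::FDU,0,1,综合类,1,本科,上海市,上海市,1
2,1,,示例职业学院,Example Vocational College,,0,0,高职类,0,专科,示例省,示例市,0
"""

def load_schools(source):
    """Read the schools table and normalize the flag and alias columns."""
    df = pd.read_csv(
        source,
        # Nullable integers so that an empty parent_id stays missing
        # instead of forcing the column to float.
        dtype={"id": "Int64", "type": "Int64", "parent_id": "Int64", "alias": str},
    )
    # Convert the 0/1 flags to real booleans (missing values treated as 0).
    for col in FLAG_COLUMNS:
        df[col] = df[col].fillna(0).astype(int).astype(bool)
    # Split the ::-separated alias string into a list; empty means no aliases.
    df["alias_list"] = df["alias"].fillna("").apply(
        lambda s: [a for a in s.split("::") if a]
    )
    return df

df = load_schools(io.StringIO(SAMPLE))
print(df[["name_en", "alias_list", "is_985"]])
```

Passing a file path such as 'schools.csv' instead of the StringIO object would read the real dataset, provided its delimiter and header match these assumptions.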


Usage

This CSV file serves as a static reference dataset that can be imported, queried, or integrated into applications.

Example usage in Python with pandas:

import pandas as pd

# Load the schools data. The file is described as comma-separated, which is
# the pandas default; pass sep='\t' if your copy turns out to be tab-delimited.
schools_df = pd.read_csv('schools.csv')

# Query example: find all world-known schools in Shanghai
shanghai_world_known = schools_df[
    (schools_df['city'] == '上海市') & (schools_df['is_world_known'] == 1)
]

print(shanghai_world_known[['name_cn', 'name_en']])
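
The parent_id field is described as linking a school to a parent institution. One way to surface that relationship is to merge the table with itself. This is only a sketch: it assumes parent_id holds the id of another row in the same file and is empty when there is no parent. The sample rows and the parent_name_en column name are invented for illustration.

```python
import pandas as pd

# Invented rows: school 11 is a child of school 10; the others have no parent.
schools = pd.DataFrame({
    "id": pd.array([10, 11, 12], dtype="Int64"),
    "parent_id": pd.array([None, 10, None], dtype="Int64"),
    "name_en": [
        "Example University",
        "Example University City College",
        "Sample Institute",
    ],
})

# Build a lookup frame keyed by the parent's id, then left-join it so that
# schools without a parent are kept with a missing parent name.
parents = schools[["id", "name_en"]].rename(
    columns={"id": "parent_id", "name_en": "parent_name_en"}
)
with_parent = schools.merge(parents, on="parent_id", how="left")

print(with_parent[["name_en", "parent_name_en"]])
```

A left join is used so that top-level institutions remain in the result; an inner join would silently drop them.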

Important Notes on Implementation/Content


Interaction with Other System Components


Visual Diagram

The flowchart below shows how this dataset is typically used in a system, focusing on its role in data lookup and enrichment.

flowchart TD
    A[Application / Resume Parser] --> B[Load schools.csv Dataset]
    B --> C{Query School Data}
    C -->|Normalize School Name| D[Match by name_cn / name_en / alias]
    C -->|Fetch Metadata| E[Get attributes: province, city, is_985, etc.]
    D --> F[Standardized School Entity]
    E --> F
    F --> G[Use in Resume Parsing / Profile Enrichment]

    style B fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#bbf,stroke:#333,stroke-width:1px
    style E fill:#bbf,stroke:#333,stroke-width:1px
    style F fill:#bfb,stroke:#333,stroke-width:2px
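
The lookup path in the flowchart (match by name_cn, name_en, or alias, then fetch metadata) can be sketched as a dictionary index over all known names. The normalization used here, trimming whitespace and lowercasing, is an assumption, since the source does not say how names are normalized. The function names normalize, build_name_index, and lookup_school are hypothetical, and the sample rows are invented.

```python
import pandas as pd

def normalize(name):
    """Assumed normalization: trim whitespace and lowercase."""
    return name.strip().lower()

def build_name_index(df):
    """Map every known name or alias to the school's id."""
    index = {}
    for row in df.itertuples(index=False):
        names = [row.name_cn, row.name_en]
        if isinstance(row.alias, str):
            names.extend(row.alias.split("::"))
        for name in names:
            if isinstance(name, str) and name.strip():
                # On a name collision, the first school seen keeps the name.
                index.setdefault(normalize(name), row.id)
    return index

def lookup_school(df, index, raw_name):
    """Return the matching row as a dict, or None if the name is unknown."""
    school_id = index.get(normalize(raw_name))
    if school_id is None:
        return None
    return df.loc[df["id"] == school_id].iloc[0].to_dict()

# Invented rows for illustration only.
schools = pd.DataFrame({
    "id": [1, 2],
    "name_cn": ["复旦大学", "示例学院"],
    "name_en": ["Fudan University", "Example College"],
    "alias": ["复旦::FDU", None],
    "province": ["上海市", "示例省"],
    "city": ["上海市", "示例市"],
    "is_985": [1, 0],
})

index = build_name_index(schools)
match = lookup_school(schools, index, "  fdu ")
print(match["name_en"], match["city"], match["is_985"])
```

Exact matching after normalization is the simplest option; a production matcher would likely need fuzzy matching or extra rules for abbreviations, which this sketch does not attempt.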

Summary

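schools.csv is a static reference table of educational institutions, keyed by id, with Chinese and English names, aliases, location, and classification flags. Systems that parse or analyze educational backgrounds can use it to recognize schools and normalize them consistently.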