download_deps.py


Overview

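download_deps.py is a utility script that automates the download of external dependencies required by an application. It downloads files from direct URLs (for example, the libssl1.1 .deb package), NLTK datasets (wordnet, punkt, and punkt_tab), and snapshots of Hugging Face repositories (for example, InfiniFlow/deepdoc).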

This script ensures that all necessary resources are available locally before the application runs, reducing runtime errors due to missing dependencies and improving reproducibility.


Detailed Descriptions

Imports and Dependencies


Functions

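get_urls(use_china_mirrors: bool = False) -> list[Union[str, list[str]]]

Returns a list of download entries. Each entry is either a URL string, in which case the filename is derived from the last path segment, or a [url, filename] pair. When use_china_mirrors is True, the China mirror URLs are returned instead of the default ones.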

urls = get_urls(use_china_mirrors=True)
for entry in urls:
    if isinstance(entry, list):
        url, filename = entry
    else:
        url = entry
        filename = url.split("/")[-1]
    print(f"Download {filename} from {url}")
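
For orientation, a function with this return shape might be structured as in the sketch below. It is not the script's actual URL list: get_urls_sketch, DEFAULT_HOST, MIRROR_HOST, and every example URL are hypothetical placeholders chosen only to show the two entry forms.

```python
from typing import Union

# Hypothetical hosts for illustration only; the real script defines its own URLs.
DEFAULT_HOST = "https://example.com"
MIRROR_HOST = "https://mirror.example.cn"


def get_urls_sketch(use_china_mirrors: bool = False) -> list[Union[str, list[str]]]:
    """Return entries that are either a URL or a [url, filename] pair."""
    host = MIRROR_HOST if use_china_mirrors else DEFAULT_HOST
    return [
        # Plain URL: the caller derives the filename from the last path segment.
        f"{host}/pool/libexample_1.0_amd64.deb",
        # [url, filename] pair: the caller saves the file under the given name.
        [f"{host}/releases/v1.0/archive.tar.gz", "example-archive-1.0.tar.gz"],
    ]
```

Keeping the mirror choice in one place means the caller's loop does not need to know which host is in use.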

download_model(repo_id: str) -> None

Downloads a snapshot of a Hugging Face repository to a local directory.

download_model("InfiniFlow/deepdoc")
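
A function of this kind could be sketched as follows, assuming the huggingface_hub package and its snapshot_download(repo_id=..., local_dir=...) call. The name download_model_sketch, the base_dir default "hf_models", and the injectable fetch parameter are assumptions added for illustration and testing; the real script's directory layout may differ.

```python
import os


def download_model_sketch(repo_id: str, base_dir: str = "hf_models", fetch=None) -> None:
    """Download a snapshot of repo_id into <base_dir>/<repo_id>."""
    if fetch is None:
        # Imported lazily so this module loads even without huggingface_hub installed.
        from huggingface_hub import snapshot_download
        fetch = snapshot_download
    # Hypothetical layout: one sub-directory per repository ID.
    local_dir = os.path.abspath(os.path.join(base_dir, repo_id))
    os.makedirs(local_dir, exist_ok=True)
    fetch(repo_id=repo_id, local_dir=local_dir)
```

Passing a stub as fetch allows the directory handling to be exercised without network access.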

Main Script Execution

When run as a script, download_deps.py performs the following steps:

  1. Argument Parsing

    • Accepts a command-line flag --china-mirrors to toggle between default and China mirror URLs.

  2. Download URLs

    • Calls get_urls() with the mirror option.

    • Iterates through each URL (or URL-filename pair), printing status messages.

    • Downloads each file only if it does not already exist locally, saving it with the specified or derived filename.

  3. Download NLTK Data

    • Downloads three NLTK datasets: wordnet, punkt, and punkt_tab.

    • Saves them under a local nltk_data directory.

  4. Download Hugging Face Repositories

    • Iterates over a fixed list of repo IDs and downloads each one locally.
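
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the script's own code: parse_args, split_entry, download_missing, and the dest_dir parameter are hypothetical names, and urllib.request.urlretrieve is assumed as the urllib call used for the download.

```python
import argparse
import os
import urllib.request


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Download dependencies")
    # store_true makes the flag default to False when it is omitted.
    parser.add_argument("--china-mirrors", action="store_true",
                        help="use China mirror URLs")
    return parser.parse_args(argv)


def split_entry(entry):
    """Normalize an entry to a (url, filename) tuple."""
    if isinstance(entry, list):
        url, filename = entry
    else:
        url, filename = entry, entry.split("/")[-1]
    return url, filename


def download_missing(entries, dest_dir="."):
    """Fetch each entry unless its target file exists; return the fetched names."""
    fetched = []
    for entry in entries:
        url, filename = split_entry(entry)
        target = os.path.join(dest_dir, filename)
        if os.path.exists(target):
            continue  # already present locally, so skip the download
        print(f"Downloading {filename} from {url}...")
        urllib.request.urlretrieve(url, target)
        fetched.append(filename)
    return fetched
```

Because the existence check runs before every download, rerunning the sketch only fetches files that are still missing.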

python3 download_deps.py --china-mirrors
Downloading libssl1.1_1.1.1f-1ubuntu2_amd64.deb from http://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb...
Downloading nltk wordnet...
Downloading huggingface repo InfiniFlow/text_concat_xgb_v1.0...
...
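
The NLTK step that produces the "Downloading nltk wordnet..." line could look like the sketch below. It assumes the nltk package and its nltk.download(name, download_dir=...) call; the function name download_nltk_data and its parameters are hypothetical.

```python
import os


def download_nltk_data(names=("wordnet", "punkt", "punkt_tab"), download_dir="nltk_data"):
    """Download each named NLTK resource into download_dir and return its path."""
    import nltk  # lazy import: only needed when the function actually runs

    target = os.path.abspath(download_dir)
    os.makedirs(target, exist_ok=True)
    for name in names:
        print(f"Downloading nltk {name}...")
        nltk.download(name, download_dir=target)
    return target
```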

Important Implementation Details and Algorithms


Interaction with Other System Components


Visual Diagram

flowchart TD
    A[Start Script] --> B{Parse Args}
    B -->|--china-mirrors| C["get_urls(use_china_mirrors=True)"]
    B -->|default| D["get_urls(use_china_mirrors=False)"]
    C & D --> E[Iterate URLs]
    E --> F{Is URL a List?}
    F -->|Yes| G[Extract download_url and filename]
    F -->|No| H[Use URL and derive filename]
    G & H --> I{File exists?}
    I -->|No| J[Download file with urllib]
    I -->|Yes| K[Skip download]
    J & K --> L[Download NLTK Data]
    L --> M[Download Hugging Face Repos]
    M --> N[End]

    subgraph Download Hugging Face Repos
        M --> M1[For each repo_id]
        M1 --> M2[Create local_dir]
        M2 --> M3["snapshot_download(repo_id, local_dir)"]
    end

    subgraph Download NLTK Data
        L --> L1[Download wordnet]
        L --> L2[Download punkt]
        L --> L3[Download punkt_tab]
    end

Summary

download_deps.py automates the download of binaries, NLTK datasets, and Hugging Face model repositories. It supports China mirrors through the --china-mirrors flag, skips files that already exist locally, and stores resources in predictable local directories, which simplifies setup for applications that rely on them.