download_blob.rs
Overview
This file provides functionality to download a binary large object (blob) from one or more specified URLs and save it to a local file system path. It handles retries, timeout management, temporary file handling, and error reporting to ensure robust and reliable downloading of blobs.
The main exported function download_blob orchestrates the downloading process, and it relies on the private helper function download_file to perform the actual HTTP download from a single URL.
Functions
download_blob
pub fn download_blob(
share_full_path: &PathBuf,
tmp_dir_path: &Path,
urls: &[url::Url],
max_tries: u8,
retry_timeout: Option<std::time::Duration>,
deadline: Option<std::time::Instant>,
) -> anyhow::Result<()>
Description
Downloads a blob from a list of URLs and saves it to the specified share_full_path. The download is attempted multiple times (max_tries) across the provided URLs. The function supports specifying an optional retry timeout between attempts and an absolute deadline after which attempts are aborted.
Parameters
share_full_path: &PathBuf
The final destination path where the downloaded file should be saved. If the file already exists, the function returns immediately without re-downloading.
tmp_dir_path: &Path
A directory used for creating a temporary file during the download, which avoids partial or corrupted writes to the final destination.
urls: &[url::Url]
A slice of URLs to attempt downloading the blob from. On each try the function walks the URLs in order.
max_tries: u8
Maximum number of tries. Each try iterates through all URLs, so the number of individual requests can be as high as max_tries times the number of URLs.
retry_timeout: Option<std::time::Duration>
Optional duration to wait between retry attempts if the download fails.
deadline: Option<std::time::Instant>
Optional absolute time after which the download attempts will cease.
Return Value
Returns Ok(()) if the blob is successfully downloaded and saved.
Returns an error (anyhow::Result::Err) if the blob cannot be downloaded after all attempts or if an error occurs during file operations.
Usage Example
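use std::path::{Path, PathBuf};
use std::time::Duration;

// Primary URL first, then the backup; up to 3 tries, 5 seconds between retries, no deadline.
let urls = vec![
    url::Url::parse("https://example.com/blob1").unwrap(),
    url::Url::parse("https://backup.example.com/blob1").unwrap(),
];
download_blob(&PathBuf::from("/data/blob1"), Path::new("/tmp"), &urls, 3, Some(Duration::from_secs(5)), None)?;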
Implementation Details
If the target file already exists, the function exits early.
A temporary file is created inside tmp_dir_path using get_temp_file_path (imported from crate::helper).
The function ensures the parent directory of the temporary file exists.
The download is attempted up to max_tries times, iterating through all URLs on each try.
If a deadline is set and reached, the function bails with an error.
After each failed download attempt, the temporary file is truncated and synced to clear any partial data.
On successful download, the temporary file is renamed atomically to the target path.
Progress and errors are logged through the tracing crate.
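The early exit, temporary file, and final rename described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the file's actual code: write_atomically and the blob-<pid>.part naming are hypothetical stand-ins (the real code obtains its path from get_temp_file_path, whose behavior is not shown here), and the payload is passed in as bytes instead of being downloaded.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `data` to `target` by way of a temporary file in `tmp_dir`.
/// Returns Ok(false) if `target` already existed and nothing was written.
/// (Hypothetical helper, for illustration only.)
fn write_atomically(target: &Path, tmp_dir: &Path, data: &[u8]) -> io::Result<bool> {
    // Early exit: an existing target is treated as already downloaded.
    if target.exists() {
        return Ok(false);
    }

    // Make sure the directory that will hold the temporary file exists.
    fs::create_dir_all(tmp_dir)?;
    // Assumed naming scheme; the real path comes from get_temp_file_path.
    let tmp_path = tmp_dir.join(format!("blob-{}.part", std::process::id()));

    let mut file = File::create(&tmp_path)?;
    file.write_all(data)?;
    // Flush contents and metadata to disk before publishing the file.
    file.sync_all()?;
    drop(file);

    // Publish the finished file under its final name.
    fs::rename(&tmp_path, target)?;
    Ok(true)
}
```

Note that std::fs::rename does not work across mount points, so this pattern assumes the temporary directory and the destination live on the same filesystem.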
download_file
fn download_file(
url: &url::Url,
file: &mut std::fs::File,
deadline: Option<std::time::Instant>,
) -> anyhow::Result<()>
Description
Performs a blocking HTTP GET request to download the contents of the specified URL and writes the data directly into the provided file handle.
Parameters
url: &url::Url
The URL to download from.
file: &mut std::fs::File
Mutable reference to a file handle where the downloaded content will be written.
deadline: Option<std::time::Instant>
Optional deadline to limit the download duration. The HTTP client timeout is adjusted accordingly.
Return Value
Returns Ok(()) if the file is successfully downloaded.
Returns an error if the HTTP request fails, the server responds with an error status, or the file cannot be written.
Usage Example
// download_file is private, so this call is only possible from inside the same module.
let mut file = std::fs::File::create("/tmp/downloaded_blob")?;
download_file(&url::Url::parse("https://example.com/blob")?, &mut file, None)?;
Implementation Details
Builds a blocking HTTP client with a configurable timeout based on the remaining time until the deadline.
Sets a connection timeout of 3 seconds (defined as CONNECT_TIMEOUT).
Sends a GET request to the specified URL.
If the server returns a 5xx status, it bails with a server error.
If the status is otherwise not a success (anything outside 2xx), it bails with an error that includes the status code.
Copies the response body directly to the file.
Calls sync_all on the file to ensure data is flushed to disk.
Uses tracing logs to indicate download progress.
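Two of the steps above are pure logic and can be illustrated without an HTTP client: turning the optional deadline into a remaining timeout, and deciding how to treat a response status. The sketch below is an assumption about how such logic might look, not the file's actual code; remaining_timeout, classify_status, and StatusCheck are hypothetical names, and the status is modeled as a plain u16.

```rust
use std::time::{Duration, Instant};

/// Time left until `deadline`, measured from `now`.
/// None means "no deadline"; Some(Duration::ZERO) means it has already passed.
fn remaining_timeout(deadline: Option<Instant>, now: Instant) -> Option<Duration> {
    // saturating_duration_since yields zero instead of panicking or
    // underflowing when the deadline lies in the past.
    deadline.map(|d| d.saturating_duration_since(now))
}

/// Outcome of inspecting an HTTP status code (hypothetical type).
#[derive(Debug, PartialEq)]
enum StatusCheck {
    /// 2xx: the body can be copied to the file.
    Ok,
    /// 5xx: reported as a server error.
    ServerError(u16),
    /// Anything else: reported with the status code attached.
    Unexpected(u16),
}

/// Mirrors the order described in the text: 5xx is checked first,
/// then any remaining non-2xx status is rejected.
fn classify_status(code: u16) -> StatusCheck {
    match code {
        500..=599 => StatusCheck::ServerError(code),
        200..=299 => StatusCheck::Ok,
        _ => StatusCheck::Unexpected(code),
    }
}
```

A caller could pass the resulting duration to the client's request timeout and skip the request entirely when it is zero; whether the real code does so is not stated in this document.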
Important Constants
CONNECT_TIMEOUT
A constant connection timeout of 3 seconds, used by the HTTP client so that a request does not hang while the TCP connection is being established.
Interactions with Other System Components
Temporary File Path Helper
Temporary File Path Helper
Uses get_temp_file_path from the crate::helper module to generate temporary file paths inside the specified temporary directory. Writing to this path first and renaming afterwards keeps partial data out of the final destination.
HTTP Client (reqwest crate)
Uses reqwest::blocking::Client to perform synchronous HTTP requests with support for timeouts.
Tracing Logs
Uses the tracing crate for detailed logging of download attempts, successes, and failures.
Algorithm and Workflow Summary
Check if the target file exists: If yes, skip downloading.
Create temporary file: Generate a temporary path and create the file, ensuring parent directories exist.
Download attempts:
Iterate up to max_tries times.
For each try, iterate over all URLs.
Check if the deadline is exceeded, bail if yes.
Attempt to download from the current URL.
On failure, log error, truncate the temp file, and retry.
On success, break loops.
If all attempts fail, return error.
Rename the temporary file to the final destination atomically.
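The retry portion of this workflow can be sketched as a pair of nested loops. This is a minimal sketch under stated assumptions, not the file's actual code: try_all and reset_file are hypothetical names, URLs are plain strings, errors are plain String values instead of anyhow errors, the download step is passed in as a closure, and the pause is assumed to happen once per completed round (the document does not say exactly where retry_timeout is applied).

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom};
use std::thread;
use std::time::{Duration, Instant};

/// Clears partial data left in `file` by a failed attempt.
fn reset_file(file: &mut File) -> io::Result<()> {
    file.set_len(0)?;
    file.seek(SeekFrom::Start(0))?;
    file.sync_all()
}

/// Tries `fetch` against every URL, for up to `max_tries` rounds.
/// `fetch` stands in for the real download step.
fn try_all<F>(
    urls: &[&str],
    max_tries: u8,
    retry_timeout: Option<Duration>,
    deadline: Option<Instant>,
    file: &mut File,
    mut fetch: F,
) -> Result<(), String>
where
    F: FnMut(&str, &mut File) -> Result<(), String>,
{
    for attempt in 1..=max_tries {
        for &url in urls {
            // Bail out as soon as the deadline has been reached.
            if let Some(d) = deadline {
                if Instant::now() >= d {
                    return Err("deadline reached".to_string());
                }
            }
            match fetch(url, &mut *file) {
                Ok(()) => return Ok(()),
                Err(e) => {
                    eprintln!("attempt {attempt}: {url} failed: {e}");
                    // Drop whatever the failed attempt wrote.
                    reset_file(file).map_err(|e| e.to_string())?;
                }
            }
        }
        // Assumed placement of the pause: between rounds, not after the last one.
        if attempt < max_tries {
            if let Some(pause) = retry_timeout {
                thread::sleep(pause);
            }
        }
    }
    Err(format!("all {max_tries} tries failed"))
}
```

On success the caller would then rename the temporary file to its final destination, as in the last step of the summary.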
File Structure and Function Relationship Diagram
flowchart TD
A[download_blob] --> B[get_temp_file_path]
A --> C[download_file]
C --> D[reqwest::blocking::Client]
A --> E[std::fs::File]
C --> E
A --> F[std::fs::rename]
download_blob is the main function coordinating the process.
It calls get_temp_file_path to get a temporary file location.
It creates and writes to a file (std::fs::File).
Calls download_file to fetch data from URLs.
download_file uses reqwest::blocking::Client for HTTP requests and writes data to the file.
On success, download_blob renames the temporary file to the final path.
This file is essential for reliable downloading of blobs with retry and timeout mechanisms, ensuring atomic writes and safe file handling. It encapsulates network interaction and file system operations in a fault-tolerant manner.