web_utils.py
Overview
The web_utils.py file is a utility module providing various helper functions primarily related to web operations, including:
Converting HTML content or URLs into PDF documents using a headless Chrome browser.
Validating URLs and IP addresses to enhance security by filtering out private IPs.
Safely parsing JSON data from strings or dictionaries.
Safely extracting float values from dictionaries with default fallbacks.
Sending invitation emails via an SMTP mail server using Flask-Mail and templating.
This file is designed to support web applications by offering reusable utilities to handle common web-related tasks, improving code modularity and maintainability.
Detailed Explanation of Functions
1. html2pdf(source: str, timeout: int = 2, install_driver: bool = True, print_options: dict = {}) -> bytes
Purpose:
Converts an HTML source (either a URL or a local file path) into a PDF document.
Parameters:
source (str): The URL or local file path of the HTML content to convert.
timeout (int, optional): Maximum time in seconds to wait for the page to load before generating the PDF. Defaults to 2 seconds.
install_driver (bool, optional): Whether to install ChromeDriver automatically using webdriver_manager. Defaults to True.
print_options (dict, optional): Additional PDF print options passed to the Chrome DevTools Protocol (e.g., landscape, printBackground). Defaults to an empty dictionary.
Returns: bytes - The binary PDF data generated from the HTML content.
Usage Example:
pdf_bytes = html2pdf("https://example.com", timeout=5)
with open("output.pdf", "wb") as f:
    f.write(pdf_bytes)
Implementation Details:
Uses Selenium WebDriver with headless Chrome to load the HTML page.
Waits up to timeout seconds for the page's <html> element to become stale, which is used as the signal that loading has finished.
Sends the Chrome DevTools command Page.printToPDF to generate the PDF.
Returns the PDF as binary data decoded from a base64 string.
2. __send_devtools(driver, cmd, params={}) -> dict
Purpose:
Internal helper function that sends commands directly to the Chrome DevTools Protocol via the Selenium WebDriver.
Parameters:
driver: Selenium WebDriver instance.
cmd (str): The DevTools command to execute.
params (dict, optional): Parameters for the command.
Returns: dict - The result of the DevTools command.
Notes:
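This is a private function by convention, as indicated by the double-underscore prefix (at module level the prefix does not trigger name mangling; it only signals internal use).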
Raises an exception if the response is not successful.
3. __get_pdf_from_html(path: str, timeout: int, install_driver: bool, print_options: dict) -> bytes | None
Purpose:
Internal function that implements the detailed logic for rendering an HTML page to PDF.
Parameters:
path (str): URL or file path of the HTML to render.
timeout (int): Seconds to wait for the page to load.
install_driver (bool): Whether to auto-install ChromeDriver.
print_options (dict): Chrome print options.
Returns: bytes | None - PDF binary data, or None if generation failed.
Implementation Details:
Configures headless Chrome with sandbox-related flags and with image loading disabled.
Starts ChromeDriver, either installing it through webdriver_manager (when install_driver is True) or assuming it is already installed.
Waits for the page's <html> element to become stale to ensure the page has loaded.
Uses the Chrome DevTools command Page.printToPDF to generate the PDF with the specified print options.
Cleans up and quits the WebDriver after generation.
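The browser configuration step can be sketched by keeping the flags and preferences as plain data and applying them to Selenium's Options object separately. The exact flags and the image-blocking preference key below are common choices assumed for illustration, not values taken from web_utils.py; Options.add_argument and Options.add_experimental_option are standard Selenium methods.

```python
def chrome_pdf_config() -> tuple[list[str], dict[str, int]]:
    """Return (command-line flags, preferences) for a headless, image-free Chrome.

    The specific flags and preference key are assumptions, not taken from web_utils.py.
    """
    args = [
        "--headless=new",  # run without a visible window (Chrome 109+ syntax)
        "--no-sandbox",  # commonly needed when Chrome runs as root in a container
        "--disable-dev-shm-usage",  # avoid the small /dev/shm in some containers
    ]
    # Content setting value 2 means "block"; this stops images from loading.
    prefs = {"profile.managed_default_content_settings.images": 2}
    return args, prefs


def build_chrome_options():
    """Apply the config to a Selenium Options object (needs selenium installed)."""
    # Imported lazily so chrome_pdf_config() can be inspected without Selenium.
    from selenium.webdriver.chrome.options import Options

    args, prefs = chrome_pdf_config()
    options = Options()
    for arg in args:
        options.add_argument(arg)
    options.add_experimental_option("prefs", prefs)
    return options
```

The resulting object would be passed as webdriver.Chrome(options=build_chrome_options()); separating the data from Selenium lets the configuration be checked in tests without launching a browser.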
4. is_private_ip(ip: str) -> bool
Purpose:
Checks whether a given IP address is within a private network range.
Parameters:
ip (str): The IP address to check.
Returns: bool - True if the IP is private; False if it is not private or is invalid.
Usage Example:
is_private = is_private_ip("192.168.1.1") # True
Implementation Details:
Uses Python's ipaddress module to parse the address and check whether it falls in a private range.
Returns False if the IP is invalid.
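A minimal sketch of the check described above, assuming it relies on the is_private property of ipaddress objects. Note that Python also classifies loopback (127.0.0.0/8) and link-local (169.254.0.0/16, which includes the common cloud metadata address) as private, which is usually what an SSRF filter wants.

```python
import ipaddress


def is_private_ip(ip: str) -> bool:
    """Return True for private-range IPv4/IPv6 addresses; False otherwise or if invalid."""
    try:
        return ipaddress.ip_address(ip).is_private
    except ValueError:
        # ip_address() raises ValueError for strings that are not valid addresses.
        return False
```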
5. is_valid_url(url: str) -> bool
Purpose:
Validates a URL string, ensuring that it uses the HTTP or HTTPS scheme and does not point to a private IP address.
Parameters:
url (str): The URL string to validate.
Returns: bool - True if the URL is valid and does not point to a private IP, False otherwise.
Usage Example:
valid = is_valid_url("https://www.google.com") # True
Implementation Details:
Uses regex to check URL scheme and format.
Parses the hostname and resolves it to an IP address.
Uses is_private_ip to reject URLs that resolve to private IPs.
Handles DNS resolution errors gracefully, returning False instead of raising.
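The validation steps above can be sketched as follows. The regex is a hypothetical stand-in for the module's pattern, and resolving with socket.getaddrinfo and rejecting the URL if any returned address is private is a design choice of this sketch (it covers hosts with several A/AAAA records), not necessarily what web_utils.py does.

```python
import ipaddress
import re
import socket
from urllib.parse import urlparse

# Hypothetical pattern: http or https, a non-empty host part, and no whitespace.
_URL_RE = re.compile(r"^https?://[^\s/?#]+\S*$", re.IGNORECASE)


def _is_private(ip: str) -> bool:
    """Mirrors is_private_ip as documented above: invalid addresses count as not private."""
    try:
        return ipaddress.ip_address(ip).is_private
    except ValueError:
        return False


def is_valid_url(url: str) -> bool:
    """Accept http(s) URLs whose host resolves only to non-private addresses."""
    if not _URL_RE.match(url):
        return False
    try:
        hostname = urlparse(url).hostname
    except ValueError:  # e.g. unbalanced IPv6 brackets such as "http://[::1/"
        return False
    if not hostname:
        return False
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        # DNS failures and malformed hostnames are rejected rather than raised.
        return False
    # Stricter than checking a single address: reject if ANY resolved IP is private.
    return bool(infos) and all(not _is_private(info[4][0]) for info in infos)
```

IP literals such as "http://10.1.2.3/" go through the same path, since getaddrinfo returns numeric hosts without a DNS lookup.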
6. safe_json_parse(data: str | dict) -> dict
Purpose:
Safely parses a JSON string into a Python dictionary, or returns the dictionary if already provided.
Parameters:
data (str or dict): JSON string or dict to parse.
Returns: dict - The parsed dictionary, or an empty dict on failure.
Usage Example:
parsed = safe_json_parse('{"key": "value"}') # {'key': 'value'}
Implementation Details:
Returns empty dict if parsing fails or input is empty.
Supports passing in dicts directly (returns as is).
7. get_float(req: dict, key: str, default: float | int = 10.0) -> float
Purpose:
Extracts a float value from a dictionary for a given key, falling back to a default when the value cannot be parsed or is not positive.
Parameters:
req (dict): The dictionary to extract from.
key (str): The key whose value to parse.
default (float or int, optional): Default value if parsing fails or the value is invalid. Defaults to 10.0.
Returns: float - The parsed float value or the default.
Usage Example:
value = get_float({"timeout": "3.5"}, "timeout") # 3.5
value = get_float({"timeout": "-1"}, "timeout") # 10.0 (default)
8. send_invite_email(to_email, invite_url, tenant_id, inviter)
Purpose:
Sends an invitation email to a user to join a team, using a predefined HTML email template.
Parameters:
to_email (str): Recipient's email address.
invite_url (str): URL for completing registration.
tenant_id (str or int): Identifier of the tenant/team.
inviter (str): Name or identifier of the inviter.
Returns: None
Usage Example:
send_invite_email("user@example.com", "https://app/invite/abc123", "team123", "Alice")
Implementation Details:
Uses Flask's app.app_context() to access the Flask-Mail configuration.
Renders an HTML email template using render_template_string.
Sends the email through smtp_mail_server, imported from api.apps.
Important Implementation Details and Algorithms
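PDF Generation: Uses Selenium with headless Chrome to load pages programmatically and generates PDFs via the Chrome DevTools Protocol. Because a real browser engine renders the page, CSS and complex HTML layouts come out as they would in Chrome, although images are not loaded because image loading is disabled in the browser configuration.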
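Security Checks: The validation functions reject non-HTTP(S) URLs and URLs whose hostname resolves to a private IP address, reducing the risk of SSRF (Server-Side Request Forgery) attacks. Because the hostname is resolved when the URL is validated, a later request to the same URL can still resolve to a different address (for example, through DNS rebinding), so this check is a mitigation rather than a complete defense.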
Robust JSON Parsing: safe_json_parse handles malformed JSON input without raising exceptions, returning an empty dict instead.
Email Sending: Uses Flask-Mail with the configured SMTP server to send HTML emails rendered from a shared template, which keeps invitation emails consistent and easier to maintain.
Interaction with Other System Components
Imports smtp_mail_server and app from api.apps, so its email functionality depends on the Flask application context and the configured SMTP mail server.
Uses external libraries:
selenium and webdriver_manager for browser automation.
flask_mail for email messaging.
flask for rendering email templates.
Can be used by various parts of the web application backend needing PDF generation, URL validation, JSON processing, or invitation email sending.
Content Type Map
The module contains a dictionary CONTENT_TYPE_MAP mapping common file extensions to their MIME types to facilitate content handling in other parts of the application.
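A lookup against such a map can be sketched as follows. The entries, the key format (lowercase extension without the dot), the content_type_for helper, and the fallback to the standard library's mimetypes.guess_type are all assumptions for illustration; the real CONTENT_TYPE_MAP contents are not shown here.

```python
import mimetypes
from pathlib import PurePath

# Hypothetical subset; the real CONTENT_TYPE_MAP in web_utils.py may differ.
CONTENT_TYPE_MAP = {
    "pdf": "application/pdf",
    "png": "image/png",
    "jpg": "image/jpeg",
    "txt": "text/plain",
}


def content_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Look up a MIME type by extension, then fall back to mimetypes, then `default`."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext in CONTENT_TYPE_MAP:
        return CONTENT_TYPE_MAP[ext]
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default
```

Normalizing the extension to lowercase means "Report.PDF" and "report.pdf" resolve to the same entry.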
Diagram: Utility Functions Flowchart
flowchart TD
A["html2pdf(source)"] --> B["__get_pdf_from_html(path, timeout, install_driver, print_options)"]
B --> C["__send_devtools(driver, cmd, params)"]
D["is_valid_url(url)"] --> E["is_private_ip(ip)"]
F["safe_json_parse(data)"] --> G["return dict or {}"]
H["get_float(req, key, default)"] --> I["parse float, fallback default"]
J["send_invite_email(...)"] --> K["render_template_string(INVITE_EMAIL_TMPL)"]
K --> L["smtp_mail_server.send(msg)"]
Summary
The web_utils.py file is a utility-focused module designed to provide robust, reusable tools for:
Converting HTML to PDF via headless Chrome.
Ensuring URL and IP safety.
Parsing JSON safely.
Extracting typed data with validation.
Sending templated invitation emails.
It depends on Flask and Flask-Mail for email delivery and on Selenium for PDF generation, and its URL validation helps protect backend requests against SSRF.