sandbox_security_tests_full.py

Overview

sandbox_security_tests_full.py is an automated test suite that validates the security and resource-management behavior of a sandboxed code execution environment. The sandbox is reached through a REST API (SANDBOX_API_URL) and runs user-submitted code snippets in one of the supported languages (currently Python and Node.js).

The test suite executes multiple predefined test cases concurrently, each designed to verify that the sandbox correctly handles:

Infinite loops and forced termination
Dangerous or unauthorized operations (e.g., disallowed imports, dangerous function calls)
Resource limits (time, memory, output)
Runtime errors and unauthorized access attempts
Normal, successful executions without dependencies or with safe dependencies

Results are collected, validated against expectations, and summarized in a detailed test report.

File Components

Constants

API_URL: URL of the sandbox API endpoint; defaults to http://localhost:9385/run.
TIMEOUT: Request timeout in seconds for API calls (default 15s).
MAX_WORKERS: Maximum number of concurrent threads for executing tests (default 5).

Enumerations

These Enum classes classify possible sandbox execution outcomes and failure reasons.

`ResultStatus` (str, Enum)

Enumerates the possible high-level result statuses of the sandbox execution:

Member	Description
`SUCCESS`	Execution completed successfully.
`PROGRAM_ERROR`	User program raised an error.
`RESOURCE_LIMIT_EXCEEDED`	Program exceeded predefined resource limits (time, memory, output).
`UNAUTHORIZED_ACCESS`	Program attempted unauthorized operations.
`RUNTIME_ERROR`	Runtime exceptions or signals occurred.
`PROGRAM_RUNNER_ERROR`	Error in the sandbox runner itself or communication failure.

`ResourceLimitType` (str, Enum)

Specifies which resource limit was exceeded:

TIME
MEMORY
OUTPUT

`UnauthorizedAccessType` (str, Enum)

Specifies the type of unauthorized access attempted:

DISALLOWED_SYSCALL
FILE_ACCESS
NETWORK_ACCESS

`RuntimeErrorType` (str, Enum)

Specifies runtime error types:

SIGNALLED (e.g., process received a signal)
NONZERO_EXIT (program exited with non-zero code)

Data Models

The following Pydantic models represent structured test results.

`ExecutionResult` (BaseModel)

Represents the detailed result of a single sandbox execution.

Field	Type	Description
`status`	`ResultStatus`	Overall result status.
`stdout`	`str`	Captured standard output of the program.
`stderr`	`str`	Captured standard error output.
`exit_code`	`int`	Program exit code.
`detail`	`Optional[str]`	Additional details (e.g., limit type, error info).
`resource_limit_type`	`Optional[ResourceLimitType]`	Resource limit type if applicable.
`unauthorized_access_type`	`Optional[UnauthorizedAccessType]`	Unauthorized access type if applicable.
`runtime_error_type`	`Optional[RuntimeErrorType]`	Runtime error type if applicable.

`TestResult` (BaseModel)

Represents the outcome of a test case execution.

Field	Type	Description
`name`	`str`	Test case name.
`passed`	`bool`	Whether the test passed validation.
`duration`	`float`	Execution duration in seconds.
`expected_failure`	`bool`	True if the test is expected to fail.
`result`	`Optional[ExecutionResult]`	Detailed execution result, if available.
`error`	`Optional[str]`	Request or execution error string.
`validation_error`	`Optional[str]`	Validation error message if test failed validation.

Functions

`encode_code(code: str) -> str`

Encodes the source code string into a base64-encoded UTF-8 string.

Parameters:

code (str): Source code to encode.

Returns:

str: Base64-encoded source code string.

Usage Example:

encoded = encode_code("print('Hello')")
print(encoded)
# Outputs base64 string representing the code
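
For reference, a function matching this description can be sketched with the standard library alone. The name `encode_code_sketch` is hypothetical, and the actual implementation in the file may differ in detail.

```python
import base64


def encode_code_sketch(code: str) -> str:
    """Encode source text as UTF-8 bytes, then as a base64 ASCII string."""
    raw_bytes = code.encode("utf-8")
    return base64.b64encode(raw_bytes).decode("utf-8")


encoded = encode_code_sketch("print('Hello')")
print(encoded)  # cHJpbnQoJ0hlbGxvJyk=

# Decoding reverses the transformation, which is what a receiver would do.
print(base64.b64decode(encoded).decode("utf-8"))  # print('Hello')
```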

`execute_single_test(name: str, code: str, language: str, arguments: dict, expect_fail: bool = False) -> TestResult`

Executes a single code snippet test case by sending it to the sandbox API and collecting its response.

Retries if the sandbox returns an API rate limiting exit code (-429).
Measures execution duration.
Validates the result against expectations.
Handles request exceptions gracefully.

Parameters:

name (str): Test case name.
code (str): Source code to run.
language (str): Programming language identifier (e.g., "python", "nodejs").
arguments (dict): Additional arguments to pass to the sandbox (currently unused).
expect_fail (bool): Whether the test is expected to fail (default False).

Returns:

TestResult: Result and metadata for the test execution.

Usage Example:

result = execute_single_test(
    name="Test 1",
    code="def main(): return 1",
    language="python",
    arguments={},
    expect_fail=False,
)
print(result.passed)

`validate_test_result(name: str, expect_fail: bool, test_result: TestResult) -> None`

Validates the sandbox execution result against the test expectations.

Checks that an execution result exists.
Sets passed according to whether the outcome matches the expectation.
If expect_fail is True, a successful execution is flagged as a validation error.
If expect_fail is False, any non-success status is flagged as a validation error.

Parameters:

name (str): Test case name (for error messages).
expect_fail (bool): Whether the test should fail.
test_result (TestResult): Test execution result to validate.

Returns:

None: Modifies test_result in-place.
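
The rules above can be sketched as follows. `SketchResult` is a hypothetical dataclass stand-in for `TestResult` that carries only the fields needed here, and the `"success"` status string and the error messages are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for TestResult with only the fields used here."""
    status: Optional[str] = None  # sandbox status; None means no result came back
    passed: bool = False
    validation_error: Optional[str] = None


def validate_sketch(name: str, expect_fail: bool, tr: SketchResult) -> None:
    """Update tr in place according to the expectation for this test."""
    if tr.status is None:
        tr.passed = False
        tr.validation_error = f"{name}: no execution result"
        return

    succeeded = tr.status == "success"  # assumed status value
    if expect_fail and succeeded:
        tr.passed = False
        tr.validation_error = f"{name}: expected a failure, but the run succeeded"
    elif not expect_fail and not succeeded:
        tr.passed = False
        tr.validation_error = f"{name}: expected success, got {tr.status}"
    else:
        tr.passed = True


blocked = SketchResult(status="unauthorized_access")
validate_sketch("dangerous import", expect_fail=True, tr=blocked)
print(blocked.passed, blocked.validation_error)
```

In this sketch a test passes when the outcome matches the expectation, so a blocked dangerous snippet counts as a pass.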

`get_test_cases() -> Dict[str, dict]`

Returns a dictionary of predefined test cases with their source code, expected failure flag, language, and arguments.

Includes 17 test cases covering infinite loops, dangerous imports, dangerous calls, memory exhaustion, and normal runs in Python and Node.js.
Each test case is keyed by a descriptive name.

Returns:

Dict[str, dict]: Mapping of test case names to test details.

Usage Example:

tests = get_test_cases()
print(tests["7 Normal test: Python without dependencies"]["code"])

`print_test_report(results: Dict[str, TestResult]) -> None`

Prints a formatted summary report of all test results to the console.

Displays pass/fail status, duration, errors, validation errors, and extra details.
Shows overall statistics (passed, failed, total).
Uses emojis for quick visual status indicators:
- ✅ Passed test
- ❌ Failed test
- ⚠️ Expected failure passed (unexpected)
- ✓ Expected failure correctly failed

Parameters:

results (Dict[str, TestResult]): Mapping of test names to results.

Returns:

None

`main() -> None`

Entry point for the test suite.

Prints startup info including API URL and concurrency level.
Retrieves all test cases.
Uses a ThreadPoolExecutor to execute tests concurrently (up to MAX_WORKERS).
Submits each test to the executor with a small delay between submissions for throttling.
Collects results as tests complete.
Prints the overall test report.
Exits with status 1 if any test that was expected to succeed failed.

Returns:

None
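
The submit-and-collect flow described above might look roughly like the sketch below. `fake_test` is a hypothetical stand-in for `execute_single_test`, the `SUBMIT_DELAY` value is an assumption (the source only mentions "a small delay"), and the exit-code rule is simplified to "any failure".

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 5      # default stated in the Constants section
SUBMIT_DELAY = 0.05  # assumed value for the throttling delay


def fake_test(name: str) -> bool:
    """Hypothetical stand-in for execute_single_test; returns pass or fail."""
    time.sleep(0.01)
    return not name.startswith("bad")


def run_all(names, run_one=fake_test, max_workers=MAX_WORKERS,
            submit_delay=SUBMIT_DELAY):
    """Run every test on a thread pool and map each name to its outcome."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {}
        for name in names:
            futures[executor.submit(run_one, name)] = name
            time.sleep(submit_delay)  # throttle submissions
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                results[name] = False  # a crashed worker counts as a failure
    return results


results = run_all(["ok-1", "ok-2", "bad-1"])
exit_code = 0 if all(results.values()) else 1
print(results, exit_code)
```

A real entry point would pass `exit_code` to `sys.exit`; the sketch only computes it so that it stays easy to run and inspect.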

Implementation Details & Algorithms

Concurrency: Uses Python concurrent.futures.ThreadPoolExecutor with a configurable max worker count to parallelize test execution, improving speed.
Rate Limiting Handling: If the sandbox API returns an exit code -429 (too many requests), the test function retries after a 0.5-second delay.
Base64 Encoding: Source code is base64 encoded before transmission to the sandbox API to handle arbitrary code content safely.
Validation Logic: Differentiates between tests that are expected to fail (e.g., dangerous code) vs. those expected to succeed, allowing detection of both false positives and false negatives.
Detailed Result Parsing: Uses Pydantic models to parse and validate the JSON response from the sandbox API, facilitating structured access to status, outputs, and error details.
Graceful Error Handling: Network issues or API errors are captured and recorded in the test result without crashing the test runner.
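
The rate-limit handling described above can be sketched independently of any HTTP client. In this sketch `send` is a hypothetical callable that performs one request and returns the parsed JSON as a dict, the `exit_code` key is assumed from the `ExecutionResult` fields, and the attempt cap is an assumption, since the source does not say whether retries are bounded.

```python
import time
from typing import Callable, Dict

RATE_LIMIT_EXIT_CODE = -429  # exit code named in the description above
RETRY_DELAY = 0.5            # seconds, as stated above
MAX_ATTEMPTS = 5             # assumed cap; not stated in the source


def call_with_retry(send: Callable[[], Dict],
                    delay: float = RETRY_DELAY,
                    max_attempts: int = MAX_ATTEMPTS,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call send() again while the response signals rate limiting."""
    response = send()
    attempts = 1
    while (response.get("exit_code") == RATE_LIMIT_EXIT_CODE
           and attempts < max_attempts):
        sleep(delay)
        response = send()
        attempts += 1
    return response


# Simulated sandbox: two rate-limited replies, then a normal one.
replies = iter([
    {"exit_code": -429},
    {"exit_code": -429},
    {"exit_code": 0, "stdout": "ok"},
])
final = call_with_retry(lambda: next(replies), sleep=lambda _seconds: None)
print(final)  # {'exit_code': 0, 'stdout': 'ok'}
```

Injecting `send` and `sleep` keeps the retry logic testable without a running sandbox or real delays.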

Interaction With Other System Components

Sandbox API: The core dependency is the sandbox API endpoint (SANDBOX_API_URL) that runs submitted code safely and returns detailed execution results in JSON format.
Environment Variables: The test suite reads the sandbox API URL from the environment or falls back to the default.
External Libraries: Uses requests for HTTP communication and pydantic for data validation.
Python and Node.js Runtime: Tests include code snippets for both languages, requiring the sandbox to support these runtimes.

This file is typically run as a standalone test executable but can be integrated into CI/CD pipelines or monitoring systems to continuously verify sandbox security properties.

Visual Diagram

classDiagram
    class ExecutionResult {
        +ResultStatus status
        +str stdout
        +str stderr
        +int exit_code
        +Optional[str] detail
        +Optional[ResourceLimitType] resource_limit_type
        +Optional[UnauthorizedAccessType] unauthorized_access_type
        +Optional[RuntimeErrorType] runtime_error_type
    }

    class TestResult {
        +str name
        +bool passed
        +float duration
        +bool expected_failure
        +Optional[ExecutionResult] result
        +Optional[str] error
        +Optional[str] validation_error
    }

    class SandboxSecurityTests {
        +execute_single_test(name, code, language, arguments, expect_fail) TestResult
        +validate_test_result(name, expect_fail, test_result) void
        +get_test_cases() Dict[str, dict]
        +print_test_report(results) void
        +main() void
        -encode_code(code) str
    }

    TestResult o-- ExecutionResult
    SandboxSecurityTests ..> ExecutionResult
    SandboxSecurityTests ..> TestResult

Summary

sandbox_security_tests_full.py is a robust automated test suite that programmatically submits diverse code snippets to a sandbox execution environment and verifies the sandbox’s ability to:

Enforce resource and security restrictions
Detect and terminate unsafe code (e.g., infinite loops, dangerous imports)
Correctly execute safe code snippets
Provide detailed execution feedback

Its concurrency, retry logic, and detailed validation make it suitable for continuous testing and regression detection in a sandbox security context.

End of Documentation for `sandbox_security_tests_full.py`
