sandbox_security_tests_full.py
Overview
sandbox_security_tests_full.py is a comprehensive automated test suite designed to validate the security and resource management capabilities of a sandboxed code execution environment. This environment is accessible via a REST API (SANDBOX_API_URL) that runs user-submitted code snippets in various programming languages (currently Python and Node.js).
The test suite executes multiple predefined test cases concurrently, each designed to verify that the sandbox correctly handles:
Infinite loops and forced termination
Dangerous or unauthorized operations (e.g., disallowed imports, dangerous function calls)
Resource limits (time, memory, output)
Runtime errors and unauthorized access attempts
Normal, successful executions without dependencies or with safe dependencies
Results are collected, validated against expectations, and summarized in a detailed test report.
File Components
Constants
API_URL: URL of the sandbox API endpoint; defaults to http://localhost:9385/run.
TIMEOUT: Request timeout in seconds for API calls (default 15s).
MAX_WORKERS: Maximum number of concurrent threads for executing tests (default 5).
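A minimal sketch of how these constants might be resolved. It assumes the URL override is read from an environment variable named SANDBOX_API_URL (the name used elsewhere in this page); the `load_settings` helper and its dictionary return shape are hypothetical and only serve to make the fallback behaviour visible.

```python
import os
from typing import Mapping

DEFAULT_API_URL = "http://localhost:9385/run"


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the three constants, preferring the environment for the URL."""
    return {
        # Assumption: the override lives in SANDBOX_API_URL.
        "API_URL": env.get("SANDBOX_API_URL", DEFAULT_API_URL),
        "TIMEOUT": 15,      # seconds allowed per API request
        "MAX_WORKERS": 5,   # upper bound on concurrent test threads
    }


# With an empty environment the documented default is used.
settings = load_settings({})
print(settings["API_URL"])
```

Passing the environment as a mapping keeps the fallback logic easy to exercise without touching the real process environment.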
Enumerations
These Enum classes classify possible sandbox execution outcomes and failure reasons.
ResultStatus (str, Enum)
Enumerates the possible high-level result statuses of the sandbox execution:
| Member | Description |
|---|---|
| | Execution completed successfully. |
| | User program raised an error. |
| | Program exceeded predefined resource limits (time, memory, output). |
| | Program attempted unauthorized operations. |
| | Runtime exceptions or signals occurred. |
| | Error in the sandbox runner itself or communication failure. |
ResourceLimitType (str, Enum)
Specifies which resource limit was exceeded:
TIME
MEMORY
OUTPUT
UnauthorizedAccessType (str, Enum)
Specifies type of unauthorized access attempted:
DISALLOWED_SYSCALL
FILE_ACCESS
NETWORK_ACCESS
RuntimeErrorType (str, Enum)
Specifies runtime error types:
SIGNALLED (e.g., process received a signal)
NONZERO_EXIT (program exited with non-zero code)
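The three detail enumerations above can be sketched as follows. The member names come from this page; the string values assigned to them are assumptions, since the page does not state what the sandbox API actually returns.

```python
from enum import Enum


class ResourceLimitType(str, Enum):
    TIME = "time"          # values are assumed, not taken from the source
    MEMORY = "memory"
    OUTPUT = "output"


class UnauthorizedAccessType(str, Enum):
    DISALLOWED_SYSCALL = "disallowed_syscall"
    FILE_ACCESS = "file_access"
    NETWORK_ACCESS = "network_access"


class RuntimeErrorType(str, Enum):
    SIGNALLED = "signalled"
    NONZERO_EXIT = "nonzero_exit"


# A raw string from a JSON payload converts straight into a member.
limit = ResourceLimitType("memory")
# Because of the str mixin, members also compare equal to plain strings.
print(limit is ResourceLimitType.MEMORY, limit == "memory")
```

Mixing in `str` is what lets a parsed JSON value and an enum member be compared directly, which is convenient when checking API responses.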
Data Models
Using Pydantic, the following models represent structured test results.
ExecutionResult (BaseModel)
Represents the detailed result of a single sandbox execution.
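| Field | Type | Description |
|---|---|---|
| status | ResultStatus | Overall result status. |
| stdout | str | Captured standard output of the program. |
| stderr | str | Captured standard error output. |
| exit_code | int | Program exit code. |
| detail | Optional[str] | Additional details (e.g., limit type, error info). |
| resource_limit_type | Optional[ResourceLimitType] | Resource limit type if applicable. |
| unauthorized_access_type | Optional[UnauthorizedAccessType] | Unauthorized access type if applicable. |
| runtime_error_type | Optional[RuntimeErrorType] | Runtime error type if applicable. |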
TestResult (BaseModel)
Represents the outcome of a test case execution.
| Field | Type | Description |
|---|---|---|
| name | str | Test case name. |
| passed | bool | Whether the test passed validation. |
| duration | float | Execution duration in seconds. |
| expected_failure | bool | True if the test is expected to fail. |
| result | Optional[ExecutionResult] | Detailed execution result, if available. |
| error | Optional[str] | Request or execution error string. |
| validation_error | Optional[str] | Validation error message if test failed validation. |
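To make the shape of the two models concrete, here is a rough sketch that uses standard-library dataclasses as a stand-in for the Pydantic models, so it runs without third-party packages. Field names follow the tables above; the default values, the plain `str` status, and the sample payload are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionResult:
    status: str                       # the real model uses ResultStatus
    stdout: str
    stderr: str
    exit_code: int
    detail: Optional[str] = None
    resource_limit_type: Optional[str] = None
    unauthorized_access_type: Optional[str] = None
    runtime_error_type: Optional[str] = None


@dataclass
class TestResult:
    name: str
    passed: bool
    duration: float
    expected_failure: bool = False
    result: Optional[ExecutionResult] = None
    error: Optional[str] = None
    validation_error: Optional[str] = None


# Hypothetical response body; the real API's field values may differ.
payload = {"status": "success", "stdout": "1\n", "stderr": "", "exit_code": 0}
outcome = TestResult(
    name="demo",
    passed=True,
    duration=0.12,
    result=ExecutionResult(**payload),
)
print(outcome.result.status, outcome.result.exit_code)
```

With Pydantic the same structure would additionally validate field types when the JSON response is parsed, which the dataclass stand-in does not do.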
Functions
encode_code(code: str) -> str
Encodes the source code string into a base64-encoded UTF-8 string.
Parameters:
code (str): Source code to encode.
Returns:
str: Base64-encoded source code string.
Usage Example:
encoded = encode_code("print('Hello')")
print(encoded)
# Outputs base64 string representing the code
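One plausible implementation of the encoding step, using only the standard library. This is a sketch consistent with the description above rather than the file's actual code.

```python
import base64


def encode_code(code: str) -> str:
    """Encode source text as UTF-8 bytes, then as a base64 ASCII string."""
    return base64.b64encode(code.encode("utf-8")).decode("utf-8")


# Round trip: non-ASCII source survives because it is UTF-8 encoded first.
encoded = encode_code("print('héllo')")
decoded = base64.b64decode(encoded).decode("utf-8")
print(encoded)
print(decoded)
```

The base64 output is pure ASCII, so quotes, newlines, and non-ASCII characters in the submitted code cannot interfere with the JSON request body.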
execute_single_test(name: str, code: str, language: str, arguments: dict, expect_fail: bool = False) -> TestResult
Executes a single code snippet test case by sending it to the sandbox API and collecting its response.
Retries if the sandbox returns an API rate limiting exit code (-429).
Measures execution duration.
Validates the result against expectations.
Handles request exceptions gracefully.
Parameters:
name (str): Test case name.
code (str): Source code to run.
language (str): Programming language identifier (e.g., "python", "nodejs").
arguments (dict): Additional arguments to pass to the sandbox (currently unused).
expect_fail (bool): Whether the test is expected to fail (default False).
Returns:
TestResult: Result and metadata for the test execution.
Usage Example:
result = execute_single_test(
    name="Test 1",
    code="def main(): return 1",
    language="python",
    arguments={},
    expect_fail=False,
)
print(result.passed)
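The retry-and-timing behaviour described for this function can be sketched in isolation. The `send` callable stands in for the real HTTP request so the example runs without a sandbox. The `run_with_retry` name and the `max_attempts` cap are hypothetical, since this page does not say whether retries are bounded.

```python
import time
from typing import Callable, Tuple

RATE_LIMIT_EXIT_CODE = -429   # exit code the sandbox uses for rate limiting
RETRY_DELAY = 0.5             # seconds to wait before retrying


def run_with_retry(
    send: Callable[[], dict],
    max_attempts: int = 5,          # assumed cap, not stated in the source
    delay: float = RETRY_DELAY,
) -> Tuple[dict, float]:
    """Call send(), retrying while the response reports rate limiting."""
    start = time.perf_counter()
    response = send()
    attempts = 1
    while response.get("exit_code") == RATE_LIMIT_EXIT_CODE and attempts < max_attempts:
        time.sleep(delay)
        response = send()
        attempts += 1
    return response, time.perf_counter() - start


# Simulated sandbox: rate limited twice, then a normal reply.
replies = iter([
    {"exit_code": -429},
    {"exit_code": -429},
    {"exit_code": 0, "stdout": "ok"},
])
response, duration = run_with_retry(lambda: next(replies), delay=0.01)
print(response["exit_code"], f"{duration:.3f}s")
```

Measuring the duration around the whole loop means time spent waiting on rate limits counts toward the reported test duration; whether the original does the same is not stated here.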
validate_test_result(name: str, expect_fail: bool, test_result: TestResult) -> None
Validates the sandbox execution result against the test expectations.
Checks if the test result exists.
Marks passed status accordingly.
If expect_fail is True, passing results are flagged as validation errors.
If expect_fail is False, non-success results are flagged as validation errors.
Parameters:
name (str): Test case name (for error messages).
expect_fail (bool): Whether the test should fail.
test_result (TestResult): Test execution result to validate.
Returns:
None: Modifies test_result in place.
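The validation rules can be illustrated with a small stand-in. The `Outcome` class below is a hypothetical, simplified substitute for TestResult, the "success" status string is assumed, and the sketch interprets `passed` as "the outcome matched the expectation", which is one reading of the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:  # hypothetical, simplified stand-in for TestResult
    status: Optional[str] = None          # None means no execution result
    passed: bool = False
    validation_error: Optional[str] = None


def validate(name: str, expect_fail: bool, outcome: Outcome) -> None:
    """Update outcome in place according to the expectation."""
    if outcome.status is None:
        outcome.passed = False
        outcome.validation_error = f"{name}: no execution result"
        return
    succeeded = outcome.status == "success"   # assumed status value
    outcome.passed = succeeded != expect_fail
    if expect_fail and succeeded:
        outcome.validation_error = f"{name}: expected failure but execution succeeded"
    elif not expect_fail and not succeeded:
        outcome.validation_error = f"{name}: expected success but got {outcome.status!r}"


# Dangerous code that the sandbox wrongly allowed to run.
leaked = Outcome(status="success")
validate("dangerous import", True, leaked)
print(leaked.passed, leaked.validation_error)
```

Checking both directions is what lets the suite catch a sandbox that blocks safe code as well as one that lets unsafe code through.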
get_test_cases() -> Dict[str, dict]
Returns a dictionary of predefined test cases with their source code, expected failure flag, language, and arguments.
Includes 17 test cases covering infinite loops, dangerous imports, dangerous calls, memory exhaustion, and normal runs in Python and Node.js.
Each test case is keyed by a descriptive name.
Returns:
Dict[str, dict]: Mapping of test case names to test details.
Usage Example:
tests = get_test_cases()
print(tests["7 Normal test: Python without dependencies"]["code"])
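For orientation, a test-case dictionary of this kind might look like the sketch below. Only the "code" key is confirmed by the usage example above; the other key names, the case names, and the snippets themselves are assumptions for illustration.

```python
from typing import Dict


def sample_test_cases() -> Dict[str, dict]:
    """Return two illustrative cases; the real suite defines 17."""
    return {
        "demo: Python infinite loop": {
            "code": "def main():\n    while True:\n        pass\n",
            "language": "python",
            "arguments": {},
            "expect_fail": True,    # the sandbox should terminate this
        },
        "demo: Python without dependencies": {
            "code": "def main():\n    return 1\n",
            "language": "python",
            "arguments": {},
            "expect_fail": False,   # a normal run should succeed
        },
    }


cases = sample_test_cases()
for name, case in cases.items():
    print(name, case["language"], case["expect_fail"])
```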
print_test_report(results: Dict[str, TestResult]) -> None
Prints a formatted summary report of all test results to the console.
Displays pass/fail status, duration, errors, validation errors, and extra details.
Shows overall statistics (passed, failed, total).
Uses emojis for quick visual status indicators:
✅ Passed test
❌ Failed test
⚠️ Expected failure passed (unexpected)
✓ Expected failure correctly failed
Parameters:
results (Dict[str, TestResult]): Mapping of test names to results.
Returns:
None
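A rough sketch of how the status icons and summary line could be derived. It works on plain tuples rather than TestResult objects, and the mapping from flags to icons is an interpretation of the legend above, not the file's confirmed logic; `status_icon` and `format_report` are hypothetical names.

```python
from typing import Iterable, Tuple

# (name, expected_failure, succeeded, seconds)
Row = Tuple[str, bool, bool, float]


def status_icon(expected_failure: bool, succeeded: bool) -> str:
    """Pick the icon for one test from the two boolean flags."""
    if expected_failure:
        return "⚠️" if succeeded else "✓"
    return "✅" if succeeded else "❌"


def format_report(rows: Iterable[Row]) -> str:
    """Build one line per test plus overall statistics."""
    lines = []
    matched = 0
    total = 0
    for name, expected_failure, succeeded, seconds in rows:
        total += 1
        # A test counts as passed when the outcome matches the expectation.
        matched += succeeded != expected_failure
        lines.append(f"{status_icon(expected_failure, succeeded)} {name} ({seconds:.2f}s)")
    lines.append(f"Passed: {matched}  Failed: {total - matched}  Total: {total}")
    return "\n".join(lines)


report = format_report([
    ("hello world", False, True, 0.10),
    ("infinite loop", True, False, 5.00),
    ("import os", True, True, 0.20),
])
print(report)
```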
main() -> None
Entry point for the test suite.
Prints startup info including API URL and concurrency level.
Retrieves all test cases.
Uses a ThreadPoolExecutor to execute tests concurrently (up to MAX_WORKERS).
Submits each test to the executor with a small delay between submissions for throttling.
Collects results as tests complete.
Prints the overall test report.
Exits with status 1 if any test that was expected to succeed failed.
Returns:
None
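The submit-with-delay and collect-as-completed flow can be sketched as follows. `fake_test` is a hypothetical stand-in for execute_single_test that makes no network call, and the `SUBMIT_DELAY` value is an assumption, since this page only mentions "a small delay". The sketch computes the exit code instead of calling `sys.exit` so it can run inline.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 5
SUBMIT_DELAY = 0.01  # assumed value; the source only says "a small delay"


def fake_test(name: str, expect_fail: bool) -> dict:
    """Hypothetical stand-in for execute_single_test (no network call)."""
    time.sleep(0.01)
    succeeded = not expect_fail  # pretend the sandbox behaved as expected
    return {"name": name, "expect_fail": expect_fail, "succeeded": succeeded}


def run_all(cases: dict) -> dict:
    """Run every case on a thread pool and gather results by name."""
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {}
        for name, expect_fail in cases.items():
            futures[pool.submit(fake_test, name, expect_fail)] = name
            time.sleep(SUBMIT_DELAY)  # stagger submissions to throttle load
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


cases = {"infinite loop": True, "hello world": False, "import os": True}
results = run_all(cases)

# Non-zero exit only when a test expected to succeed did not.
exit_code = 1 if any(
    not r["expect_fail"] and not r["succeeded"] for r in results.values()
) else 0
print(exit_code)
```

Keying the futures by test name means results can be stored as they finish, regardless of completion order.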
Implementation Details & Algorithms
Concurrency: Uses Python concurrent.futures.ThreadPoolExecutor with a configurable max worker count to parallelize test execution, improving speed.
Rate Limiting Handling: If the sandbox API returns an exit code -429 (too many requests), the test function retries after a 0.5-second delay.
Base64 Encoding: Source code is base64 encoded before transmission to the sandbox API to handle arbitrary code content safely.
Validation Logic: Differentiates between tests that are expected to fail (e.g., dangerous code) vs. those expected to succeed, allowing detection of both false positives and false negatives.
Detailed Result Parsing: Uses Pydantic models to parse and validate the JSON response from the sandbox API, facilitating structured access to status, outputs, and error details.
Graceful Error Handling: Network issues or API errors are captured and recorded in the test result without crashing the test runner.
Interaction With Other System Components
Sandbox API: The core dependency is the sandbox API endpoint (SANDBOX_API_URL) that runs submitted code safely and returns detailed execution results in JSON format.
Environment Variables: The test suite reads the sandbox API URL from the environment or falls back to the default.
External Libraries: Uses requests for HTTP communication and pydantic for data validation.
Python and Node.js Runtime: Tests include code snippets for both languages, requiring the sandbox to support these runtimes.
This file is typically run as a standalone test executable but can be integrated into CI/CD pipelines or monitoring systems to continuously verify sandbox security properties.
Visual Diagram
classDiagram
class ExecutionResult {
+ResultStatus status
+str stdout
+str stderr
+int exit_code
+Optional[str] detail
+Optional[ResourceLimitType] resource_limit_type
+Optional[UnauthorizedAccessType] unauthorized_access_type
+Optional[RuntimeErrorType] runtime_error_type
}
class TestResult {
+str name
+bool passed
+float duration
+bool expected_failure
+Optional[ExecutionResult] result
+Optional[str] error
+Optional[str] validation_error
}
class SandboxSecurityTests {
+execute_single_test(name, code, language, arguments, expect_fail) TestResult
+validate_test_result(name, expect_fail, test_result) void
+get_test_cases() Dict[str, dict]
+print_test_report(results) void
+main() void
-encode_code(code) str
}
TestResult o-- ExecutionResult
SandboxSecurityTests ..> ExecutionResult
SandboxSecurityTests ..> TestResult
Summary
sandbox_security_tests_full.py is a robust automated test suite that programmatically submits diverse code snippets to a sandbox execution environment and verifies the sandbox’s ability to:
Enforce resource and security restrictions
Detect and terminate unsafe code (e.g., infinite loops, dangerous imports)
Correctly execute safe code snippets
Provide detailed execution feedback
Its concurrency, retry logic, and detailed validation make it suitable for continuous testing and regression detection in a sandbox security context.