-
Notifications
You must be signed in to change notification settings - Fork 457
chore(iast): context refactor, context-array + contextvar indexing #14466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 269 ± 2 ms. The average import time from base is: 271 ± 2 ms. The import time difference between this PR and base is: -2.08 ± 0.1 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsCandidate: avara1986/APPSEC-58122_context_refactor (f04f0a4) 🔵 No Baseline Data (24 suites)🔵 coreapiscenario - 12/12 (2 unstable)🔵 No baseline data available for this suite
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌 with a couple comments
This PR is part of the ongoing [IAST context refactor](#14466), and focuses on improving the **naming clarity** of internal structures and functions used within the IAST subsystem. ### 🔄 What’s changed? * Renamed several classes and functions to more accurately reflect their role in the taint tracking lifecycle. * Renamed TaintRangeMapType → TaintedObjectMapType. Consumers of this map retrieve a tainted object first, and then access the ranges inside it. This rename clarifies that the value is the tainted object as a whole, not just range metadata. * Renamed get_taint_map → get_tainted_object_map * Replaced previously confusing or ambiguous terms with consistent, descriptive alternatives (e.g., `contexts_array` → `request_context_slots`, `ApplicationContext` -> `TaintEngineContext`, etc.). ### 🧠 Why? The current naming was either overly generic or misleading, making the IAST implementation harder to understand and maintain. This refactor aligns terminology with: ``` Application Request ├── start_request_context() │ ├── Acquire slot from request_context_slots (max 2) │ ├── Get a new request_context_slots │ │ └── Initialize TaintedObjectMap (e.g., empty dict keyed by id(obj)) │ └── Register context into ApplicationContext.context_slots[n] │ ├── Propagation phase │ ├── get_tainted_object_map() → Retrieve current TaintedObjectMap from TaintEngineContext.request_context_slots │ ├── Check if object is tainted via id(obj) in map │ └── If needed, update/add TaintRanges to TaintedObject in TaintedObjectMap │ └── finish_request_context() ├── Clear TaintedObjectMap (remove all taint metadata for the request) └── Release slot in TaintEngineContext.request_context_slots (allow new request to reuse) ``` ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
This PR is part of the ongoing [IAST context refactor](#14466), and focuses on improving the **naming clarity** of internal structures and functions used within the IAST subsystem. ### 🔄 What’s changed? * Renamed several classes and functions to more accurately reflect their role in the taint tracking lifecycle. * Renamed TaintRangeMapType → TaintedObjectMapType. Consumers of this map retrieve a tainted object first, and then access the ranges inside it. This rename clarifies that the value is the tainted object as a whole, not just range metadata. * Renamed get_taint_map → get_tainted_object_map * Replaced previously confusing or ambiguous terms with consistent, descriptive alternatives (e.g., `contexts_array` → `request_context_slots`, `ApplicationContext` -> `TaintEngineContext`, etc.). ### 🧠 Why? The current naming was either overly generic or misleading, making the IAST implementation harder to understand and maintain. This refactor aligns terminology with: ``` Application Request ├── start_request_context() │ ├── Acquire slot from request_context_slots (max 2) │ ├── Get a new request_context_slots │ │ └── Initialize TaintedObjectMap (e.g., empty dict keyed by id(obj)) │ └── Register context into ApplicationContext.context_slots[n] │ ├── Propagation phase │ ├── get_tainted_object_map() → Retrieve current TaintedObjectMap from TaintEngineContext.request_context_slots │ ├── Check if object is tainted via id(obj) in map │ └── If needed, update/add TaintRanges to TaintedObject in TaintedObjectMap │ └── finish_request_context() ├── Clear TaintedObjectMap (remove all taint metadata for the request) └── Release slot in TaintEngineContext.request_context_slots (allow new request to reuse) ``` ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
## Overview It replaces the thread/process fragile map semantics with a bounded, request-scoped context backed by a native, fixed-size array, and adds end-to-end tests validating Overhead Control Engine (OCE) max-concurrency enforcement across FastAPI, Django, and Flask test apps. ## Context and motivation - Problem: The legacy IAST context map (pointer-keyed + thread-local cache) breaks under asyncio and concurrency, causing incorrect request association and performance inconsistencies. - Goal: Make the IAST request map safe and predictable under threads/async with consistent overhead, enforce request concurrency via OCE, and provide clear, framework-agnostic tests that verify behavior. ## Related work - Follows the direction and experiments from: - #14466: Initial context refactor and benchmarks groundwork - #14497, #14555, #14562: Subsequent iterations on IAST context ownership, perf and integration - This PR integrates and stabilizes those ideas into the runtime and test suites. ## What’s in this PR Core IAST context lifecycle - New request-scoped flow entrypoint - `ddtrace/appsec/_iast/_iast_request_context_base.py` - `_iast_start_request(span)` now gates with OCE (`oce.acquire_request(span)`) and creates a native request context only when needed. - Sets `IAST_CONTEXT` ContextVar to the active context id. - Attaches a per-request `IASTEnvironment(span)` via `core.set_item(IAST.REQUEST_CONTEXT_KEY, ...)`. - On finish (`_iast_finish_request(...)`) it updates global limits, discards the environment, and releases the native context. - Helper: `is_iast_request_enabled()` exposes whether the taint-tracking context is active in the current execution. ## Rollout plan - Land behind existing feature flags/env vars (default disabled for IAST if unsupported). - Monitor CI and integration tests for the new concurrency checks. - Follow-up PRs (tracked in the RFC) will: - Complete the native context array migration across all call sites. - Add more microbenchmarks and tighten perf budgets (+/- thresholds). - Expand sink coverage and per-request metrics. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
Summary: This PR refactors IAST context management to be concurrency-correct across threads and asyncio tasks by switching to ContextVar-indexed context arrays, with diagnostic and benchmark tooling to validate behavior and performance.
Problem
Repro Gist shows the issue with the old initializer approach:
https://gist.github.com/avara1986/450731e7724d312e269a44ea58212bba
Output:
Solution Overview
IAST_CONTEXT
) storing an index (context_id) into a native-managed array of taint maps.Evidence: Concurrency Correctness
New context-array approach works correctly across threads and tasks:
https://gist.github.com/avara1986/c8566d077f92c4f3641c8c7285f4f6c6
Output:
Benchmarks
Note: direct get-by-id shows overhead vs thread-local, but real code requires context association anyway.
Follow-ups
Checklist
Reviewer Checklist