melver
Contributor

melver commented Oct 6, 2025

Introduce the "alloc-token" sanitizer kind, in preparation for wiring it
up. Currently this is a no-op, and any attempt to enable it will result
in failure:

clang: error: unsupported option '-fsanitize=alloc-token' for target 'x86_64-unknown-linux-gnu'

In this step, we can already attach the sanitize_alloc_token IR
attribute to functions where the instrumentation is enabled. Subsequent
changes will complete wiring up the AllocToken pass.


This change is part of the following series:

  1. [AllocToken] Introduce sanitize_alloc_token attribute and alloc_token metadata #160131
  2. [AllocToken] Introduce AllocToken instrumentation pass #156838
  3. [Clang][CodeGen] Introduce the AllocToken SanitizerKind #162098
  4. [Clang][CodeGen] Emit !alloc_token for new expressions #162099
  5. [Clang] Wire up -fsanitize=alloc-token #156839
  6. [AllocToken, Clang] Implement TypeHashPointerSplit mode #156840
  7. [AllocToken, Clang] Infer type hints from sizeof expressions and casts #156841
  8. [AllocToken, Clang] Implement __builtin_infer_alloc_token() and llvm.alloc.token.id #156842

melver added 2 commits October 6, 2025 16:55
Created using spr 1.3.8-beta.1

[skip ci]
Created using spr 1.3.8-beta.1
@llvmbot
Member

llvmbot commented Oct 6, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Marco Elver (melver)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/162098.diff

2 Files Affected:

  • (modified) clang/include/clang/Basic/Sanitizers.def (+3)
  • (modified) clang/lib/CodeGen/CodeGenFunction.cpp (+2)
diff --git a/clang/include/clang/Basic/Sanitizers.def b/clang/include/clang/Basic/Sanitizers.def
index 1d0e97cc7fb4c..da85431625026 100644
--- a/clang/include/clang/Basic/Sanitizers.def
+++ b/clang/include/clang/Basic/Sanitizers.def
@@ -195,6 +195,9 @@ SANITIZER_GROUP("bounds", Bounds, ArrayBounds | LocalBounds)
 // Scudo hardened allocator
 SANITIZER("scudo", Scudo)
 
+// AllocToken
+SANITIZER("alloc-token", AllocToken)
+
 // Magic group, containing all sanitizers. For example, "-fno-sanitize=all"
 // can be used to disable all the sanitizers.
 SANITIZER_GROUP("all", All, ~SanitizerMask())
diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp
index b2fe9171372d8..acf8de4dee147 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -846,6 +846,8 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy,
       Fn->addFnAttr(llvm::Attribute::SanitizeNumericalStability);
     if (SanOpts.hasOneOf(SanitizerKind::Memory | SanitizerKind::KernelMemory))
       Fn->addFnAttr(llvm::Attribute::SanitizeMemory);
+    if (SanOpts.has(SanitizerKind::AllocToken))
+      Fn->addFnAttr(llvm::Attribute::SanitizeAllocToken);
   }
   if (SanOpts.has(SanitizerKind::SafeStack))
     Fn->addFnAttr(llvm::Attribute::SafeStack);
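For readers unfamiliar with `.def` files: Sanitizers.def is an X-macro list, and each `SANITIZER(NAME, ID)` entry is expanded in several places (kind ordinals, mask bits, name lookup). The sketch below is a simplified, hypothetical model of that pattern, not Clang's actual `SanitizerMask` implementation; the list is inlined as a macro so the example is self-contained, and `DEMO_SANITIZERS`, `SanitizerSet`, `parseSanitizerName` and `functionAttrsFor` are illustrative names. It shows how one new entry yields a new kind whose `has()` check drives the IR attribute, in the same shape as the CodeGenFunction.cpp hunk.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Inline stand-in for a few Sanitizers.def entries (hypothetical subset).
#define DEMO_SANITIZERS(SANITIZER) \
  SANITIZER("address", Address)    \
  SANITIZER("scudo", Scudo)        \
  SANITIZER("alloc-token", AllocToken)

// Expansion 1: one ordinal per entry.
enum SanitizerOrdinal : unsigned {
#define SANITIZER(NAME, ID) SO_##ID,
  DEMO_SANITIZERS(SANITIZER)
#undef SANITIZER
  SO_Count
};

// Expansion 2: one mask bit per ordinal.
namespace SanitizerKind {
#define SANITIZER(NAME, ID) \
  constexpr std::uint64_t ID = std::uint64_t{1} << SO_##ID;
DEMO_SANITIZERS(SANITIZER)
#undef SANITIZER
} // namespace SanitizerKind

// Simplified per-function sanitizer set (SanOpts plays this role in Clang).
struct SanitizerSet {
  std::uint64_t Mask = 0;
  bool has(std::uint64_t K) const { return (Mask & K) != 0; }
  void set(std::uint64_t K, bool Enabled) {
    Mask = Enabled ? (Mask | K) : (Mask & ~K);
  }
};

// Expansion 3: map a -fsanitize= value to its bit; 0 means unknown.
std::uint64_t parseSanitizerName(std::string_view Name) {
#define SANITIZER(NAME, ID) \
  if (Name == NAME)         \
    return SanitizerKind::ID;
  DEMO_SANITIZERS(SANITIZER)
#undef SANITIZER
  return 0;
}

// Same shape as the CodeGenFunction.cpp hunk: attach the IR attribute
// only when the sanitizer is enabled for the function being emitted.
std::vector<std::string> functionAttrsFor(const SanitizerSet &SanOpts) {
  std::vector<std::string> Attrs;
  if (SanOpts.has(SanitizerKind::Address))
    Attrs.push_back("sanitize_address");
  if (SanOpts.has(SanitizerKind::AllocToken))
    Attrs.push_back("sanitize_alloc_token");
  return Attrs;
}
```

In this model each entry's bit position follows from its order in the list, so inserting an entry changes the numeric layout of the mask.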

melver added a commit to melver/llvm-project that referenced this pull request Oct 7, 2025
Introduce the "alloc-token" sanitizer kind, in preparation for wiring it
up. Currently this is a no-op, and any attempt to enable it will result
in failure:

  clang: error: unsupported option '-fsanitize=alloc-token' for target 'x86_64-unknown-linux-gnu'

In this step, we can already attach the `sanitize_alloc_token` IR
attribute to functions where the instrumentation is enabled. Subsequent
changes will complete wiring up the AllocToken pass.

Pull Request: llvm#162098
melver added 4 commits October 7, 2025 11:53
Created using spr 1.3.8-beta.1

[skip ci]
Created using spr 1.3.8-beta.1
Created using spr 1.3.8-beta.1

[skip ci]
Created using spr 1.3.8-beta.1
melver added a commit to melver/llvm-project that referenced this pull request Oct 7, 2025
melver added a commit that referenced this pull request Oct 7, 2025
… metadata (#160131)

In preparation for adding the "AllocToken" pass, add the prerequisite
`sanitize_alloc_token` function attribute and `alloc_token` metadata.

---

This change is part of the following series:
  1. #160131
  2. #156838
  3. #162098
  4. #162099
  5. #156839
  6. #156840
  7. #156841
  8. #156842
melver added 2 commits October 7, 2025 12:56
Created using spr 1.3.8-beta.1

[skip ci]
Created using spr 1.3.8-beta.1
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 7, 2025
…alloc_token metadata (#160131)

melver added a commit that referenced this pull request Oct 7, 2025
Introduce `AllocToken`, an instrumentation pass designed to provide
tokens to memory allocators, enabling various heap organization
strategies, such as heap partitioning.

Initially, the pass instruments functions marked with a new attribute
`sanitize_alloc_token` by rewriting allocation calls to include a token
ID, appended as a function argument with the default ABI.

The design aims to provide a flexible framework for implementing
different token generation schemes. It currently supports the following
token modes:

- TypeHash (default): token IDs based on a hash of the allocated type
- Random: statically-assigned pseudo-random token IDs
- Increment: incrementing token IDs per TU
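A minimal sketch of how the three token modes differ, assuming a stable string hash of the allocated type's name for TypeHash. FNV-1a is used here purely for illustration (the pass may use a different hash), `TokenAssigner` is a hypothetical name, and the fallback token for calls without type information is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string_view>

// FNV-1a 64-bit: a simple, stable string hash used only for illustration.
constexpr std::uint64_t fnv1a(std::string_view S) {
  std::uint64_t H = 14695981039346656037ull;
  for (char C : S) {
    H ^= static_cast<unsigned char>(C);
    H *= 1099511628211ull;
  }
  return H;
}

enum class TokenMode { TypeHash, Random, Increment };

// Hypothetical model of per-TU token assignment: one call per allocation site.
class TokenAssigner {
public:
  TokenAssigner(TokenMode Mode, std::uint64_t MaxTokens, std::uint64_t Seed = 0)
      : Mode(Mode), MaxTokens(MaxTokens), Rng(Seed) {}

  std::uint64_t tokenForSite(std::string_view TypeName) {
    switch (Mode) {
    case TokenMode::TypeHash:
      // Same allocated type -> same token, at every site.
      return fnv1a(TypeName) % MaxTokens;
    case TokenMode::Random:
      // Pseudo-random per site; "statically assigned" means chosen once
      // at compile time, not at each allocation.
      return Rng() % MaxTokens;
    case TokenMode::Increment:
      // Next ID in this TU, regardless of type.
      return Counter++ % MaxTokens;
    }
    return 0;
  }

private:
  TokenMode Mode;
  std::uint64_t MaxTokens;
  std::mt19937_64 Rng;
  std::uint64_t Counter = 0;
};
```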

For the `TypeHash` mode, introduce support for `!alloc_token` metadata:
the metadata can be attached to allocation calls to provide richer
semantic information to be consumed by the AllocToken pass. Optimization
remarks can be enabled to show where no metadata was available.

An alternative "fast ABI" is provided, where instead of passing the
token ID as an argument (e.g., `__alloc_token_malloc(size, id)`), the
token ID is directly encoded into the name of the called function (e.g.,
`__alloc_token_0_malloc(size)`). Where the maximum number of tokens is
small, this offers more efficient instrumentation by avoiding the
overhead of passing an additional argument at each allocation site.
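The naming difference between the two ABIs can be sketched as a small helper. `rewriteAllocCall` and `RewrittenCall` are hypothetical names; only the `__alloc_token_malloc(size, id)` and `__alloc_token_0_malloc(size)` shapes come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical description of the call the instrumentation would emit.
struct RewrittenCall {
  std::string Callee;
  bool PassesTokenArg;  // true if the token ID is an extra trailing argument
};

RewrittenCall rewriteAllocCall(const std::string &Fn, std::uint64_t TokenID,
                               bool FastABI) {
  if (FastABI)
    // Token ID encoded in the callee name: __alloc_token_0_malloc(size)
    return {"__alloc_token_" + std::to_string(TokenID) + "_" + Fn, false};
  // Default ABI: __alloc_token_malloc(size, id)
  return {"__alloc_token_" + Fn, true};
}
```

One consequence of encoding the ID in the name is that a runtime has to export a separate entry point for every token ID it can receive, which is why the fast ABI is attractive mainly when the maximum number of tokens is small.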

Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 [1]

---

This change is part of the following series:
  1. #160131
  2. #156838
  3. #162098
  4. #162099
  5. #156839
  6. #156840
  7. #156841
  8. #156842
melver changed the base branch from users/melver/spr/main.clangcodegen-introduce-the-alloctoken-sanitizerkind to main October 7, 2025 11:30
thurstond added a commit that referenced this pull request Oct 8, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 8, 2025
…ions" (#162412)

Reverts llvm/llvm-project#162099

Reason: this commit depends on #162098, which I am reverting due to
build breakage (see
llvm/llvm-project#162098 (comment)).
thurstond added a commit that referenced this pull request Oct 8, 2025
@thurstond
Contributor

Due to the time zone difference, I've gone ahead and reverted this patch and its dependent patch (#162099)

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 8, 2025
@melver
Contributor Author

melver commented Oct 8, 2025

> Due to the time zone difference, I've gone ahead and reverted this patch and its dependent patch (#162099)

Sigh, this is a brittle test, or rather an unfortunate side-effect of incrementally building and testing on a CI where the test outputs are not cleared. I was able to reproduce this by checking out 93f2e0a, then checking out this change, and retesting with the zorg scripts. Then, if I run:

rm -rf zorg-test/llvm_build_asan_ubsan/tools/clang/test/Preprocessor/Output/print-header-json.c.tmp/

and then rerun the test, it passes:

../zorg-test/llvm_build_asan_ubsan/bin/llvm-lit -v clang/test/Preprocessor/print-header-json.c
-- Testing: 1 tests, 1 workers --
PASS: Clang :: Preprocessor/print-header-json.c (1 of 1)

Testing Time: 3.57s

Total Discovered Tests: 1
  Passed: 1 (100.00%)

We could try to fix the test to clear the cache directory, or fix the test scripts. I suspect fixing the test is the better option, because everyone who does incremental builds and tests will have this problem.

Summary of the problem: after a patch that changes the binary format of PCMs (such as one adding a new sanitizer kind, since PCMs track codegen options), a stale cached PCM is no longer binary-compatible with the new compiler. Here, adding a new sanitizer option altered the implicit binary layout of the serialized LangOptions, and the build & test system is oblivious to this. When the new compiler attempted to read the old module file, it misinterpreted the data due to the layout mismatch, resulting in a heap-buffer-overflow.

TL;DR: Clang's PCM binary format doesn't encode a version, so attempting to load version-incompatible PCMs left over from previous test invocations after an implicit format change results in a heap-buffer-overflow and assorted failures.
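To illustrate the failure mode in isolation, here is a hypothetical, heavily simplified option blob written without any version or length field. This is not how Clang serializes LangOptions; `Blob`, `writeOptionsV1` and `readOptionsV2` are illustrative names, and the sketch only shows why a reader built for a newer layout has to validate its input before trusting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical serialized option blob: raw 64-bit words, no version field.
using Blob = std::vector<std::uint8_t>;

// "Old compiler": writes two option words.
Blob writeOptionsV1(std::uint64_t A, std::uint64_t B) {
  Blob Out(2 * sizeof(std::uint64_t));
  std::memcpy(Out.data(), &A, sizeof A);
  std::memcpy(Out.data() + sizeof A, &B, sizeof B);
  return Out;
}

// "New compiler": its options grew to three words. Blindly copying
// 3 * 8 bytes out of a 16-byte V1 blob would read past the end of the
// heap buffer, the same class of bug seen with the stale PCM.
// Checking the size first (a stand-in for a real format version) lets
// the reader reject the stale file so it can be rebuilt instead.
std::optional<std::vector<std::uint64_t>> readOptionsV2(const Blob &In) {
  constexpr std::size_t Words = 3;
  if (In.size() != Words * sizeof(std::uint64_t))
    return std::nullopt;  // incompatible layout: do not parse
  std::vector<std::uint64_t> Out(Words);
  std::memcpy(Out.data(), In.data(), In.size());
  return Out;
}
```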

melver added a commit that referenced this pull request Oct 8, 2025
[ Reland after 7815df1 ("[Clang] Fix brittle print-header-json.c test") ]

Introduce the "alloc-token" sanitizer kind, in preparation for wiring it
up. Currently this is a no-op, and any attempt to enable it will result
in failure:

clang: error: unsupported option '-fsanitize=alloc-token' for target 'x86_64-unknown-linux-gnu'

In this step, we can already attach the `sanitize_alloc_token` IR
attribute to functions where the instrumentation is enabled. Subsequent
changes will complete wiring up the AllocToken pass.

---

This change is part of the following series:
  1. #160131
  2. #156838
  3. #162098
  4. #162099
  5. #156839
  6. #156840
  7. #156841
  8. #156842
melver added a commit that referenced this pull request Oct 8, 2025
[ Reland after 7815df1 ("[Clang] Fix brittle print-header-json.c test") ]

For new expressions, the allocated type is syntactically known and we
can trivially emit the !alloc_token metadata. A subsequent change will
wire up the AllocToken pass and introduce appropriate tests.

---

This change is part of the following series:
  1. #160131
  2. #156838
  3. #162098
  4. #162099
  5. #156839
  6. #156840
  7. #156841
  8. #156842
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 8, 2025
…162098)

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 8, 2025
…62099)

@thurstond
Contributor

I see, thank you @melver!

melver added a commit that referenced this pull request Oct 8, 2025
Wire up the `-fsanitize=alloc-token` command-line option, hooking up
the `AllocToken` pass, which provides allocation tokens to compatible
runtime allocators and enables different heap organization strategies,
e.g. hardening schemes based on heap partitioning.

The instrumentation rewrites standard allocation calls into variants
that accept an additional `size_t token_id` argument. For example,
calls to `malloc(size)` become `__alloc_token_malloc(size, token_id)`,
and a C++ `new MyType` expression will call
`__alloc_token__Znwm(size, token_id)`.
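On the runtime side, a compatible allocator exports the rewritten entry points. The following is a toy sketch, assuming the `__alloc_token_malloc(size, token_id)` signature described above and a hypothetical `toy::partitionAlloc` that keeps one fixed-size bump arena per partition. A real allocator would also need `free`, thread safety, and the `__alloc_token__Znwm` family for C++ `new`; clang/docs/AllocToken.rst is the place to check the documented interface.

```cpp
#include <cassert>
#include <cstddef>

// Toy allocator: one fixed bump arena per partition. Illustrative only;
// there is no free(), no thread safety, and the arenas are tiny.
namespace toy {
constexpr std::size_t kNumPartitions = 4;
constexpr std::size_t kArenaSize = 4096;
alignas(16) unsigned char Arenas[kNumPartitions][kArenaSize];
std::size_t Used[kNumPartitions];

void *partitionAlloc(std::size_t Size, std::size_t Partition) {
  if (Size == 0)
    Size = 1;  // hand out a distinct, non-empty block for size 0
  if (Size > kArenaSize)
    return nullptr;
  // Round up so every returned pointer stays 16-byte aligned.
  std::size_t Rounded = (Size + 15) & ~std::size_t{15};
  if (Rounded > kArenaSize - Used[Partition])
    return nullptr;  // this partition's arena is exhausted
  void *P = &Arenas[Partition][Used[Partition]];
  Used[Partition] += Rounded;
  return P;
}
} // namespace toy

// Entry point with the shape described above: the instrumentation turns
// malloc(size) into __alloc_token_malloc(size, token_id).
extern "C" void *__alloc_token_malloc(std::size_t Size, std::size_t TokenID) {
  // Fold any token ID onto the available partitions.
  return toy::partitionAlloc(Size, TokenID % toy::kNumPartitions);
}
```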

Untyped allocation calls do not yet have `!alloc_token` metadata, and
therefore receive only the fallback token. This will be addressed in
subsequent changes through best-effort type inference.

One benefit of the instrumentation approach is that it can be applied
transparently to large codebases, and it scales in deployment like other
sanitizers.

As with other sanitizers, instrumentation can be selectively
controlled using `__attribute__((no_sanitize("alloc-token")))`. Sanitizer
ignorelists are also supported, to disable instrumentation for specific
functions or source files.
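A minimal sketch of opting a single function out, using the attribute spelling given above. The `NO_ALLOC_TOKEN` macro and the function names are hypothetical; the `__clang__` guard is there because other compilers do not know this sanitizer name, and Clang versions that predate it may warn that the name is unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical convenience macro for the opt-out attribute.
#if defined(__clang__)
#define NO_ALLOC_TOKEN __attribute__((no_sanitize("alloc-token")))
#else
#define NO_ALLOC_TOKEN
#endif

// Calls in this function are meant to stay plain malloc() calls.
NO_ALLOC_TOKEN void *allocUninstrumented(std::size_t N) {
  return std::malloc(N);
}

// Calls here are candidates for rewriting under -fsanitize=alloc-token.
void *allocInstrumented(std::size_t N) { return std::malloc(N); }
```

Per the description, only the call in `allocInstrumented` would be rewritten when building with `-fsanitize=alloc-token`.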

See clang/docs/AllocToken.rst for more usage instructions.

Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434

---

This change is part of the following series:
  1. #160131
  2. #156838
  3. #162098
  4. #162099
  5. #156839
  6. #156840
  7. #156841
  8. #156842
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 8, 2025
melver added a commit that referenced this pull request Oct 8, 2025
Implement the TypeHashPointerSplit mode: This mode assigns a token ID
based on the hash of the allocated type's name, where the top half of
the ID space is reserved for types that contain pointers and the bottom
half for types that do not contain pointers.

With a maximum of 2 tokens (`-falloc-token-max=2`), this mode may also
be valuable for heap hardening strategies that simply separate pointer
types from non-pointer types.

Make it the new default mode.
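The token computation can be sketched as follows, assuming an even `MaxTokens` of at least 2 and a stable hash of the type name (FNV-1a here, as a stand-in for whatever hash the pass actually uses). `typeHashPointerSplit` is a hypothetical name, and whether a type contains pointers is passed in rather than computed.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// FNV-1a 64-bit, a stand-in for the stable type-name hash.
constexpr std::uint64_t fnv1a(std::string_view S) {
  std::uint64_t H = 14695981039346656037ull;
  for (char C : S) {
    H ^= static_cast<unsigned char>(C);
    H *= 1099511628211ull;
  }
  return H;
}

// Tokens [0, Half) go to types without pointers, [Half, MaxTokens) to
// types with pointers. Assumes MaxTokens is even and >= 2.
std::uint64_t typeHashPointerSplit(std::string_view TypeName,
                                   bool ContainsPointers,
                                   std::uint64_t MaxTokens) {
  const std::uint64_t Half = MaxTokens / 2;
  const std::uint64_t InHalf = fnv1a(TypeName) % Half;
  return ContainsPointers ? Half + InHalf : InHalf;
}
```

With `MaxTokens = 2` the hash term is always 0, so the token degenerates to a plain pointer/non-pointer bit, which matches the `-falloc-token-max=2` use case above.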

Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434

---

This change is part of the following series:
  1. #160131
  2. #156838
  3. #162098
  4. #162099
  5. #156839
  6. #156840
  7. #156841
  8. #156842
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 8, 2025
…156840)

melver added a commit that referenced this pull request Oct 9, 2025
#156841)

For the AllocToken pass to accurately calculate token ID hints, we
need to attach `!alloc_token` metadata to allocation calls.

Unlike new expressions, untyped allocation calls (like `malloc`,
`calloc`, `::operator new(..)`, `__builtin_operator_new`, etc.) have no
syntactic type associated with them. For -fsanitize=alloc-token, type
hints are sufficient, and we can attempt to infer the type based on
common idioms.

When encountering allocation calls (with `__attribute__((malloc))` or
`__attribute__((alloc_size(..)))`), attach `!alloc_token` by inferring
the allocated type from (a) sizeof argument expressions such as
`malloc(sizeof(MyType))`, and (b) casts such as `(MyType*)malloc(4096)`.

Note that non-standard allocation functions with these attributes are
not instrumented by default. Use `-fsanitize-alloc-token-extended` to
instrument them as well.
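The two idioms can be illustrated with ordinary allocation code. The comments note the type hint that, per the description, the frontend would infer when building with `-fsanitize=alloc-token`; `MyType` and the function names are hypothetical, and the sketch makes no claim about other forms (for example, `n * sizeof(T)`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical type used only for this illustration.
struct MyType {
  int *Ptr;
  long Value;
};

// (a) sizeof argument expression: MyType would be the inferred hint.
void *allocViaSizeof() { return std::malloc(sizeof(MyType)); }

// (b) cast of the result: MyType would be the inferred hint.
MyType *allocViaCast() { return (MyType *)std::malloc(4096); }

// Neither idiom applies, so no type can be inferred; like other calls
// without !alloc_token metadata, this would presumably get the fallback token.
void *allocUntyped(std::size_t N) { return std::malloc(N); }
```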

Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434

---

This change is part of the following series:
  1. #160131
  2. #156838
  3. #162098
  4. #162099
  5. #156839
  6. #156840
  7. #156841
  8. #156842
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 9, 2025
…ns and casts (#156841)

svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
clingfei pushed a commit to clingfei/llvm-project that referenced this pull request Oct 10, 2025
clingfei pushed a commit to clingfei/llvm-project that referenced this pull request Oct 10, 2025
clingfei pushed a commit to clingfei/llvm-project that referenced this pull request Oct 10, 2025
clingfei pushed a commit to clingfei/llvm-project that referenced this pull request Oct 10, 2025
clingfei pushed a commit to clingfei/llvm-project that referenced this pull request Oct 10, 2025

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category


4 participants