51 commits
cfbe7f9
[𝘀𝗽𝗿] changes to main this commit is based on
melver Sep 4, 2025
a5dccd9
[𝘀𝗽𝗿] initial version
melver Sep 4, 2025
7bd0cf0
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 4, 2025
131332f
rebase
melver Sep 4, 2025
0b4a9cf
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 5, 2025
f2ca41f
fixup! Switch to fixed MD
melver Sep 5, 2025
48227c8
fixup!
melver Sep 8, 2025
4687a18
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 8, 2025
b7f4c7f
rebase!
melver Sep 8, 2025
cfe2bcc
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 8, 2025
548eba5
fixup!
melver Sep 8, 2025
7f95796
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 18, 2025
3aa00d1
fixup! address reviewer comments
melver Sep 18, 2025
ebb789f
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 19, 2025
72f662e
fixup! address reviewer comments round 2
melver Sep 19, 2025
183d5ab
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 22, 2025
7a33b73
fixup! use update_test_checks.py for opt tests
melver Sep 22, 2025
a222dca
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 23, 2025
ba5bb6e
fixup! do not strip _
melver Sep 23, 2025
f1d10fe
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 26, 2025
3bad95e
fixup! address some comments
melver Sep 26, 2025
8af2f0d
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 26, 2025
5fb8572
fixup! address more comments
melver Sep 26, 2025
fb64a4b
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 29, 2025
80f47ce
rebase
melver Sep 29, 2025
5643a2e
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 30, 2025
5db2e34
fixup! address comments
melver Sep 30, 2025
2fb90fc
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 2, 2025
28f29d9
fixup!
melver Oct 2, 2025
16cd376
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 2, 2025
a09fc1e
rebase
melver Oct 2, 2025
d00d6f6
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
dacf09a
rebase
melver Oct 7, 2025
9d264b1
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
a2df6e7
rebase
melver Oct 7, 2025
1ffe1be
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
ee0275d
rebase
melver Oct 7, 2025
c2f16db
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
ab980ba
rebase
melver Oct 7, 2025
8bd82ad
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
e94eeee
rebase
melver Oct 7, 2025
4c24fd5
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
d3e693f
rebase
melver Oct 7, 2025
00806fe
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
00bf24a
rebase
melver Oct 7, 2025
7d39aa3
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 8, 2025
74ce4fe
rebase
melver Oct 8, 2025
d61b21f
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 8, 2025
f9987e0
rebase
melver Oct 8, 2025
f8c4717
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 9, 2025
c29f721
rebase
melver Oct 9, 2025
72 changes: 62 additions & 10 deletions clang/docs/AllocToken.rst
@@ -49,6 +49,39 @@ change or removal. These may (experimentally) be selected with ``-mllvm
* ``increment``: This mode assigns a simple, incrementally increasing token ID
to each allocation site.

The following command-line options affect generated token IDs:

* ``-falloc-token-max=<N>``
Configures the maximum number of tokens. No max by default (tokens bounded
by ``SIZE_MAX``).
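
As a rough illustration of what a token limit means for a consumer, the sketch below folds an arbitrary 64-bit hash into the range ``[0, max)``. The FNV-1a hash and the modulo reduction are assumptions made purely for illustration; this diff does not show how the pass actually derives or reduces IDs, and ``fnv1a`` and ``bound_token`` are hypothetical names.

```cpp
#include <cassert>  // used by callers that want to check the bounds
#include <cstdint>
#include <string_view>

// Hypothetical helper: FNV-1a hash of a type name (illustrative only).
constexpr std::uint64_t fnv1a(std::string_view s) {
  std::uint64_t h = 14695981039346656037ull;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

// Hypothetical helper: fold a hash into [0, max_tokens).
// max_tokens == 0 models "no maximum" and returns the hash unchanged.
constexpr std::uint64_t bound_token(std::uint64_t hash,
                                    std::uint64_t max_tokens) {
  return max_tokens == 0 ? hash : hash % max_tokens;
}
```

With a limit of 512, every value returned by ``bound_token`` is below 512, which is the property an allocator with a fixed number of partitions would rely on.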

Querying Token IDs with ``__builtin_infer_alloc_token``
=======================================================

For use cases where the token ID must be known at compile time, Clang provides
a builtin function:

.. code-block:: c

    size_t __builtin_infer_alloc_token(<args>, ...);

This builtin returns the token ID inferred from its argument expressions, which
mirror arguments normally passed to any allocation function. The argument
expressions are **unevaluated**, so the builtin can be used with expressions
that would otherwise have side effects, without any runtime impact.
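
Unevaluated arguments behave much like the operand of the standard ``sizeof`` operator. The sketch below uses ``sizeof`` rather than the builtin itself, so it compiles without support for this patch; ``bump``, ``probe`` and ``g_calls`` are hypothetical names introduced for the illustration.

```cpp
#include <cassert>  // for checking the counter in calling code
#include <cstddef>

static int g_calls = 0;  // counts how often bump() actually runs

// Hypothetical function with a visible side effect.
int bump() { return ++g_calls; }

// sizeof does not evaluate its operand, so bump() is never called here;
// only the type of the expression (int) is inspected.
std::size_t probe() { return sizeof(bump()); }
```

Calling ``probe()`` leaves ``g_calls`` at zero, which is the same guarantee the text above describes for the builtin's arguments.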

For example, it can be used as follows:

.. code-block:: c

    struct MyType { ... };
    void *__partition_alloc(size_t size, size_t partition);
    #define partition_alloc(...) __partition_alloc(__VA_ARGS__, __builtin_infer_alloc_token(__VA_ARGS__))

    void foo(void) {
      struct MyType *x = partition_alloc(sizeof(*x));
    }
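
One way an allocator might consume the extra argument is to route each request to a partition chosen from the token. The sketch below is only an illustration under stated assumptions: ``partition_alloc_impl``, ``kNumPartitions``, ``g_partition_bytes`` and the modulo mapping are all hypothetical, and the token is passed explicitly so the code does not depend on compiler support for the builtin.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Assumed partition count, chosen arbitrarily for the sketch.
constexpr std::size_t kNumPartitions = 4;

// Bytes handed out per partition (bookkeeping for illustration only).
static std::array<std::size_t, kNumPartitions> g_partition_bytes{};

// Hypothetical stand-in for __partition_alloc(size, partition):
// maps the token to a partition and records the request.
void *partition_alloc_impl(std::size_t size, std::size_t token) {
  const std::size_t partition = token % kNumPartitions;
  assert(partition < kNumPartitions);
  void *p = std::malloc(size);
  if (p != nullptr)
    g_partition_bytes[partition] += size;
  return p;
}
```

A real partitioning allocator would draw memory from separate regions per partition; the shared ``malloc`` here only keeps the sketch short.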

Allocation Token Instrumentation
================================

@@ -70,16 +103,6 @@ example:
// Instrumented:
ptr = __alloc_token_malloc(size, <token id>);

The following command-line options affect generated token IDs:

* ``-falloc-token-max=<N>``
Configures the maximum number of tokens. No max by default (tokens bounded
by ``SIZE_MAX``).

.. code-block:: console

% clang++ -fsanitize=alloc-token -falloc-token-max=512 example.cc

Runtime Interface
-----------------

@@ -122,6 +145,35 @@ which encodes the token ID hint in the allocation function name.
This ABI provides a more efficient alternative where
``-falloc-token-max`` is small.

Instrumenting Non-Standard Allocation Functions
-----------------------------------------------

By default, AllocToken only instruments standard library allocation functions.
This simplifies adoption, as a compatible allocator only needs to provide
token-enabled variants for a well-defined set of standard functions.

To extend instrumentation to custom allocation functions, enable broader
coverage with ``-fsanitize-alloc-token-extended``. Such functions must be
marked with the `malloc
<https://clang.llvm.org/docs/AttributeReference.html#malloc>`_ or `alloc_size
<https://clang.llvm.org/docs/AttributeReference.html#alloc-size>`_ attribute
(or both).

For example:

.. code-block:: c

    void *custom_malloc(size_t size) __attribute__((malloc));
    void *my_malloc(size_t size) __attribute__((alloc_size(1)));

    // Original:
    ptr1 = custom_malloc(size);
    ptr2 = my_malloc(size);

    // Instrumented:
    ptr1 = __alloc_token_custom_malloc(size, token_id);
    ptr2 = __alloc_token_my_malloc(size, token_id);
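
An allocator that opts into extended coverage would then have to supply the token-taking entry points itself. A minimal sketch, assuming the token is passed as a trailing ``size_t`` (the instrumented calls above suggest this, but the exact ABI is defined in the Runtime Interface section, which this hunk does not show); ``g_last_token`` is a hypothetical bookkeeping variable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Last token seen by the wrapper (illustration only).
static std::size_t g_last_token = 0;

// The plain custom allocator from the example above, backed by malloc here.
extern "C" void *custom_malloc(std::size_t size) { return std::malloc(size); }

// Token-enabled variant that instrumented call sites would target.
// Assumption: the token ID arrives as a trailing size_t argument.
extern "C" void *__alloc_token_custom_malloc(std::size_t size,
                                             std::size_t token_id) {
  g_last_token = token_id;     // a real allocator would pick a heap here
  return custom_malloc(size);  // fall back to the untyped path
}
```

Forwarding to the untyped function keeps behavior identical to the uninstrumented build, which may be a reasonable first step before adding any token-based placement.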

Disabling Instrumentation
-------------------------

3 changes: 3 additions & 0 deletions clang/docs/ReleaseNotes.rst
@@ -261,6 +261,9 @@ Non-comprehensive list of changes in this release
allocator-level heap organization strategies. A feature to instrument all
allocation functions with a token ID can be enabled via the
``-fsanitize=alloc-token`` flag.
- A builtin ``__builtin_infer_alloc_token(<args>, ...)`` is provided to allow
compile-time querying of allocation token IDs, where the builtin arguments
mirror those normally passed to an allocation function.

New Compiler Flags
------------------
6 changes: 6 additions & 0 deletions clang/include/clang/Basic/Builtins.td
@@ -1286,6 +1286,12 @@ def AllocaWithAlignUninitialized : Builtin {
let Prototype = "void*(size_t, _Constant size_t)";
}

def InferAllocToken : Builtin {
let Spellings = ["__builtin_infer_alloc_token"];
let Attributes = [NoThrow, Const, Pure, CustomTypeChecking, UnevaluatedArguments];
let Prototype = "size_t(...)";
}

def CallWithStaticChain : Builtin {
let Spellings = ["__builtin_call_with_static_chain"];
let Attributes = [NoThrow, CustomTypeChecking];
27 changes: 17 additions & 10 deletions clang/lib/CodeGen/BackendUtil.cpp
@@ -799,16 +799,6 @@ static void addSanitizers(const Triple &TargetTriple,
MPM.addPass(DataFlowSanitizerPass(LangOpts.NoSanitizeFiles,
PB.getVirtualFileSystemPtr()));
}

if (LangOpts.Sanitize.has(SanitizerKind::AllocToken)) {
if (Level == OptimizationLevel::O0) {
// The default pass builder only infers libcall function attrs when
// optimizing, so we insert it here because we need it for accurate
// memory allocation function detection.
MPM.addPass(InferFunctionAttrsPass());
}
MPM.addPass(AllocTokenPass(getAllocTokenOptions(CodeGenOpts)));
}
};
if (ClSanitizeOnOptimizerEarlyEP) {
PB.registerOptimizerEarlyEPCallback(
@@ -851,6 +841,22 @@ static void addSanitizers(const Triple &TargetTriple,
}
}

static void addAllocTokenPass(const Triple &TargetTriple,
Contributor:
I'd rather separate sema changes from codegen.

Contributor Author:
Separate llvm/.. from clang/.. changes?
Or clang/lib/CodeGen from others?

One is useless without the other, and at least this way, if there's a problem, we can atomically revert one commit instead of several or partially reverting one or several commits.

Collaborator:
The Clang part looks like it depends on LLVM, but it can be separated.
To avoid an unlikely complex revert, I'd rather just land them with a delay of a few days.

const CodeGenOptions &CodeGenOpts,
const LangOptions &LangOpts, PassBuilder &PB) {
PB.registerOptimizerLastEPCallback(
[&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) {
if (Level == OptimizationLevel::O0 &&
LangOpts.Sanitize.has(SanitizerKind::AllocToken)) {
// The default pass builder only infers libcall function attrs when
// optimizing, so we insert it here because we need it for accurate
// memory allocation function detection with -fsanitize=alloc-token.
MPM.addPass(InferFunctionAttrsPass());
}
MPM.addPass(AllocTokenPass(getAllocTokenOptions(CodeGenOpts)));
});
}

void EmitAssemblyHelper::RunOptimizationPipeline(
BackendAction Action, std::unique_ptr<raw_pwrite_stream> &OS,
std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS, BackendConsumer *BC) {
@@ -1105,6 +1111,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
if (!IsThinLTOPostLink) {
addSanitizers(TargetTriple, CodeGenOpts, LangOpts, PB);
addKCFIPass(TargetTriple, LangOpts, PB);
addAllocTokenPass(TargetTriple, CodeGenOpts, LangOpts, PB);
}

if (std::optional<GCOVOptions> Options =
9 changes: 9 additions & 0 deletions clang/lib/CodeGen/CGBuiltin.cpp
@@ -4525,6 +4525,15 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
return RValue::get(AI);
}

case Builtin::BI__builtin_infer_alloc_token: {
llvm::MDNode *MDN = buildAllocToken(E);
llvm::Value *MDV = MetadataAsValue::get(getLLVMContext(), MDN);
llvm::Function *F =
CGM.getIntrinsic(llvm::Intrinsic::alloc_token_id, {IntPtrTy});
llvm::CallBase *TokenID = Builder.CreateCall(F, MDV);
return RValue::get(TokenID);
}

case Builtin::BIbzero:
case Builtin::BI__builtin_bzero: {
Address Dest = EmitPointerWithAlignment(E->getArg(0));
155 changes: 143 additions & 12 deletions clang/lib/CodeGen/CGExpr.cpp
@@ -30,6 +30,7 @@
#include "clang/AST/Attr.h"
#include "clang/AST/DeclObjC.h"
#include "clang/AST/NSAPI.h"
#include "clang/AST/ParentMapContext.h"
#include "clang/AST/StmtVisitor.h"
#include "clang/Basic/Builtins.h"
#include "clang/Basic/CodeGenOptions.h"
@@ -1321,10 +1322,7 @@ typeContainsPointer(QualType T,
return false;
}

void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB, QualType AllocType) {
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");

llvm::MDNode *CodeGenFunction::buildAllocToken(QualType AllocType) {
llvm::MDBuilder MDB(getLLVMContext());

// Get unique type name.
@@ -1343,14 +1341,136 @@ void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB, QualType AllocType) {
const bool ContainsPtr =
typeContainsPointer(AllocType, VisitedRD, IncompleteType);
if (!ContainsPtr && IncompleteType)
return;
return nullptr;
auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);

// Format: !{<type-name>, <contains-pointer>}
auto *MDN =
llvm::MDNode::get(CGM.getLLVMContext(), {TypeNameMD, ContainsPtrMD});
CB->setMetadata(llvm::LLVMContext::MD_alloc_token, MDN);
return llvm::MDNode::get(CGM.getLLVMContext(), {TypeNameMD, ContainsPtrMD});
}

void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB, QualType AllocType) {
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");
CB->setMetadata(llvm::LLVMContext::MD_alloc_token,
buildAllocToken(AllocType));
}

namespace {
/// Infer type from a simple sizeof expression.
QualType inferTypeFromSizeofExpr(const Expr *E) {
const Expr *Arg = E->IgnoreParenImpCasts();
if (const auto *UET = dyn_cast<UnaryExprOrTypeTraitExpr>(Arg)) {
if (UET->getKind() == UETT_SizeOf) {
if (UET->isArgumentType())
return UET->getArgumentTypeInfo()->getType();
else
return UET->getArgumentExpr()->getType();
}
}
return QualType();
}

/// Infer type from an arithmetic expression involving a sizeof. For example:
///
/// malloc(sizeof(MyType) + padding); // infers 'MyType'
/// malloc(sizeof(MyType) * 32); // infers 'MyType'
/// malloc(32 * sizeof(MyType)); // infers 'MyType'
/// malloc(sizeof(MyType) << 1); // infers 'MyType'
/// ...
///
/// More complex arithmetic expressions are supported, but are a heuristic, e.g.
/// when considering allocations for structs with flexible array members:
///
/// malloc(sizeof(HasFlexArray) + sizeof(int) * 32); // infers 'HasFlexArray'
///
QualType inferPossibleTypeFromArithSizeofExpr(const Expr *E) {
const Expr *Arg = E->IgnoreParenImpCasts();
// The argument is a lone sizeof expression.
if (QualType T = inferTypeFromSizeofExpr(Arg); !T.isNull())
return T;
if (const auto *BO = dyn_cast<BinaryOperator>(Arg)) {
// Argument is an arithmetic expression. Cover common arithmetic patterns
// involving sizeof.
switch (BO->getOpcode()) {
case BO_Add:
case BO_Div:
case BO_Mul:
case BO_Shl:
case BO_Shr:
case BO_Sub:
if (QualType T = inferPossibleTypeFromArithSizeofExpr(BO->getLHS());
!T.isNull())
return T;
if (QualType T = inferPossibleTypeFromArithSizeofExpr(BO->getRHS());
!T.isNull())
return T;
break;
default:
break;
}
}
return QualType();
}
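
The left-then-right recursive search above can be modelled outside Clang with a toy expression tree. The sketch below is not Clang's AST: ``Expr``, ``size_of``, ``literal``, ``binary`` and ``infer_type`` are hypothetical names, operators are encoded as single characters (``'<'`` and ``'>'`` stand for the shifts), and an empty string plays the role of a null ``QualType``.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

// Toy expression node: a sizeof(T), an opaque literal, or a binary operator.
struct Expr {
  enum Kind { SizeofType, Literal, Binary };
  Kind kind = Literal;
  std::string type_name;  // set for SizeofType
  char op = 0;            // set for Binary
  ExprPtr lhs, rhs;       // set for Binary
};

ExprPtr size_of(std::string type_name) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::SizeofType;
  e->type_name = std::move(type_name);
  return e;
}

ExprPtr literal() { return std::make_shared<Expr>(); }

ExprPtr binary(char op, ExprPtr lhs, ExprPtr rhs) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Binary;
  e->op = op;
  e->lhs = std::move(lhs);
  e->rhs = std::move(rhs);
  return e;
}

// Returns the first sizeof type found, searching the left operand before the
// right one; returns an empty string if nothing can be inferred.
std::string infer_type(const ExprPtr &e) {
  if (!e)
    return {};
  if (e->kind == Expr::SizeofType)
    return e->type_name;
  if (e->kind == Expr::Binary) {
    switch (e->op) {
    case '+': case '-': case '*': case '/': case '<': case '>': {
      std::string t = infer_type(e->lhs);
      return !t.empty() ? t : infer_type(e->rhs);
    }
    default:
      break;  // other operators are not searched
    }
  }
  return {};
}
```

Because the left operand wins, the flexible-array pattern ``sizeof(HasFlexArray) + sizeof(int) * 32`` resolves to ``HasFlexArray`` in this model, matching the comment in the patch.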

/// If the expression E is a reference to a variable, infer the type from a
/// variable's initializer if it contains a sizeof. Beware, this is a heuristic
/// and ignores if a variable is later reassigned. For example:
///
/// size_t my_size = sizeof(MyType);
/// void *x = malloc(my_size); // infers 'MyType'
///
QualType inferPossibleTypeFromVarInitSizeofExpr(const Expr *E) {
const Expr *Arg = E->IgnoreParenImpCasts();
if (const auto *DRE = dyn_cast<DeclRefExpr>(Arg)) {
if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
if (const Expr *Init = VD->getInit())
return inferPossibleTypeFromArithSizeofExpr(Init);
}
}
return QualType();
}
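
The caveat in the comment above is worth spelling out with a concrete case. In the sketch below (``Small``, ``Big`` and ``size_after_reassignment`` are hypothetical names), the initializer mentions ``Small`` but the value later passed to an allocator would be ``sizeof(Big)``. Going by the comment, the heuristic would presumably still attribute such an allocation to ``Small``, since it only looks at the initializer.

```cpp
#include <cassert>
#include <cstddef>

struct Small { char c; };
struct Big { char data[256]; };

// The variable's initializer and its final value disagree about the type.
std::size_t size_after_reassignment() {
  std::size_t n = sizeof(Small);  // what an initializer-based heuristic sees
  n = sizeof(Big);                // what actually reaches the allocator
  return n;
}
```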

/// Deduces the allocated type by checking if the allocation call's result
/// is immediately used in a cast expression. For example:
///
/// MyType *x = (MyType *)malloc(4096); // infers 'MyType'
///
QualType inferPossibleTypeFromCastExpr(const CallExpr *CallE,
const CastExpr *CastE) {
if (!CastE)
return QualType();
QualType PtrType = CastE->getType();
if (PtrType->isPointerType())
return PtrType->getPointeeType();
return QualType();
}
} // end anonymous namespace

llvm::MDNode *CodeGenFunction::buildAllocToken(const CallExpr *E) {
QualType AllocType;
// First check arguments.
for (const Expr *Arg : E->arguments()) {
AllocType = inferPossibleTypeFromArithSizeofExpr(Arg);
if (AllocType.isNull())
AllocType = inferPossibleTypeFromVarInitSizeofExpr(Arg);
if (!AllocType.isNull())
break;
}
// Then check later casts.
if (AllocType.isNull())
AllocType = inferPossibleTypeFromCastExpr(E, CurCast);
// Emit if we were able to infer the type.
if (!AllocType.isNull())
return buildAllocToken(AllocType);
return nullptr;
}

void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB, const CallExpr *E) {
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");
if (llvm::MDNode *MDN = buildAllocToken(E))
CB->setMetadata(llvm::LLVMContext::MD_alloc_token, MDN);
}

CodeGenFunction::ComplexPairTy CodeGenFunction::
@@ -5723,6 +5843,9 @@ LValue CodeGenFunction::EmitConditionalOperatorLValue(
/// are permitted with aggregate result, including noop aggregate casts, and
/// cast from scalar to union.
LValue CodeGenFunction::EmitCastLValue(const CastExpr *E) {
auto RestoreCurCast =
llvm::make_scope_exit([this, Prev = CurCast] { CurCast = Prev; });
CurCast = E;
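
The save-and-restore idiom used here can be tried without LLVM headers. The sketch below replaces ``llvm::make_scope_exit`` with a small hand-written guard (``ScopeExit`` and ``Emitter`` are hypothetical names, and strings stand in for ``CastExpr`` pointers); it assumes C++17 for class template argument deduction.

```cpp
#include <cassert>
#include <utility>

// Minimal scope guard: runs the stored callable when the scope ends.
template <typename F>
class ScopeExit {
public:
  explicit ScopeExit(F f) : f_(std::move(f)) {}
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
  ~ScopeExit() { f_(); }

private:
  F f_;
};

// Toy emitter that tracks the innermost cast being processed.
struct Emitter {
  const char *cur_cast = nullptr;     // analogous to CurCast
  const char *seen_inside = nullptr;  // what nested code observed

  void emit_cast(const char *cast) {
    // Capture the previous value now; restore it on every exit path.
    ScopeExit restore([this, prev = cur_cast] { cur_cast = prev; });
    cur_cast = cast;
    seen_inside = cur_cast;  // nested emission would read cur_cast here
  }
};
```

Restoring in a destructor means early returns inside the function body, such as the ``switch`` that follows in the patch, cannot leave a stale cast behind.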
switch (E->getCastKind()) {
case CK_ToVoid:
case CK_BitCast:
@@ -6668,16 +6791,24 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType,
RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &LocalCallOrInvoke,
E == MustTailCall, E->getExprLoc());

// Generate function declaration DISuprogram in order to be used
// in debug info about call sites.
if (CGDebugInfo *DI = getDebugInfo()) {
if (auto *CalleeDecl = dyn_cast_or_null<FunctionDecl>(TargetDecl)) {
if (auto *CalleeDecl = dyn_cast_or_null<FunctionDecl>(TargetDecl)) {
// Generate function declaration DISuprogram in order to be used
// in debug info about call sites.
if (CGDebugInfo *DI = getDebugInfo()) {
FunctionArgList Args;
QualType ResTy = BuildFunctionArgList(CalleeDecl, Args);
DI->EmitFuncDeclForCallSite(LocalCallOrInvoke,
DI->getFunctionType(CalleeDecl, ResTy, Args),
CalleeDecl);
}
if (CalleeDecl->hasAttr<RestrictAttr>() ||
CalleeDecl->hasAttr<AllocSizeAttr>()) {
// Function has 'malloc' (aka. 'restrict') or 'alloc_size' attribute.
if (SanOpts.has(SanitizerKind::AllocToken)) {
// Set !alloc_token metadata.
EmitAllocToken(LocalCallOrInvoke, E);
}
}
}
if (CallOrInvoke)
*CallOrInvoke = LocalCallOrInvoke;
12 changes: 10 additions & 2 deletions clang/lib/CodeGen/CGExprCXX.cpp
@@ -1371,8 +1371,16 @@ RValue CodeGenFunction::EmitBuiltinNewDeleteCall(const FunctionProtoType *Type,

for (auto *Decl : Ctx.getTranslationUnitDecl()->lookup(Name))
if (auto *FD = dyn_cast<FunctionDecl>(Decl))
if (Ctx.hasSameType(FD->getType(), QualType(Type, 0)))
return EmitNewDeleteCall(*this, FD, Type, Args);
if (Ctx.hasSameType(FD->getType(), QualType(Type, 0))) {
RValue RV = EmitNewDeleteCall(*this, FD, Type, Args);
if (auto *CB = dyn_cast_if_present<llvm::CallBase>(RV.getScalarVal())) {
if (SanOpts.has(SanitizerKind::AllocToken)) {
// Set !alloc_token metadata.
EmitAllocToken(CB, TheCall);
}
}
return RV;
}
llvm_unreachable("predeclared global operator new/delete is missing");
}
