Skip to content

Conversation

@shiltian
Copy link
Contributor

@shiltian shiltian commented Nov 5, 2025

Motivation

The value of a nullptr is not always 0. For example, on AMDGPU, the nullptr in address spaces 3 and 5 is 0xffffffff. Currently, there is no target-independent way to get this information, making it difficult and error-prone to handle null pointers in target-agnostic code.

We do have ConstantPointerNull, but it might be a little confusing and misleading. It represents a pointer with an all-zero value rather than necessarily a real nullptr. Therefore, to represent a real nullptr in address space N, we need to use addrspacecast ptr null to ptr addrspace(N) and it can't be folded.

Proposal

In this PR, we change the semantic of ConstantPointerNull to represent an actual nullptr instead of a zero-value pointer. Here is the detailed changes.

IR

  • ptr addrspace(N) null will represent the actual nullptr in address space N.
  • ptr addrspace(N) zeroinitializer will represent a zero-value pointer in address space N.

API

  • Constant::getNullValue will return a null value. It is same as the current semantics except for the PointerType, which will return a real nullptr pointer.
  • Constant::getZeroValue will return a zero value constant. It is completely same as the current semantics. To represent a zero-value pointer, a ConstantExpr will be used (effectively inttoptr i8 0 to ptr addrspace(N)).
  • Correspondingly, there will be both Constant::isNullValue and Constant::isZeroValue.

The RFC is https://discourse.llvm.org/t/rfc-introduce-sentinel-pointer-value-to-datalayout/85265. It is a little bit old and the title might look different, but everything eventually converges to this change. An early attempt can be found in #131557, which has many valuable discussion as well.

This PR is still WIP but any early feedback is welcome. I'll include as many necessary code changes as possible in this PR, but eventually this needs to be carefully split into multiple PRs, and I'll do it after the changes look good to every one.

Copy link
Contributor Author

shiltian commented Nov 5, 2025

@shiltian shiltian force-pushed the users/shiltian/ptr-zeroinitializer-to-inttoptr branch 3 times, most recently from 27a981c to a3f0889 Compare November 6, 2025 03:46
@shiltian shiltian changed the title [WIP][IR][Constants] Use ptr addrspace(N) zeroinitializer to represent a zero-value pointer [WIP][IR][Constants] Change the semantic of ConstantPointerNull to represent an actual nullptr instead of a zero-value pointer Nov 6, 2025
@shiltian shiltian force-pushed the users/shiltian/ptr-zeroinitializer-to-inttoptr branch from a3f0889 to fab5565 Compare November 6, 2025 18:49

// Default pointer type specifications.
constexpr DataLayout::PointerSpec DefaultPointerSpecs[] = {
const DataLayout::PointerSpec DefaultPointerSpecs[] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid the need for a global ctor/dtor if we just stored a plain integer value? Since we only support 0/-1 we could just rely on sign-extension? But of course that makes the getter slower so probably best to ignore this suggestion.

Copy link
Contributor Author

@shiltian shiltian Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but we might want to allow a target to specify that the nullptr is an arbitrary bit pattern to be more future proof such that LLVM will not try any folding regarding those casts involving nullptr. I added 'c' as the flag to specify that.

Well, after a second thought, maybe we can store a random value (not 0/-1) to indicate it is custom?

@shiltian shiltian force-pushed the users/shiltian/ptr-zeroinitializer-to-inttoptr branch 3 times, most recently from a08a000 to 9478a0d Compare November 6, 2025 21:34
@shiltian shiltian force-pushed the users/shiltian/ptr-zeroinitializer-to-inttoptr branch 2 times, most recently from a886714 to c47d8e5 Compare November 6, 2025 22:05
/*IsSigned=*/false, DL);
} else if (auto *GEP = dyn_cast<GEPOperator>(CE)) {
// If we have GEP, we can perform the following folds:
// (ptrtoint/ptrtoaddr (gep null, x)) -> x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this fold now need to check for Base->isZero() or use the null pointer APInt value as the base?

auto *Base = cast<Constant>(GEP->stripAndAccumulateConstantOffsets(
DL, BaseOffset, /*AllowNonInbounds=*/true));
if (Base->isNullValue()) {
FoldedValue = ConstantInt::get(CE->getContext(), BaseOffset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FoldedValue = ConstantInt::get(CE->getContext(), NullPtrValue + BaseOffset);

plus appropriate handling of zero-extension

@shiltian shiltian force-pushed the users/shiltian/ptr-zeroinitializer-to-inttoptr branch from c47d8e5 to b9669ba Compare November 7, 2025 04:56
; CHECK-NEXT: ALU clause starting at 4:
; CHECK-NEXT: MOV * T0.X, literal.x,
; CHECK-NEXT: -1(nan), 0(0.000000e+00)
; CHECK-NEXT: 0(0.000000e+00), 0(0.000000e+00)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does R600 even have multiple address spaces? It doesn't look like the case from the data layout string.

; CHECK-NEXT: call void @__asan_report_store8(i64 0) #[[ATTR4:[0-9]+]]
; CHECK-NEXT: br i1 [[TMP1]], label %[[BB5:.*]], label %[[BB6:.*]]
; CHECK: [[BB5]]:
; CHECK-NEXT: call void @__asan_report_store8(i64 ptrtoint (ptr null to i64)) #[[ATTR4:[0-9]+]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should really be passing a pointer to these hooks rather than an integer value, but also am surprised this isn't using the datalayout aware constant folding.

; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DOTHWASAN_SHADOW:%.*]] = call ptr asm "", "=r,0"(ptr null)
; CHECK-NEXT: [[DOTHWASAN_SHADOW:%.*]] = call ptr asm "", "=r,0"(ptr zeroinitializer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this ptr inttoptr 0 coming from? Shouldn't it be using ConstantPointerNull? That would reduce the diff quite a lot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm looking at HWAddressSanitizer::getShadowNonTls this does actually need to be an integer converted to a pointer, so it needs to be zero even if null pointers are all ones. The commit message says it's there to avoid rematerializing constants, so maybe it could be omitted for the zero case but I am not too familiar with this code (@pcc).

; LLPARSER-NEXT: ret i64 ptrtoint (ptr addrspace(1) null to i64)
;
; INSTSIMPLIFY-LABEL: define {{[^@]+}}@constant_fold_ptrtoint_gep_zero() {
; INSTSIMPLIFY-NEXT: ret i64 ptrtoint (ptr addrspace(1) null to i64)
Copy link
Member

@arichardson arichardson Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, shouldn't we have the datalayout in instsimplify, does this mean we don't use the datalayout aware constant folding logic there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are several places where the handling needs to be updated.

Copy link
Member

@arichardson arichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good to me, just a few comments about the test diffs.

…represent an actual `nullptr` instead of a zero-value pointer

The value of a `nullptr` is not always `0`. For example, on AMDGPU, the `nullptr` in address spaces 3 and 5 is `0xffffffff`. Currently, there is no target-independent way to get this information, making it difficult and error-prone to handle null pointers in target-agnostic code.

We do have `ConstantPointerNull`, but it might be a little confusing and misleading. It represents a pointer with an all-zero value rather than necessarily a real `nullptr`. Therefore, to represent a real `nullptr` in address space `N`, we need to use `addrspacecast ptr null to ptr addrspace(N)` and it can't be folded.

In this PR, we change the semantic of `ConstantPointerNull` to represent an actual `nullptr` instead of a zero-value pointer. Here is the detailed changes.

* `ptr addrspace(N) null` will represent the actual `nullptr` in address space `N`.
* `ptr addrspace(N) zeroinitializer` will represent a zero-value pointer in address space `N`.

* `Constant::getNullValue` will return a _null_ value. It is same as the current semantics except for the `PointerType`, which will return a real `nullptr` pointer.
* `Constant::getZeroValue` will return a zero value constant. It is completely same as the current semantics. To represent a zero-value pointer, a `ConstantExpr` will be used (effectively `inttoptr i8 0 to ptr addrspace(N)`).
* Correspondingly, there will be both `Constant::isNullValue` and `Constant::isZeroValue`.

The RFC is https://discourse.llvm.org/t/rfc-introduce-sentinel-pointer-value-to-datalayout/85265. It is a little bit old and the title might look different, but everything eventually converges to this change. An early attempt can be found in #131557, which has many valuable discussion as well.

This PR is still WIP but any early feedback is welcome. I'll include as many necessary code changes as possible in this PR, but eventually this needs to be carefully split into multiple PRs, and I'll do it after the changes look good to every one.
@shiltian shiltian force-pushed the users/shiltian/ptr-zeroinitializer-to-inttoptr branch from b9669ba to c52a3b6 Compare November 17, 2025 05:16
@shiltian
Copy link
Contributor Author

Still WIP. With more code changes, there are more test failures. Fixing them one by one. :-)

Copy link
Member

@arichardson arichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more small comments

if (std::optional<APInt> NullPtrValue = DL.getNullPtrValue(
DestTy->getScalarType()->getPointerAddressSpace())) {
if ((NullPtrValue->isZero() && C->isZeroValue()) ||
(NullPtrValue->isAllOnes() && C->isAllOnesValue()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to check the width of the input value for all ones since inttoptr zero-extends, so a short value would not end up as canonical nullptr.


// If the input is a nullptr, we can fold it to the corresponding
// nullptr in the destination address space.
if (C->isNullValue())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually true? Are null values in one address space always null in all others? I imagine this is almost always true, but maybe you could have some where this conversion is not valid?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this property in langref so we should either add it or drop this fold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually true? Are null values in one address space always null in all others? I imagine this is almost always true, but maybe you could have some where this conversion is not valid?

This is actually the whole point of this PR. The semantic would be, convert the nullptr in AS X to AS Y, so it would still be a nullptr in AS Y. I don't think the semantic should be convert the value of nullptr in AS X to the corresponding pointer value in AS Y.

I'll clarify this in the LangRef.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

// cannot be constant.
if (GV.hasCommonLinkage()) {
Check(GV.getInitializer()->isNullValue(),
Check(GV.getInitializer()->isZeroValue(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should add an optional data layout parameter to this function so it can return true for null pointers in the common case?

VectorizedValue = Builder.CreateShuffleVector(
VectorizedValue,
ConstantVector::getNullValue(VectorizedValue->getType()), Mask);
VectorizedValue, Constant::getNullValue(VectorizedValue->getType()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be getZeroValue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VectorizedValue, Constant::getNullValue(VectorizedValue->getType()),
VectorizedValue, Constant::getZeroValue(VectorizedValue->getType()),

Comment on lines 6 to +9
; CHECK-NOT: bitcast
; CHECK-NOT: trunc
; CHECK: addrspacecast
; CHECK: addrspacecast
; CHECK: ptr addrspace(1) null
; CHECK: ptr null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR but these checks are not great, can we regen this test with UTC and --check-globals and pre-commit that?

; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DOTHWASAN_SHADOW:%.*]] = call ptr asm "", "=r,0"(ptr null)
; CHECK-NEXT: [[DOTHWASAN_SHADOW:%.*]] = call ptr asm "", "=r,0"(ptr zeroinitializer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm looking at HWAddressSanitizer::getShadowNonTls this does actually need to be an integer converted to a pointer, so it needs to be zero even if null pointers are all ones. The commit message says it's there to avoid rematerializing constants, so maybe it could be omitted for the zero case but I am not too familiar with this code (@pcc).

LLVMContext Ctx;
Type *OpVFTy = FixedVectorType::get(IntegerType::getInt1Ty(Ctx), OpVF);
Value *Op = ConstantVector::getNullValue(OpVFTy);
Value *Op = Constant::getNullValue(OpVFTy);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Value *Op = Constant::getNullValue(OpVFTy);
Value *Op = Constant::getZeroValue(OpVFTy);

VectorizedValue = Builder.CreateShuffleVector(
VectorizedValue,
ConstantVector::getNullValue(VectorizedValue->getType()), Mask);
VectorizedValue, Constant::getNullValue(VectorizedValue->getType()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VectorizedValue, Constant::getNullValue(VectorizedValue->getType()),
VectorizedValue, Constant::getZeroValue(VectorizedValue->getType()),

Comment on lines +130 to +134
// We can't fold inttoptr(0) to ConstantPointerNull without checking the
// target data layout. However, since data layout is not available here, we
// can't do this.
if (V->isZeroValue())
return nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We can't fold inttoptr(0) to ConstantPointerNull without checking the
// target data layout. However, since data layout is not available here, we
// can't do this.
if (V->isZeroValue())
return nullptr;
// Note: We can't fold inttoptr(0) to ConstantPointerNull without checking
// the target data layout since null pointers could also be all-ones.

I don't think we need the early return here? Although I guess it's a cheap check and so might actually be better to have it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah is this because of the V->isNullValue() below? In that case let's keep it.

Comment on lines 173 to 175
if (V->isNullValue() && !DestTy->isX86_AMXTy() &&
opc != Instruction::AddrSpaceCast)
return Constant::getNullValue(DestTy);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (V->isNullValue() && !DestTy->isX86_AMXTy())
return Constant::getNullValue(DestTy);

Addrspacecast was already handled above. Not sure what is special about isX86_AMXTy, but I guess we need it.

Comment on lines +1505 to +1512
if (NullPtrValue->isZero()) {
return Constant::getZeroValue(DestTy);
} else if (NullPtrValue->isAllOnes()) {
return ConstantInt::get(
DestTy, NullPtrValue->zextOrTrunc(DestTy->getScalarSizeInBits()));
} else {
llvm_unreachable("invalid nullptr value");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (NullPtrValue->isZero()) {
return Constant::getZeroValue(DestTy);
} else if (NullPtrValue->isAllOnes()) {
return ConstantInt::get(
DestTy, NullPtrValue->zextOrTrunc(DestTy->getScalarSizeInBits()));
} else {
llvm_unreachable("invalid nullptr value");
}
assert(NullPtrValue->isZero() || NullPtrValue->isAllOnes());
return ConstantInt::get(
DestTy, NullPtrValue->zextOrTrunc(DestTy->getScalarSizeInBits()));

I believe this is equivalent and a bit simpler.

Comment on lines +4279 to +4286
if (std::optional<APInt> NullPtrVal =
DL.getNullPtrValue(NullPtr->getType()->getPointerAddressSpace())) {
AP.OutStreamer->emitIntValue(NullPtrVal->getSExtValue(), Size);
} else {
// We fall back to the default behavior of emitting a zero value if we
// can't get the null pointer value from the data layout.
AP.OutStreamer->emitIntValue(0, Size);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (std::optional<APInt> NullPtrVal =
DL.getNullPtrValue(NullPtr->getType()->getPointerAddressSpace())) {
AP.OutStreamer->emitIntValue(NullPtrVal->getSExtValue(), Size);
} else {
// We fall back to the default behavior of emitting a zero value if we
// can't get the null pointer value from the data layout.
AP.OutStreamer->emitIntValue(0, Size);
}
APInt NullPtrVal = DL.getNullPtrValue(NullPtr->getType()->getPointerAddressSpace());
AP.OutStreamer->emitIntValue(NullPtrVal.getSExtValue(), Size);

I wonder if we should drop the std::optional from the DataLayout and just always have a defined value? We could then also either drop the 'c' flag or do something like pc{0xaaaaa}3:64:64 to define a AS3 pointer with nullptr value of 0xaaaa?

Silently falling back to zero here seems like it would cause more pain for hypothetical targets with custom null pointers than forcing them to set the datalayout?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants