Merged
32 commits
90068f1
[𝘀𝗽𝗿] initial version
arichardson Aug 22, 2024
13983a2
[𝘀𝗽𝗿] changes to main this commit is based on
arichardson Aug 22, 2024
e4bd118
fix indentation in langref
arichardson Aug 22, 2024
35afb97
rebase
arichardson Oct 25, 2024
c55290e
[𝘀𝗽𝗿] changes introduced through rebase
RKSimon Oct 25, 2024
db97145
include feedback
arichardson Oct 25, 2024
94ecfa3
address more feedback
arichardson Oct 25, 2024
d933fc9
[𝘀𝗽𝗿] changes introduced through rebase
jhuber6 Oct 29, 2024
278ce21
address arsenm feedback and add a test
arichardson Oct 29, 2024
9c2aecb
[𝘀𝗽𝗿] changes introduced through rebase
mshockwave Jan 3, 2025
df9bdfe
split out the external state property based on discourse discussion
arichardson Jan 3, 2025
9834171
[𝘀𝗽𝗿] changes introduced through rebase
nico Jan 6, 2025
7615db9
rebase
arichardson Jan 6, 2025
1e07d91
[𝘀𝗽𝗿] changes introduced through rebase
arichardson Jan 14, 2025
142a3ff
fix bug in parsing and extend tests -- will update LangRef shortly
arichardson Jan 14, 2025
ddc29aa
[𝘀𝗽𝗿] changes introduced through rebase
kparzysz Jul 21, 2025
bdb6acc
rebased and updated following conclustion of ptrtoint semantics
arichardson Jul 21, 2025
de449dd
clang-format
arichardson Jul 21, 2025
2c49735
typo fixes
arichardson Jul 27, 2025
eae5a3e
[𝘀𝗽𝗿] changes introduced through rebase
s-barannikov Sep 20, 2025
2da5d51
update non-intgegral property based on feedback, drop 'n' flag
arichardson Sep 20, 2025
a08d1f9
fix typo in langref
arichardson Sep 20, 2025
e740d60
fix tests after semantic change
arichardson Sep 20, 2025
4fee21f
rebase, add Type* overloads
arichardson Sep 20, 2025
9227e72
[𝘀𝗽𝗿] changes introduced through rebase
boomanaiden154 Sep 21, 2025
06f5ddf
address feedback, add new non-address bits section
arichardson Sep 21, 2025
82c5832
remove no longer valid test check
arichardson Sep 21, 2025
d0cab97
[𝘀𝗽𝗿] changes introduced through rebase
ellishg Sep 22, 2025
6004d6c
feedback, rename shouldAvoid to mustNotIntroduce
arichardson Sep 22, 2025
faf0565
typo fix
arichardson Sep 22, 2025
2ade3c6
[𝘀𝗽𝗿] changes introduced through rebase
michaelrj-google Sep 23, 2025
9abba46
rebase
arichardson Sep 23, 2025
139 changes: 118 additions & 21 deletions llvm/docs/LangRef.rst
@@ -660,19 +660,60 @@ Non-Integral Pointer Type
Note: non-integral pointer types are a work in progress, and they should be
considered experimental at this time.

LLVM IR optionally allows the frontend to denote pointers in certain address
spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
Non-integral pointer types represent pointers that have an *unspecified* bitwise
representation; that is, the integral representation may be target dependent or
unstable (not backed by a fixed integer).
For most targets, the pointer representation is a direct mapping from the
bitwise representation to the address of the underlying memory location.
Such pointers are considered "integral", and any pointers where the
representation is not just an integer address are called "non-integral".

Non-integral pointers have at least one of the following three properties:

* the pointer representation contains non-address bits
* the pointer representation is unstable (may change at any time in a
target-specific way)
* the pointer representation has external state

These properties (or combinations thereof) can be applied to pointers via the
:ref:`datalayout string<langref_datalayout>`.

The exact implications of these properties are target-specific. The following
subsections describe the IR semantics of each property and the restrictions it
places on optimization passes.

Contributor

So this applies to only an SSA value of an unstable pointer type? What about an in-memory value with the unstable pointer type?

Member Author

I am not familiar with how GC pointers are used in LLVM, I just tried to split out the existing "copying GC" non-integral pointers properties into a separate property to allow for "fat pointers", CHERI capabilities, etc to use non-integral pointers without incurring all the restrictions imposed by GC pointers.

Not sure who is best to comment on this, probably someone from azul who has worked on it recently.

Pointers with non-address bits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pointers in this address space have a bitwise representation that not only
has address bits, but also some other target-specific metadata.
In most cases pointers with non-address bits behave exactly the same as
integral pointers; the only difference is that it is not possible to create a
pointer just from an address unless all the non-address bits are also recreated
correctly in a target-specific way.

An example of pointers with non-address bits is the AMDGPU buffer descriptor,
which is 160 bits wide: a 128-bit fat pointer and a 32-bit offset.
Similarly, CHERI capabilities contain a 32 or 64 bit address as well as the
same number of metadata bits, but unlike the AMDGPU buffer descriptors they have
external state in addition to non-address bits.
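
As a rough illustration of the paragraph above (this is not LLVM code), the following C++ sketch models a pointer whose representation carries metadata next to the address. The `FatPointer` type, its field names, and the helpers `advance` and `rebuildFromAddress` are hypothetical; the widths only echo the 128-bit descriptor plus 32-bit offset layout mentioned for AMDGPU.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a pointer with non-address bits: a 128-bit
// descriptor (treated as opaque metadata) plus a 32-bit offset that plays
// the role of the address.
struct FatPointer {
  std::array<std::uint32_t, 4> Descriptor; // non-address bits (assumed layout)
  std::uint32_t Offset;                    // address bits
};

// Pointer arithmetic only touches the address bits; the metadata is carried
// along unchanged.
inline FatPointer advance(FatPointer P, std::uint32_t Bytes) {
  P.Offset += Bytes;
  return P;
}

// An address alone is not enough to recreate the pointer: the caller must
// also supply the metadata, which only the target knows how to produce.
inline std::optional<FatPointer> rebuildFromAddress(
    std::uint32_t Address,
    const std::optional<std::array<std::uint32_t, 4>> &Metadata) {
  if (!Metadata)
    return std::nullopt; // there is no way to invent the non-address bits
  return FatPointer{*Metadata, Address};
}
```

In this model, offsetting a pointer behaves like ordinary integral pointer arithmetic, while going from a bare address back to a pointer fails unless the metadata is provided separately.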


Unstable pointer representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pointers in this address space have an *unspecified* bitwise representation
(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
allowed to change in a target-specific way. For example, this could be a pointer
type used with copying garbage collection where the garbage collector could
update the pointer at any time in the collection sweep.

``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
integral (i.e., normal) pointers in that they convert integers to and from
corresponding pointer types, but there are additional implications to be
aware of. Because the bit-representation of a non-integral pointer may
not be stable, two identical casts of the same operand may or may not
corresponding pointer types, but there are additional implications to be aware
of.

For "unstable" pointer representations, the bit-representation of the pointer
may not be stable, so two identical casts of the same operand may or may not
return the same value. Said differently, the conversion to or from the
non-integral type depends on environmental state in an implementation
"unstable" pointer type depends on environmental state in an implementation
defined manner.

If the frontend wishes to observe a *particular* value following a cast, the
@@ -681,21 +722,72 @@ defined manner. (In practice, this tends to require ``noinline`` routines for
such operations.)

From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
non-integral types are analogous to ones on integral types with one
"unstable" pointer types are analogous to ones on integral types with one
key exception: the optimizer may not, in general, insert new dynamic
occurrences of such casts. If a new cast is inserted, the optimizer would
need to either ensure that a) all possible values are valid, or b)
appropriate fencing is inserted. Since the appropriate fencing is
implementation defined, the optimizer can't do the latter. The former is
challenging as many commonly expected properties, such as
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
Similar restrictions apply to intrinsics that might examine the pointer bits,
such as :ref:`llvm.ptrmask<int_ptrmask>`.

The alignment information provided by the frontend for a non-integral pointer
The alignment information provided by the frontend for an "unstable" pointer
(typically using attributes or metadata) must be valid for every possible
representation of the pointer.
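
To make the ``ptrtoint(v)-ptrtoint(v) == 0`` remark concrete, here is a toy C++ model of a copying collector. It is only a sketch under stated assumptions: `MovingHeap`, `Handle`, `relocate`, and `ptrToInt` are hypothetical names, and a real collector would decide on its own when objects move.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a copying collector: each object is named by a stable handle,
// while its current address may be changed by relocate() at any time.
class MovingHeap {
  std::vector<std::uint64_t> CurrentAddress; // indexed by handle
  std::uint64_t NextFree = 0x1000;

public:
  using Handle = std::size_t;

  Handle allocate(std::uint64_t Size) {
    CurrentAddress.push_back(NextFree);
    NextFree += Size;
    return CurrentAddress.size() - 1;
  }

  // Stand-in for a collection cycle that moves one object to fresh storage.
  void relocate(Handle H, std::uint64_t Size) {
    CurrentAddress[H] = NextFree;
    NextFree += Size;
  }

  // Stand-in for `ptrtoint`: observes whatever the representation is now.
  std::uint64_t ptrToInt(Handle H) const { return CurrentAddress[H]; }
};
```

Two calls to `ptrToInt` on the same handle agree only if no relocation happened in between, which is why a pass cannot insert a new cast without knowing where the collector may run.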

Pointers with external state
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A further special case of non-integral pointers is ones that include external
state (such as bounds information or a type tag) with a target-defined size.
An example of such a type is a CHERI capability, where there is an additional
validity bit that is part of all pointer-typed registers, but is located in
memory at an implementation-defined address separate from the pointer itself.
Another example would be a fat-pointer scheme where pointers remain plain
integers, but the associated bounds are stored in an out-of-band table.

Unless also marked as "unstable", the bit-wise representation of pointers with
external state is stable and ``ptrtoint(x)`` always yields a deterministic
value. This means transformation passes are still permitted to insert new
``ptrtoint`` instructions.

The following restrictions apply to IR level optimization passes:

The ``inttoptr`` instruction does not recreate the external state and therefore
it is target dependent whether it can be used to create a dereferenceable
pointer. In general, passes should assume that the result of such an
``inttoptr`` is not dereferenceable. For example, on CHERI targets an
``inttoptr`` will yield a capability with the external state (the validity tag
bit) set to zero, which will cause any dereference to trap.
The ``ptrtoint`` instruction also only returns the "in-band" state and omits
all external state.

When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointer
is performed, the external metadata is also stored to an implementation-defined
location. Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the
external metadata and make it available for all uses of ``%val``.
Likewise, the ``llvm.memcpy`` and ``llvm.memmove`` intrinsics transfer the
external state. This is essential to allow frontends to efficiently emit copies
of structures containing such pointers, since expanding all these copies as
individual loads and stores would affect compilation speed and inhibit
optimizations.

Notionally, these external bits are part of the pointer, but since
``inttoptr`` / ``ptrtoint`` only operate on the "in-band" bits of the pointer
and the external bits are not explicitly exposed, they are not included in the
size specified in the :ref:`datalayout string<langref_datalayout>`.

When a pointer type has external state, all roundtrips via memory must
be performed as loads and stores of the correct type since stores of other
types may not propagate the external data.
Therefore it is not legal to convert an existing load/store (or a
``llvm.memcpy`` / ``llvm.memmove`` intrinsic) of pointer types with external
state to a load/store of an integer type with the same bitwidth, as that may drop
the external state.
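
The load/store rule above can be pictured with a small tagged-memory model in C++. This is a sketch loosely inspired by the CHERI validity bit, not a description of any real target: `Capability`, `TaggedMemory`, and all member names are hypothetical, and `copySlots` assumes non-overlapping ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// In-band bits plus one out-of-band validity bit (assumed, CHERI-like).
struct Capability {
  std::uint64_t Bits;
  bool Valid;
};

// Memory in which every 8-byte slot has a shadow tag kept out of band.
class TaggedMemory {
  std::vector<std::uint64_t> Data;
  std::vector<bool> Tags;

public:
  explicit TaggedMemory(std::size_t Slots)
      : Data(Slots, 0), Tags(Slots, false) {}

  // Pointer-typed store/load: transfers in-band bits and external state.
  void storePtr(std::size_t Slot, Capability C) {
    Data[Slot] = C.Bits;
    Tags[Slot] = C.Valid;
  }
  Capability loadPtr(std::size_t Slot) const {
    return {Data[Slot], Tags[Slot]};
  }

  // Integer-typed store/load: only in-band bits; a store clears the tag.
  void storeInt(std::size_t Slot, std::uint64_t V) {
    Data[Slot] = V;
    Tags[Slot] = false;
  }
  std::uint64_t loadInt(std::size_t Slot) const { return Data[Slot]; }

  // memcpy-like copy that also carries the tags along.
  void copySlots(std::size_t Dst, std::size_t Src, std::size_t N) {
    for (std::size_t I = 0; I < N; ++I) {
      Data[Dst + I] = Data[Src + I];
      Tags[Dst + I] = Tags[Src + I];
    }
  }
};
```

Copying a slot with `loadInt` followed by `storeInt` reproduces the in-band bits but leaves the destination untagged, which mirrors why rewriting a pointer-typed load/store pair as an integer one is not a legal transformation here.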


.. _globalvars:

Global Variables
@@ -3179,8 +3271,8 @@ as follows:
``A<address space>``
Specifies the address space of objects created by '``alloca``'.
Defaults to the default address space of 0.
``p[n]:<size>:<abi>[:<pref>[:<idx>]]``
This specifies the properties of a pointer in address space ``n``.
``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]``
This specifies the properties of a pointer in address space ``as``.
The ``<size>`` parameter specifies the size of the bitwise representation.
For :ref:`non-integral pointers <nointptrtype>` the representation size may
be larger than the address width of the underlying address space (e.g. to
@@ -3193,9 +3285,13 @@ as follows:
default index size is equal to the pointer size.
The index size also specifies the width of addresses in this address space.
All sizes are in bits.
The address space, ``n``, is optional, and if not specified,
denotes the default address space 0. The value of ``n`` must be
in the range [1,2^24).
The address space, ``<as>``, is optional, and if not specified, denotes the
default address space 0. The value of ``<as>`` must be in the range [1,2^24).
The optional ``<flags>`` are used to specify properties of pointers in this
address space: the character ``u`` marks pointers as having an unstable
representation, and ``e`` marks pointers having external state. See
:ref:`Non-Integral Pointer Types <nointptrtype>`.

``i<size>:<abi>[:<pref>]``
This specifies the alignment for an integer type of a given bit
``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -3248,9 +3344,11 @@ as follows:
this set are considered to support most general arithmetic operations
efficiently.
``ni:<address space0>:<address space1>:<address space2>...``
This specifies pointer types with the specified address spaces
as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0``
address space cannot be specified as non-integral.
This marks pointer types with the specified address spaces
as :ref:`unstable <nointptrtype>`.
The ``0`` address space cannot be specified as non-integral.
It is only supported for backwards compatibility; the flags of the ``p``
specifier should be used instead for new code.
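
The ``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]`` grammar described above can be sketched as a small standalone C++ parser. This is not the parser from this change: `ParsedPointerSpec` and `parsePointerSpec` are hypothetical names, range checks such as the address space limit are left out, whether repeated flags are accepted is not specified here, and the default of ``<pref>`` falling back to ``<abi>`` is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type; the field names are illustrative only.
struct ParsedPointerSpec {
  bool Unstable = false;      // 'u' flag
  bool ExternalState = false; // 'e' flag
  std::uint32_t AddrSpace = 0;
  std::uint32_t Size = 0, Abi = 0, Pref = 0, Index = 0;
};

// Parses p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]].
// Returns std::nullopt on malformed input.
inline std::optional<ParsedPointerSpec>
parsePointerSpec(const std::string &Spec) {
  if (Spec.empty() || Spec[0] != 'p')
    return std::nullopt;
  ParsedPointerSpec R;
  std::size_t I = 1;
  // Flag characters come before the address space number.
  for (; I < Spec.size() && (Spec[I] == 'u' || Spec[I] == 'e'); ++I) {
    if (Spec[I] == 'u')
      R.Unstable = true;
    else
      R.ExternalState = true;
  }
  // Split the remainder on ':' into all-digit fields.
  std::vector<std::string> Fields(1);
  for (; I < Spec.size(); ++I) {
    if (Spec[I] == ':')
      Fields.emplace_back();
    else if (std::isdigit(static_cast<unsigned char>(Spec[I])))
      Fields.back() += Spec[I];
    else
      return std::nullopt;
  }
  if (Fields.size() < 3 || Fields.size() > 5)
    return std::nullopt;
  for (std::size_t F = 0; F < Fields.size(); ++F) {
    // Only the address space may be empty; at most 8 digits per field so
    // the conversion below always fits in 32 bits.
    if ((F != 0 && Fields[F].empty()) || Fields[F].size() > 8)
      return std::nullopt;
  }
  auto Num = [](const std::string &S) {
    return static_cast<std::uint32_t>(std::stoul(S));
  };
  if (!Fields[0].empty())
    R.AddrSpace = Num(Fields[0]);
  R.Size = Num(Fields[1]);
  R.Abi = Num(Fields[2]);
  R.Pref = Fields.size() > 3 ? Num(Fields[3]) : R.Abi;   // assumed default
  R.Index = Fields.size() > 4 ? Num(Fields[4]) : R.Size; // defaults to <size>
  return R;
}
```

For instance, under this sketch a specification such as ``pu1:64:64`` would describe an unstable 64-bit pointer in address space 1, which is what the legacy ``ni:1`` form is documented to express.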

``<abi>`` is a lower bound on what is required for a type to be considered
aligned. This is used in various places, such as:
@@ -31402,4 +31500,3 @@ Semantics:

The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.

113 changes: 102 additions & 11 deletions llvm/include/llvm/IR/DataLayout.h
@@ -77,12 +77,21 @@ class DataLayout {
uint32_t BitWidth;
Align ABIAlign;
Align PrefAlign;
/// The index bit width also defines the address size in this address space.
/// If the index width is less than the representation bit width, the
/// pointer is non-integral and bits beyond the index width could be used
/// for additional metadata (e.g. AMDGPU buffer fat pointers with bounds
/// and other flags or CHERI capabilities that contain bounds+permissions).
uint32_t IndexBitWidth;
/// Pointers in this address space don't have a well-defined bitwise
/// representation (e.g. may be relocated by a copying garbage collector).
/// Additionally, they may also be non-integral (i.e. containing additional
/// metadata such as bounds information/permissions).
bool IsNonIntegral;
/// representation (e.g. they may be relocated by a copying garbage
/// collector and thus have different addresses at different times).
bool HasUnstableRepresentation;
/// Pointers in this address space have additional state bits that are
/// located at a target-defined location when stored in memory. An example
/// of this would be CHERI capabilities where the validity bit is stored
/// separately from the pointer address+bounds information.
bool HasExternalState;
LLVM_ABI bool operator==(const PointerSpec &Other) const;
};

@@ -149,7 +158,7 @@ class DataLayout {
/// Sets or updates the specification for pointer in the given address space.
void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
Align PrefAlign, uint32_t IndexBitWidth,
bool IsNonIntegral);
bool HasUnstableRepr, bool HasExternalState);

/// Internal helper to get alignment for integer of given bitwidth.
LLVM_ABI Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -355,30 +364,112 @@ class DataLayout {
/// \sa DataLayout::getAddressSizeInBits
unsigned getAddressSize(unsigned AS) const { return getIndexSize(AS); }

/// Return the address spaces containing non-integral pointers. Pointers in
/// this address space don't have a well-defined bitwise representation.
SmallVector<unsigned, 8> getNonIntegralAddressSpaces() const {
/// Return the address spaces with special pointer semantics (such as being
/// unstable or non-integral).
SmallVector<unsigned, 8> getNonStandardAddressSpaces() const {
SmallVector<unsigned, 8> AddrSpaces;
for (const PointerSpec &PS : PointerSpecs) {
if (PS.IsNonIntegral)
if (PS.HasUnstableRepresentation || PS.HasExternalState ||
PS.BitWidth != PS.IndexBitWidth)
AddrSpaces.push_back(PS.AddrSpace);
}
return AddrSpaces;
}

/// Returns whether this address space has a non-integral pointer
/// representation, i.e. the pointer is not just an integer address but some
/// other bitwise representation. When true, passes cannot assume that all
/// bits of the representation map directly to the allocation address.
/// NOTE: This also returns true for "unstable" pointers where the
/// representation may be just an address, but this value can change at any
/// given time (e.g. due to copying garbage collection).
/// Examples include AMDGPU buffer descriptors with a 128-bit fat pointer
/// and a 32-bit offset or CHERI capabilities that contain bounds, permissions
/// and an out-of-band validity bit.
///
/// In general, more specialized functions such as mustNotIntroduceIntToPtr(),
/// mustNotIntroducePtrToInt(), or hasExternalState() should be
/// preferred over this one when reasoning about the behavior of IR
/// analysis/transforms.
/// TODO: should remove/deprecate this once all uses have migrated.
bool isNonIntegralAddressSpace(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).IsNonIntegral;
const auto &PS = getPointerSpec(AddrSpace);
return PS.BitWidth != PS.IndexBitWidth || PS.HasUnstableRepresentation ||
PS.HasExternalState;
}

/// Returns whether this address space has an "unstable" pointer
/// representation. The bitwise pattern of such pointers is allowed to change
/// in a target-specific way. For example, this could be used for copying
/// garbage collection where the garbage collector could update the pointer
/// value as part of the collection sweep.
bool hasUnstableRepresentation(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).HasUnstableRepresentation;
}
bool hasUnstableRepresentation(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && hasUnstableRepresentation(PTy->getPointerAddressSpace());
}

/// Returns whether this address space has external state (implies having
/// a non-integral pointer representation).
/// These pointer types must be loaded and stored using appropriate
/// instructions and cannot use integer loads/stores as this would not
/// propagate the out-of-band state. An example of such a pointer type is a
/// CHERI capability that contains bounds, permissions and an out-of-band
/// validity bit that is invalidated whenever an integer/FP store is performed
/// to the associated memory location.
bool hasExternalState(unsigned AddrSpace) const {
return getPointerSpec(AddrSpace).HasExternalState;
}
bool hasExternalState(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && hasExternalState(PTy->getPointerAddressSpace());
}

/// Returns whether passes must avoid introducing `inttoptr` instructions
/// for this address space (unless they have target-specific knowledge).
///
/// This is currently the case for non-integral pointer representations with
/// external state (hasExternalState()) since `inttoptr` cannot recreate the
/// external state bits.
/// New `inttoptr` instructions should also be avoided for "unstable" bitwise
/// representations (hasUnstableRepresentation()) unless the pass knows it is
/// within a critical section that retains the current representation.
bool mustNotIntroduceIntToPtr(unsigned AddrSpace) const {
return hasUnstableRepresentation(AddrSpace) || hasExternalState(AddrSpace);
}

/// Returns whether passes must avoid introducing `ptrtoint` instructions
/// for this address space (unless they have target-specific knowledge).
///
/// This is currently the case for pointer address spaces that have an
/// "unstable" representation (hasUnstableRepresentation()) since the
/// bitwise pattern of such pointers could change unless the pass knows it is
/// within a critical section that retains the current representation.
bool mustNotIntroducePtrToInt(unsigned AddrSpace) const {
return hasUnstableRepresentation(AddrSpace);
}

bool isNonIntegralPointerType(PointerType *PT) const {
return isNonIntegralAddressSpace(PT->getAddressSpace());
}

bool isNonIntegralPointerType(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty);
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && isNonIntegralPointerType(PTy);
}

bool mustNotIntroducePtrToInt(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && mustNotIntroducePtrToInt(PTy->getPointerAddressSpace());
}

bool mustNotIntroduceIntToPtr(Type *Ty) const {
auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
return PTy && mustNotIntroduceIntToPtr(PTy->getPointerAddressSpace());
}

/// The size in bits of the pointer representation in a given address space.
/// This is not necessarily the same as the integer address of a pointer (e.g.
/// for fat pointers).
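
How the new predicates relate to each other can be summarized with a standalone C++ model that mirrors the logic shown in this header without depending on LLVM. `PointerSpecModel` and the free functions below are hypothetical stand-ins for `DataLayout::PointerSpec` and the corresponding member functions; the example widths are illustrative and not taken from any real target description.

```cpp
#include <cassert>
#include <cstdint>

// Standalone stand-in for the per-address-space pointer specification.
struct PointerSpecModel {
  std::uint32_t BitWidth;
  std::uint32_t IndexBitWidth;
  bool HasUnstableRepresentation;
  bool HasExternalState;
};

// Non-integral: the representation is wider than the address, or the
// pointer is unstable, or it has external state.
constexpr bool isNonIntegral(const PointerSpecModel &PS) {
  return PS.BitWidth != PS.IndexBitWidth || PS.HasUnstableRepresentation ||
         PS.HasExternalState;
}

// New ptrtoint casts are only a problem when the bit pattern may change.
constexpr bool mustNotIntroducePtrToInt(const PointerSpecModel &PS) {
  return PS.HasUnstableRepresentation;
}

// New inttoptr casts are a problem for unstable pointers and for pointers
// whose external state cannot be recreated from an integer.
constexpr bool mustNotIntroduceIntToPtr(const PointerSpecModel &PS) {
  return PS.HasUnstableRepresentation || PS.HasExternalState;
}
```

The point of the split is visible in the model: a pointer that is merely wider than its address counts as non-integral but restricts neither cast, while external state restricts only `inttoptr` and an unstable representation restricts both.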