TL;DR: Currently, `Address` is unsafe to create from arbitrary values. But should it be? The paper discussed below says "Address creation from arbitrary integers is forbidden", but we should rethink that.
The `Address` type has many unsafe methods. Among them, there are methods for
- Creating `Address` instances: `zero`, `max`, `from_usize`
- Accessing memory: `load`, `store`, ...
- Converting into Rust reference types: `as_ref`, `as_mut_ref`
Understandably, memory accesses are unsafe because of potential races and unaligned/illegal memory operations, and conversions to Rust reference types should be unsafe because only the programmer can guarantee such conversions don't violate Rust's ownership and borrowing rules.
However, it's worth discussing the safety of creating `Address` instances. The following methods create `Address` instances, and some of them are marked as unsafe.
- safe:
  - `ObjectReference::to_raw_address(self)`
  - `Address::from_ptr(ptr)`
  - `Address::from_mut_ptr(ptr)`
  - `Address::from_ref(r)`
  - Deriving from existing addresses, such as `add`, `sub`, `and`, `or`, ...
- unsafe:
  - `Address::zero()`
  - `Address::max()`
  - `Address::from_usize(raw)`
According to the 2016 paper "Rust as a Language for High Performance GC Implementation" by @qinsoon et al.,

> Addresses and object references are two distinct abstract concepts in GC implementations: an address represents an arbitrary location in the memory space managed by the GC and address arithmetic is allowed (and necessary) on the address type, while an object reference...

and

> We restrict the creation of `Address` to be either from raw pointers, which may be acquired from `mmap` and `malloc`, or derived from an existing Address. Address creation from arbitrary integers is forbidden, with the single exception of the constant `Address::zero()`. This serves as an initial value for some fields of type Address within other structs, since Rust does not allow structs with uninitialized fields...
The paper did not explain why "Address creation from arbitrary integers is forbidden", and this apparently contradicts "an address represents an arbitrary location in the memory space managed by the GC". In the current Rust MMTk, `Address::zero()`, `Address::max()` and `Address::from_usize` are all marked as unsafe. Their doc comments say:

> ...It is unsafe and the user needs to be aware that they may create an invalid address...
It is unclear what an "invalid address" is. Since creating an `Address` from pointers and from `ObjectReference` is considered safe, this seems to imply that an `Address` is supposed to point somewhere "safe", such as inside an object (although, at that time, the raw address of an `ObjectReference` was not guaranteed to be inside an object; that guarantee only came with #1195) or inside a memory region obtained by `mmap` or `malloc`.
And since address arithmetic is considered safe, we can get an `Address` safely from any `ObjectReference`, and call `addr.and(0).add(0x12345678)` to "safely" create an `Address` with an arbitrary value such as 0x12345678. That bypasses the unsafe annotations on the creation methods.
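The bypass described above can be sketched as follows. This is a minimal stand-in, not MMTk's actual `Address` definition: the struct, the `as_usize` accessor, the return type of `and`, and the helper `arbitrary_address_without_unsafe` are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for MMTk's `Address`; not the real definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Address(usize);

impl Address {
    /// Marked `unsafe`, mirroring the current annotation on `from_usize`.
    ///
    /// # Safety
    /// In this sketch there is nothing the caller actually has to uphold.
    pub unsafe fn from_usize(raw: usize) -> Address {
        Address(raw)
    }

    /// Safe: derive an address from a Rust reference.
    pub fn from_ref<T>(r: &T) -> Address {
        Address(r as *const T as usize)
    }

    /// Safe bitwise AND (assumed here to return an `Address`).
    pub fn and(self, mask: usize) -> Address {
        Address(self.0 & mask)
    }

    /// Safe addition of a byte offset.
    pub fn add(self, offset: usize) -> Address {
        Address(self.0.wrapping_add(offset))
    }

    /// Hypothetical accessor for the raw value.
    pub fn as_usize(self) -> usize {
        self.0
    }
}

/// Builds an arbitrary address without writing `unsafe` anywhere:
/// start from any safely obtained address, mask it to zero, add the value.
pub fn arbitrary_address_without_unsafe(raw: usize) -> Address {
    let anchor = 0u8;
    Address::from_ref(&anchor).and(0).add(raw)
}
```

In this sketch, the safe helper and the `unsafe` constructor produce exactly the same value, so the `unsafe` marker on the constructor protects nothing.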
In fact, in the current Rust MMTk code base, we use `Address` to point to quite a few things that are not inside objects. Things like `Chunk`, `Block` and `Line` are wrappers of `Address`. We can also derive sub-regions from their parents, such as iterating through all `Block`s in a `Chunk`, or all `Line`s in a `Block`, and both are considered safe. We also have linear scanning algorithms that go through every byte, and that is considered safe, too.
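To make the region-wrapper pattern concrete, here is a rough sketch of a `Chunk` that hands out its `Block`s using only safe code. The types below are simplified stand-ins rather than MMTk's real ones, and the sizes (4 MiB chunks, 32 KiB blocks) and the `containing`, `start` and `blocks` methods are illustrative assumptions.

```rust
/// Hypothetical stand-in for MMTk's `Address`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Address(pub usize);

impl Address {
    pub fn add(self, bytes: usize) -> Address {
        Address(self.0 + bytes)
    }

    pub fn as_usize(self) -> usize {
        self.0
    }
}

/// A block is just an aligned address; nothing says it lies inside an object.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block(Address);

impl Block {
    /// Illustrative size, assumed for this sketch.
    pub const BYTES: usize = 32 << 10;

    pub fn start(self) -> Address {
        self.0
    }
}

/// A chunk is a larger aligned region made of blocks.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Chunk(Address);

impl Chunk {
    /// Illustrative size, assumed for this sketch.
    pub const BYTES: usize = 4 << 20;

    /// The chunk containing `addr`, found by aligning down.
    pub fn containing(addr: Address) -> Chunk {
        Chunk(Address(addr.0 & !(Self::BYTES - 1)))
    }

    /// Safely derives every block address from the chunk start.
    pub fn blocks(self) -> impl Iterator<Item = Block> {
        let start = self.0;
        (0..Self::BYTES / Block::BYTES).map(move |i| Block(start.add(i * Block::BYTES)))
    }
}
```

None of these derived addresses is checked against any object or mapping, yet no `unsafe` is needed to produce them.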
So I think it is pointless to mark `zero()`, `max()` and `from_usize()` as unsafe. `Address` is just what it is: an address, an arbitrary address. There is no validity guarantee for an `Address` anyway. It can be zero, be word-aligned or not, be inside the heap or not, be addressable by a 64-bit Intel CPU or not. There should be no restriction on creating an `Address`. Only memory accesses and methods that create Rust references should be unsafe. And it should be unsafe to convert an `Address` to an `ObjectReference`, too, because `ObjectReference` does have the concept of validity (which can be checked by the valid-object (VO) bit).
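One way the proposed split could look is sketched below. This is only an illustration of the idea, not a patch against MMTk: the struct layouts are simplified, and `ObjectReference::assume_object` is a hypothetical name invented here for the unsafe conversion.

```rust
/// Hypothetical stand-in for MMTk's `Address`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Address(usize);

/// Hypothetical stand-in for MMTk's `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(Address);

impl Address {
    // Creation is safe: an `Address` carries no validity guarantee.
    pub const fn zero() -> Address {
        Address(0)
    }

    pub const fn max() -> Address {
        Address(usize::MAX)
    }

    pub const fn from_usize(raw: usize) -> Address {
        Address(raw)
    }

    pub fn from_mut_ptr<T>(ptr: *mut T) -> Address {
        Address(ptr as usize)
    }

    pub const fn as_usize(self) -> usize {
        self.0
    }

    /// # Safety
    /// The address must be mapped, readable, aligned for `T`, hold a valid
    /// `T`, and must not be written concurrently.
    pub unsafe fn load<T: Copy>(self) -> T {
        unsafe { (self.0 as *const T).read() }
    }

    /// # Safety
    /// The address must be mapped, writable, aligned for `T`, and must not
    /// be accessed concurrently.
    pub unsafe fn store<T>(self, value: T) {
        unsafe { (self.0 as *mut T).write(value) }
    }
}

impl ObjectReference {
    /// Hypothetical unsafe conversion.
    ///
    /// # Safety
    /// `addr` must refer to a valid object (for example, its VO bit is set).
    pub unsafe fn assume_object(addr: Address) -> ObjectReference {
        ObjectReference(addr)
    }

    pub fn to_raw_address(self) -> Address {
        self.0
    }
}
```

With this shape, `unsafe` appears exactly where the programmer has to promise something: dereferencing the address, or asserting that it names a valid object.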