
Rethink about whether creating Address from usize should be unsafe #1227

@wks


TL;DR: Currently, creating an Address from an arbitrary value is unsafe. But should it be? The 2016 paper Rust as a Language for High Performance GC Implementation says "Address creation from arbitrary integers is forbidden", but we should reconsider that.

The Address type has many unsafe methods. Among them, there are methods for

  • Creating Address instances: zero, max, from_usize
  • Accessing memory: load, store, ...
  • Converting into Rust reference types: as_ref, as_mut_ref

Understandably, memory accesses are unsafe because of potential races and unaligned/illegal memory operations, and conversions to Rust reference types should be unsafe because only the programmer can guarantee such conversions don't violate Rust's ownership and borrowing rules.

However, it's worth discussing the safety of creating Address instances. The following methods create Address instances, and some of them are marked as unsafe.

  • safe:
    • ObjectReference::to_raw_address(self)
    • Address::from_ptr(ptr)
    • Address::from_mut_ptr(ptr)
    • Address::from_ref(r)
    • Deriving from existing addresses, such as add, sub, and, or, ...
  • unsafe:
    • Address::zero()
    • Address::max()
    • Address::from_usize(raw)

According to the 2016 paper Rust as a Language for High Performance GC Implementation by @qinsoon et al.,

Addresses and object references are two distinct abstract concepts in GC implementations: an address represents an arbitrary location in the memory space managed by the GC and address arithmetic is allowed (and necessary) on the address type, while an object reference...

and

We restrict the creation of Address to be either from raw pointers, which may be acquired from mmap and malloc, or derived from an existing Address. Address creation from arbitrary integers is forbidden, with the single exception of the constant Address::zero(). This serves as an initial value for some fields of type Address within other structs, since Rust does not allow structs with uninitialized fields...

The paper does not explain why "Address creation from arbitrary integers is forbidden", and that statement apparently contradicts "an address represents an arbitrary location in the memory space managed by the GC". In the current Rust MMTk, Address::zero(), Address::max() and Address::from_usize() are all marked as unsafe. Their doc comments say:

...It is unsafe and the user needs to be aware that they may create an invalid address...

It is unclear what an "invalid address" is. Since creating an Address from a pointer or an ObjectReference is considered safe, the implication seems to be that an Address is supposed to point somewhere "safe", such as inside an object (although, until #1195, the raw address of an ObjectReference was not guaranteed to be inside an object) or inside a memory region obtained from mmap or malloc.

And since address arithmetic is considered safe, we can safely get an Address from any ObjectReference, and then call addr.and(0).add(0x12345678) to "safely" create an Address from the arbitrary integer 0x12345678. That bypasses the unsafe annotations on the creation methods.
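
The bypass can be sketched with a simplified stand-in for Address. This is a minimal illustration, not the mmtk-core definition: the struct layout, the forge helper and the exact signatures (in particular and returning an Address, as the expression above suggests) are assumptions made for the example.

```rust
/// Simplified stand-in for MMTk's `Address` (assumption: a plain usize wrapper).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Address(usize);

impl Address {
    /// Marked `unsafe`, mirroring the current annotation on `from_usize`.
    pub unsafe fn from_usize(raw: usize) -> Address {
        Address(raw)
    }

    /// Safe: create an address from a raw pointer.
    pub fn from_ptr<T>(ptr: *const T) -> Address {
        Address(ptr as usize)
    }

    /// Safe: bitwise AND with a mask. Modelled here as returning an
    /// `Address`; the real signature may differ.
    pub fn and(self, mask: usize) -> Address {
        Address(self.0 & mask)
    }

    /// Safe: add a byte offset.
    pub fn add(self, offset: usize) -> Address {
        Address(self.0.wrapping_add(offset))
    }

    pub fn as_usize(self) -> usize {
        self.0
    }
}

/// Hypothetical helper: builds an arbitrary address using only safe methods.
pub fn forge(raw: usize) -> Address {
    let local = 0u8;
    // Any safely obtained address works as a starting point.
    let start = Address::from_ptr(&local as *const u8);
    // Masking with 0 discards the original value; adding `raw` sets a new one.
    start.and(0).add(raw)
}

fn main() {
    let forged = forge(0x12345678);
    let direct = unsafe { Address::from_usize(0x12345678) };
    // Both paths yield the same value, but only one of them required `unsafe`.
    assert_eq!(forged, direct);
}
```

No unsafe block appears in forge, so under this model the unsafe marker on from_usize does not prevent any value from being constructed.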

In fact, the current Rust MMTk code base uses Address to point to quite a few things that are not inside objects. Types such as Chunk, Block and Line are wrappers of Address. We can also derive sub-regions from their parents, such as iterating through all Blocks in a Chunk or all Lines in a Block, and both operations are considered safe. We also have linear scanning algorithms that go through every byte, and those are considered safe, too.
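
As a rough sketch of that pattern, the following models Chunk and Block as thin wrappers around a simplified Address and derives every Block of a Chunk with safe code. The region sizes, the from_start constructor and the blocks method are hypothetical and chosen only for the example; they are not taken from mmtk-core.

```rust
/// Simplified stand-in for MMTk's `Address` (assumption: a plain usize wrapper).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Address(usize);

impl Address {
    /// Marked `unsafe`, mirroring the current annotation.
    pub unsafe fn from_usize(raw: usize) -> Address {
        Address(raw)
    }

    /// Safe: add a byte offset.
    pub fn add(self, offset: usize) -> Address {
        Address(self.0 + offset)
    }

    pub fn as_usize(self) -> usize {
        self.0
    }
}

// Hypothetical region sizes, for illustration only.
const LOG_BYTES_IN_BLOCK: usize = 15; // 32 KiB
const LOG_BYTES_IN_CHUNK: usize = 22; // 4 MiB
const BLOCKS_IN_CHUNK: usize = 1 << (LOG_BYTES_IN_CHUNK - LOG_BYTES_IN_BLOCK);

/// A block is just an address with a size convention attached.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Block(Address);

/// A chunk is likewise a wrapper around its start address.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Chunk(Address);

impl Block {
    pub fn start(self) -> Address {
        self.0
    }
}

impl Chunk {
    pub fn from_start(start: Address) -> Chunk {
        Chunk(start)
    }

    /// Safe: derives the address of every block in the chunk, none of
    /// which needs to lie inside an object.
    pub fn blocks(self) -> impl Iterator<Item = Block> {
        let start = self.0;
        (0..BLOCKS_IN_CHUNK).map(move |i| Block(start.add(i << LOG_BYTES_IN_BLOCK)))
    }
}

fn main() {
    let chunk = Chunk::from_start(unsafe { Address::from_usize(0x40_0000) });
    for block in chunk.blocks().take(3) {
        println!("{:#x}", block.start().as_usize());
    }
}
```

Every address produced by blocks is computed from an integer offset, which is the kind of derivation the code base already treats as safe.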

So I think it is pointless to mark zero(), max() and from_usize() as unsafe. Address is just what it is: an address, an arbitrary address. There is no validity guarantee of an Address anyway. It can be zero, be word-aligned or not, be inside the heap or not, be addressable by a 64-bit Intel CPU or not. There should be no restriction on creating Address. Only memory accesses and methods that create Rust references should be unsafe. And it should be unsafe to convert Address to ObjectReference, too, because ObjectReference does have the concept of validity (which can be checked by the valid-object (VO) bit).
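
A minimal sketch of what the proposed split could look like, assuming a simplified Address and ObjectReference. The type definitions and the from_address_unchecked name are hypothetical and are not the mmtk-core API; only the placement of unsafe follows the proposal above.

```rust
/// Simplified stand-in for MMTk's `Address` (assumption: a plain usize wrapper).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Address(usize);

impl Address {
    // Under the proposal, creation is safe: an Address carries no validity guarantee.
    pub const fn zero() -> Address {
        Address(0)
    }

    pub const fn max() -> Address {
        Address(usize::MAX)
    }

    pub const fn from_usize(raw: usize) -> Address {
        Address(raw)
    }

    pub fn from_ref<T>(r: &T) -> Address {
        Address(r as *const T as usize)
    }

    pub const fn as_usize(self) -> usize {
        self.0
    }

    /// Memory access stays unsafe.
    ///
    /// # Safety
    /// The address must be aligned for `T` and point to readable,
    /// initialized memory holding a valid `T`.
    pub unsafe fn load<T: Copy>(self) -> T {
        unsafe { (self.0 as *const T).read() }
    }
}

/// Simplified stand-in for `ObjectReference`, which does have a notion of validity.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ObjectReference(Address);

impl ObjectReference {
    /// Hypothetical name. Conversion from an Address is unsafe because
    /// the caller vouches for validity.
    ///
    /// # Safety
    /// `addr` must refer to a valid object.
    pub unsafe fn from_address_unchecked(addr: Address) -> ObjectReference {
        ObjectReference(addr)
    }

    /// Going back to an Address loses the guarantee, so it is safe.
    pub fn to_raw_address(self) -> Address {
        self.0
    }
}

fn main() {
    // Creating addresses needs no unsafe block in this model.
    let anywhere = Address::from_usize(0x12345678);
    let null = Address::zero();
    assert_ne!(anywhere, null);

    // Dereferencing does: the caller asserts the memory is valid.
    let value: u64 = 42;
    let loaded = unsafe { Address::from_ref(&value).load::<u64>() };
    assert_eq!(loaded, 42);
}
```

In this arrangement the unsafe boundary sits where an invariant can actually be violated (reading memory, or claiming that an address denotes an object) rather than at the point where an integer is wrapped.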
