Axioms that good definitions of ObjectReference must satisfy #1044

@wks

With issue #1170 addressed and PR #1195 merged, we now define ObjectReference as a non-zero word-aligned address within an object, and we agree with this definition at least for the current implementation of mmtk-core. This issue summarizes discussions about the definition of ObjectReference before that change so that we can come back and find our previous discussions if we discuss this topic again.

TL;DR: From time to time, we discuss the possibility of changing the current definition of ObjectReference. In theory, it should be opaque to mmtk-core. But I have discovered that not all definitions are good. This issue summarizes what a good definition should be, enumerates some popular definitions, and discusses whether they are good.

Related links:

Not all definitions of ObjectReference are good

There is an argument that ObjectReference may be opaque to mmtk-core. However, some definitions won't work at all, and others won't work efficiently. I am trying to list statements that need to be true for all good definitions of ObjectReference. Any definition that satisfies all of those statements should work.

It must be possible to instantiate an ObjectReference from an Address

Copying GCs will copy objects, and create a new ObjectReference for the to-space copy.

Linear scanning will identify objects at addresses (using the global VO bit or a local bitmap), and generate ObjectReference for those addresses.

Conservative stack scanning scans the stack for addresses that have the VO bit set, and then converts each such Address to an ObjectReference verbatim.

Handles don't satisfy this statement. Handles are implemented with indirection tables, and creating a handle implies adding a new entry to the indirection table. This costs too much and probably needs synchronization. After forwarding, we would also need to delete old handles from the indirection tables because they point to from-space copies. Moreover, handles are often local to mutator threads. A more reasonable definition is to treat the content of an indirection table entry (which is an address) as the ObjectReference. This way, if an object is moved, mmtk-core will use the new address as the new ObjectReference, and "forwarding a reference in a slot" becomes "updating the address in the indirection table entry for the handle in the slot".
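To make the handle case concrete, here is a minimal sketch of treating the indirection table entry content as the ObjectReference. All types here (`ObjectReference`, `Handle`, `IndirectionTable`) are hypothetical stand-ins written for illustration, not the mmtk-core or any VM's API; the word-alignment and non-zero checks follow the definition stated at the top of this issue.

```rust
use std::num::NonZeroUsize;

/// Hypothetical stand-in for `ObjectReference`: a non-zero, word-aligned address.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(NonZeroUsize);

impl ObjectReference {
    /// Creating a reference from an address is a cheap check, with no table insertion.
    pub fn from_raw_address(addr: usize) -> Option<Self> {
        if addr % std::mem::size_of::<usize>() != 0 {
            return None; // not word-aligned
        }
        NonZeroUsize::new(addr).map(ObjectReference) // `None` for zero
    }

    pub fn to_raw_address(self) -> usize {
        self.0.get()
    }
}

/// Hypothetical VM-side handle: an index into an indirection table.
#[derive(Clone, Copy, Debug)]
pub struct Handle(pub usize);

/// Hypothetical indirection table; each entry holds an object address.
pub struct IndirectionTable {
    entries: Vec<usize>,
}

impl IndirectionTable {
    pub fn new() -> Self {
        IndirectionTable { entries: Vec::new() }
    }

    /// Only the VM creates handles; the GC never needs to.
    pub fn new_handle(&mut self, addr: usize) -> Handle {
        self.entries.push(addr);
        Handle(self.entries.len() - 1)
    }

    /// The entry content (an address) is what the GC sees as the reference.
    pub fn load(&self, handle: Handle) -> Option<ObjectReference> {
        ObjectReference::from_raw_address(self.entries[handle.0])
    }

    /// Forwarding updates the entry in place; the handle itself is unchanged.
    pub fn store(&mut self, handle: Handle, new_ref: ObjectReference) {
        self.entries[handle.0] = new_ref.to_raw_address();
    }
}
```

With this shape, the handle stored in a slot never changes when an object moves; only the table entry is rewritten.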

It must be efficient to get the start address of the object from the ObjectReference.

Given an ObjectReference, it must be possible to get the start address of the object (i.e. whatever alloc returns). The current API for this is ObjectModel::ref_to_object_start().

It must be efficient to get a unique address inside the object from the ObjectReference.

Given an ObjectReference, it must be possible to get an address that is guaranteed to be inside the object, and this address needs to be unique for the same object. The current API for this is ObjectModel::ref_to_address().

That address is used for:

  • Accessing on-the-side metadata
  • Accessing SFT
  • Testing whether an ObjectReference points to an object in a given space

In all cases, the address is guaranteed to be in the same space where the object is allocated.
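As a rough illustration of the two conversions and of how the in-object address is used, the sketch below assumes a reference that sits a fixed `HEADER_BYTES` past the start of the object, a contiguous space, and one side-metadata bit per 8 bytes. The free functions borrow the names `ref_to_object_start` and `ref_to_address` from the text, but their signatures, the `Space` type, and all constants are assumptions made for this example.

```rust
/// Assumption: the reference points this many bytes past what `alloc` returned.
pub const HEADER_BYTES: usize = 16;

/// Hypothetical stand-in for `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjRef(pub usize);

/// Start of the object: a constant-offset subtraction, so it is cheap.
pub fn ref_to_object_start(object: ObjRef) -> usize {
    object.0 - HEADER_BYTES
}

/// A unique in-object address: here, simply the reference itself.
pub fn ref_to_address(object: ObjRef) -> usize {
    object.0
}

/// Hypothetical contiguous space covering `[start, end)`.
pub struct Space {
    pub start: usize,
    pub end: usize,
}

impl Space {
    /// Space membership is decided from the in-object address alone.
    pub fn contains(&self, object: ObjRef) -> bool {
        let addr = ref_to_address(object);
        self.start <= addr && addr < self.end
    }

    /// Index of the side-metadata bit for this object, assuming one bit
    /// per 8 bytes of the space.
    pub fn side_metadata_bit_index(&self, object: ObjRef) -> usize {
        (ref_to_address(object) - self.start) >> 3
    }
}
```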

It must be efficient to test ObjectReference values for equality.

Currently we test ObjectReference values for equality in a few places:

  • After trace_object, we test if the object has been forwarded by comparing new_object == object.
    • We may change trace_object so that it returns Option<ObjectReference> so that we know if an object is forwarded without equality tests.
  • In various assertions.
  • In the "treadmill" where it uses a HashSet (Bug. See: Proper implementation of the treadmill algorithm #517)
  • In ReferenceProcessor where it uses HashSet to de-duplicate ObjectReference instances.
    • Reference processing can be moved to the binding so that mmtk-core won't require ObjectReference to implement Eq.
  • In sanity GC where we use a HashSet to record visited objects.
    • This can be replaced with an auxiliary bitmap, but that somewhat defeats the purpose of sanity GC, because sanity GC is itself a way to test whether the metadata is implemented properly.

We may refactor them to make Eq unnecessary, but it will be counterintuitive if we can't compare ObjectReference for equality.
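For instance, the `Option<ObjectReference>` idea for `trace_object` could look roughly like the sketch below. `ToyCopySpace`, its forwarding table, and `process_slot` are hypothetical and only model the bookkeeping (no memory is actually copied); the point is that the caller learns whether the object moved without comparing references, so `ObjRef` deliberately does not implement `PartialEq`.

```rust
/// Hypothetical stand-in for `ObjectReference`; note: no `PartialEq`.
#[derive(Clone, Copy, Debug)]
pub struct ObjRef(pub usize);

const WORD: usize = 8;

/// Toy copying space that only tracks forwarding state.
pub struct ToyCopySpace {
    from_start: usize,
    /// `forwarding[i]` is the to-space address of the object whose
    /// reference is `from_start + i * WORD`, if it has been forwarded.
    forwarding: Vec<Option<usize>>,
    to_cursor: usize,
}

impl ToyCopySpace {
    pub fn new(from_start: usize, from_words: usize, to_start: usize) -> Self {
        ToyCopySpace {
            from_start,
            forwarding: vec![None; from_words],
            to_cursor: to_start,
        }
    }

    fn in_from_space(&self, object: ObjRef) -> bool {
        object.0 >= self.from_start
            && object.0 < self.from_start + self.forwarding.len() * WORD
    }

    /// Returns `Some(new_ref)` if the object was (or already had been)
    /// forwarded, and `None` if it did not move.
    pub fn trace_object(&mut self, object: ObjRef, size_bytes: usize) -> Option<ObjRef> {
        if !self.in_from_space(object) {
            return None;
        }
        let index = (object.0 - self.from_start) / WORD;
        let new_addr = match self.forwarding[index] {
            Some(addr) => addr,
            None => {
                // Bump-allocate in to-space and record the forwarding address.
                let addr = self.to_cursor;
                self.to_cursor += size_bytes;
                self.forwarding[index] = Some(addr);
                addr
            }
        };
        Some(ObjRef(new_addr))
    }
}

/// The slot is updated only when `trace_object` reports a move.
pub fn process_slot(space: &mut ToyCopySpace, slot: &mut ObjRef, size_bytes: usize) {
    if let Some(new_object) = space.trace_object(*slot, size_bytes) {
        *slot = new_object;
    }
}
```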

It must be hashable

As mentioned above, we sometimes put ObjectReference inside hash sets.

When copied, the from-space copy and the to-space copy are considered different objects.

This means that when copying an object, the original ObjectReference refers to the from-space copy of the object, the to-space copy has a different ObjectReference, and the two must not compare equal. The process of "forwarding a reference in a slot" means replacing the old ObjectReference in the slot with the new ObjectReference so that the slot now points to the to-space copy.

At the language level, "a reference to an object" does not change even if the GC moves the object. In other words, the high-level language is oblivious of object movement as a result of GC (unless object pinning is performed, which allows the user to reveal the address of an object safely). The high-level language is also oblivious of duplicated copies of objects in concurrent copying GCs, such as Shenandoah, ZGC and Sapphire. That's why the VM (or the GC?) must implement a kind of equality operator that compares the two copies as equal at the language level during concurrent copying, when the object has two copies simultaneously. This means language-level identities (such as unique IDs of language-level objects) are not good definitions of ObjectReference.
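The distinction between the two kinds of equality can be sketched as follows. This is only an illustration under simplifying assumptions: `ForwardingTable` is a hypothetical single-threaded map standing in for whatever forwarding mechanism a concurrent copying collector really uses, and `language_level_eq` is an invented name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ObjectReference`; `==` compares raw addresses.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjRef(pub usize);

/// Hypothetical record of which from-space copies have to-space copies.
pub struct ForwardingTable {
    map: HashMap<ObjRef, ObjRef>,
}

impl ForwardingTable {
    pub fn new() -> Self {
        ForwardingTable { map: HashMap::new() }
    }

    pub fn forward(&mut self, from: ObjRef, to: ObjRef) {
        self.map.insert(from, to);
    }

    /// Follow forwarding if the object has a to-space copy.
    pub fn resolve(&self, object: ObjRef) -> ObjRef {
        self.map.get(&object).copied().unwrap_or(object)
    }

    /// Language-level identity: both copies of one object compare equal,
    /// even though their `ObjRef` values differ.
    pub fn language_level_eq(&self, a: ObjRef, b: ObjRef) -> bool {
        self.resolve(a) == self.resolve(b)
    }
}
```

Here `from != to` as ObjectReference values, while `language_level_eq(from, to)` holds once `from` has been forwarded to `to`.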

Other statements that should be true

ObjectReference doesn't have to be the content of slots.

An object field (slot) can hold a handle, a fat pointer, an interior pointer, a tagged pointer, etc.

  • If a slot holds a handle, we can define ObjectReference as the address in the indirection table entry.
  • If a slot holds a fat pointer (a tuple of (pointer, offset)), we can define ObjectReference as the pointer part of the fat pointer.
  • If a slot holds an interior pointer, we can define ObjectReference as the highest address that (1) is not higher than the interior pointer, and (2) has its VO bit set.
  • If a slot holds a tagged pointer, we can define ObjectReference as the address without the tag bits.

In all cases, we can update the slot if an object is forwarded.
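A minimal sketch of two of these slot kinds is shown below: each slot type decodes its content into a reference on load and re-encodes on store, so the slot can be updated after forwarding. The types `TaggedSlot` and `FatPointerSlot`, the three-bit tag mask, and the `load`/`store` method names are assumptions made for this example rather than a description of any particular VM binding.

```rust
/// Hypothetical stand-in for `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjRef(pub usize);

/// Assumption: the low three bits of a tagged pointer carry the tag.
const TAG_MASK: usize = 0b111;

/// A slot holding a tagged pointer.
pub struct TaggedSlot(pub usize);

impl TaggedSlot {
    /// The reference is the slot content with the tag bits removed.
    pub fn load(&self) -> Option<ObjRef> {
        let addr = self.0 & !TAG_MASK;
        if addr == 0 { None } else { Some(ObjRef(addr)) }
    }

    /// Forwarding rewrites the address but preserves the existing tag.
    pub fn store(&mut self, new_ref: ObjRef) {
        let tag = self.0 & TAG_MASK;
        self.0 = new_ref.0 | tag;
    }
}

/// A slot holding a fat pointer, i.e. a (pointer, offset) tuple.
pub struct FatPointerSlot {
    pub pointer: usize,
    pub offset: usize,
}

impl FatPointerSlot {
    /// The reference is the pointer part only.
    pub fn load(&self) -> Option<ObjRef> {
        if self.pointer == 0 { None } else { Some(ObjRef(self.pointer)) }
    }

    /// Forwarding rewrites the pointer part; the offset is left untouched.
    pub fn store(&mut self, new_ref: ObjRef) {
        self.pointer = new_ref.0;
    }
}
```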

Examples of valid definitions

Starting address

Obviously. OpenJDK uses starting addresses of objects as ObjectReference.

Address at an offset from the object start.

JikesRVM does this.

Potential definitions

Tagged union of pointer and non-pointer value

Ruby does this. If a Ruby VALUE points to an object, its last three bits are all 0. The pointer will not have any tag bits. Other values (true, false, nil, small integers, etc.) are not references to objects. So we can simply define ObjectReference as "starting address" (or "address at an offset" if we add additional data in the front).
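A sketch of this filtering, under the simplified rule stated above (an object pointer has its last three bits all 0), might look like the following. The function name and `ObjRef` type are hypothetical, and the zero value is treated as a non-reference here only because ObjectReference is defined as non-zero; this is not a faithful model of every Ruby special constant.

```rust
/// Hypothetical stand-in for `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjRef(pub usize);

/// Low bits that must all be zero for the value to be an object pointer.
const NON_POINTER_MASK: usize = 0b111;

/// Returns the reference held in `value`, or `None` if the value is an
/// immediate (non-pointer) value that the GC should not trace.
pub fn value_to_object_reference(value: usize) -> Option<ObjRef> {
    if value != 0 && (value & NON_POINTER_MASK) == 0 {
        // No tag bits to strip: the value itself is the address.
        Some(ObjRef(value))
    } else {
        None
    }
}
```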

Tagged pointer without type info

V8 does this. The last bit is 1 if a slot holds a reference. The second last bit is 0 if it is a strong reference, and 1 if it is a weak reference. We may define ObjectReference as the address without the tag bits. MMTk won't be aware of those bits, and the binding is still able to update fields for forwarding.

Alternatively, we may define "the address with tag bits" (i.e. the slot content) as ObjectReference. MMTk will be able to generate such a reference from an address, but always with 0b01 as the last two bits. It is trivial to get the starting address and an in-object address by removing the tags. However, the VM binding will need to implement the Eq and the Hash traits manually and ignore the tag bits. This may not be the most efficient way to do it.
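To show what the manual Eq and Hash implementations would involve, here is a small sketch assuming the two-bit tag layout described above (last bit set for references, second-to-last bit set for weak references). `TaggedRef` and its methods are hypothetical names for this example.

```rust
use std::hash::{Hash, Hasher};

/// Assumption: the low two bits are tag bits.
const TAG_MASK: usize = 0b11;
const WEAK_BIT: usize = 0b10;

/// Hypothetical reference type that keeps the tag bits in its value.
#[derive(Clone, Copy, Debug)]
pub struct TaggedRef(pub usize);

impl TaggedRef {
    /// Address with the tag bits removed.
    pub fn address(self) -> usize {
        self.0 & !TAG_MASK
    }

    pub fn is_weak(self) -> bool {
        (self.0 & WEAK_BIT) != 0
    }
}

/// Equality ignores the tag bits, so strong and weak references to the
/// same object compare equal.
impl PartialEq for TaggedRef {
    fn eq(&self, other: &Self) -> bool {
        self.address() == other.address()
    }
}

impl Eq for TaggedRef {}

/// Hashing must agree with `eq`, so it also ignores the tag bits.
impl Hash for TaggedRef {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.address().hash(state);
    }
}
```

Every comparison and hash now pays for a mask operation, which is the extra cost mentioned above.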

Tagged pointer or fat pointer with embedded type info

Some VMs may embed type information inside the pointer, or fat pointer. I heard JRockit did this, but never saw its implementation. This probably will not work because MMTk will have a hard time getting the type info when constructing an ObjectReference from an Address. It's not completely impossible, but it will need to load the type information from the object body, which may be inefficient. As I mentioned above, for such VMs, we can define the address part of the tagged pointer or fat pointer as ObjectReference.

Interior pointer

Probably not a good idea because every time it needs to get the object start or the unique "in-object address", it needs to scan the VO bit bitmap backwards. We may introduce an InteriorPointer type in mmtk-core, but as I mentioned above, it is not necessary.
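The cost of the backward scan can be seen in a toy model of the VO bit bitmap. The sketch below assumes one VO bit per 8-byte word, stored as a `Vec<bool>` for readability; `VoBitmap` and `find_object` are hypothetical names and do not reflect how mmtk-core lays out its side metadata.

```rust
const WORD: usize = 8;

/// Toy VO bit table: `bits[i]` is set if an object reference exists at
/// address `base + i * WORD`.
pub struct VoBitmap {
    pub base: usize,
    pub bits: Vec<bool>,
}

impl VoBitmap {
    /// Mark `addr` as holding a valid object.
    pub fn set(&mut self, addr: usize) {
        let index = (addr - self.base) / WORD;
        self.bits[index] = true;
    }

    /// Find the highest address that is not higher than `interior` and has
    /// its VO bit set. The scan is linear in the distance to the object
    /// start, which is what makes interior pointers expensive.
    pub fn find_object(&self, interior: usize) -> Option<usize> {
        if interior < self.base {
            return None;
        }
        let index = (interior - self.base) / WORD;
        if index >= self.bits.len() {
            return None; // outside the range covered by this bitmap
        }
        (0..=index)
            .rev()
            .find(|&i| self.bits[i])
            .map(|i| self.base + i * WORD)
    }
}
```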

Labels: P-low (Priority: Low. A low-priority issue won't be scheduled and assigned. Any help is welcome.)