With issue #1170 addressed and PR #1195 merged, we now define ObjectReference as a non-zero word-aligned address within an object, and we agree with this definition at least for the current implementation of mmtk-core. This issue summarizes discussions about the definition of ObjectReference before that change so that we can come back and find our previous discussions if we discuss this topic again.
TL;DR: From time to time, we discuss the possibility of changing the current definition of ObjectReference. In theory, it should be opaque to mmtk-core. But I have discovered that not all definitions are good. This issue summarizes what a good definition should be, enumerates some popular definitions and discusses whether they are good.
Related links:
- Issues:
- "ObjectReference should be opaque": #686
- "Require ObjectReference to be inside an object": #1170
- Zulip:
- Our last conversation. https://mmtk.zulipchat.com/#narrow/stream/262679-General/topic/Terminology.3A.20canonical.20object.20reference.20address.3F
- General: https://mmtk.zulipchat.com/#narrow/stream/262673-mmtk-core/topic/object.20reference
- Why handles won't work: https://mmtk.zulipchat.com/#narrow/stream/262673-mmtk-core/topic/Don't.20create.20ObjectReferences.20during.20GC
- About internal references: https://mmtk.zulipchat.com/#narrow/stream/262673-mmtk-core/topic/Internal.20reference.20and.20handles
Not all definitions of ObjectReference are good
There is an argument that ObjectReference should be opaque to mmtk-core. However, some definitions won't work at all, and others won't work efficiently. Below I try to list the statements that need to be true for every good definition of ObjectReference. Any definition that satisfies all of those statements should work.
It must be possible to instantiate ObjectReference from Address
Copying GCs copy objects and create a new ObjectReference for the to-space copy.
Linear scanning identifies objects at addresses (using the global VO bit or a local bitmap) and generates an ObjectReference for each of those addresses.
Conservative stack scanning scans the stack for addresses with the VO bit set, and then converts each such Address to an ObjectReference verbatim.
Handles don't satisfy this statement. Handles are implemented with indirection tables, and creating a handle implies adding a new entry to the indirection table. This simply costs too much, and probably needs synchronization. After forwarding, we would need to delete old handles from indirection tables because they point to from-space copies. Moreover, handles are often local to mutator threads. A more sensible approach is to define the content of an indirection table entry (which is an address) as the ObjectReference. In this way, if an object is moved, mmtk-core will use the new address as the new ObjectReference, and "forwarding a reference in a slot" will become "updating the address in the indirection table entry for the handle in the slot".
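To make the handle case concrete, here is a minimal sketch. It assumes a hypothetical VM-side `HandleTable`, and `ObjectReference`, `Handle` and all method names are illustrative stand-ins, not mmtk-core's real types. The point it tries to show is that the reference is the address stored in the table entry, so it can be built straight from an address, and forwarding only rewrites the entry.

```rust
/// Hypothetical stand-in for an object reference: a non-zero, word-aligned address.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(usize);

impl ObjectReference {
    /// Build a reference directly from a raw address, with no table lookup.
    /// Returns `None` for zero or misaligned addresses.
    pub fn from_raw_address(addr: usize) -> Option<ObjectReference> {
        let word = std::mem::size_of::<usize>();
        if addr != 0 && addr % word == 0 {
            Some(ObjectReference(addr))
        } else {
            None
        }
    }

    pub fn to_raw_address(self) -> usize {
        self.0
    }
}

/// What a slot holds in this hypothetical VM: an index into the indirection table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(usize);

/// Hypothetical indirection table owned by the VM. Each entry is an object address,
/// and that address is what we treat as the `ObjectReference`.
pub struct HandleTable {
    entries: Vec<ObjectReference>,
}

impl HandleTable {
    pub fn new() -> Self {
        HandleTable { entries: Vec::new() }
    }

    /// Mutator-side operation: this is the expensive step the GC should not repeat.
    pub fn create_handle(&mut self, obj: ObjectReference) -> Handle {
        self.entries.push(obj);
        Handle(self.entries.len() - 1)
    }

    /// Loading a slot yields the address currently stored in the entry.
    pub fn load(&self, handle: Handle) -> ObjectReference {
        self.entries[handle.0]
    }

    /// "Forwarding a reference in a slot" only updates the entry; the handle is unchanged.
    pub fn forward(&mut self, handle: Handle, new_obj: ObjectReference) {
        self.entries[handle.0] = new_obj;
    }
}
```

With this split, the GC never creates or deletes handles. It only reads and rewrites the addresses behind them.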
It must be efficient to get the start address of the object from the ObjectReference.
Given an ObjectReference, it must be possible to get the start address of the object (i.e. whatever alloc returns). The current API for this is ObjectModel::ref_to_object_start().
It must be efficient to get a unique address inside the object from the ObjectReference.
Given an ObjectReference, it must be possible to get an address that is guaranteed to be inside the object, and this address needs to be unique for the same object. The current API for this is ObjectModel::ref_to_address().
That address is used for:
- Accessing on-the-side metadata
- Accessing SFT
- Testing whether an ObjectReference points to an object in a given space
In all cases, the address is guaranteed to be in the same space where the object is allocated.
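The two requirements above can be sketched as follows. This is a toy model and not the mmtk-core API: the free functions only mirror the names `ref_to_object_start` and `ref_to_address`, and `HEADER_BYTES`, `LOG_BYTES_PER_BIT`, `side_metadata_bit_index` and `in_space` are assumptions made up for illustration. It assumes a layout where the reference points a fixed offset past the allocation start.

```rust
/// Hypothetical reference type: an address at a fixed offset from the object start.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(usize);

/// Assumed layout: the reference points 16 bytes past whatever `alloc` returned.
pub const HEADER_BYTES: usize = 16;
/// Assumed side-metadata granularity: one bit per 8 bytes of heap.
pub const LOG_BYTES_PER_BIT: usize = 3;

impl ObjectReference {
    /// Construct the reference for an object allocated at `start`.
    pub fn from_object_start(start: usize) -> ObjectReference {
        ObjectReference(start + HEADER_BYTES)
    }
}

/// Start address of the object: a single subtraction.
pub fn ref_to_object_start(object: ObjectReference) -> usize {
    object.0 - HEADER_BYTES
}

/// A unique in-object address: here simply the reference itself.
pub fn ref_to_address(object: ObjectReference) -> usize {
    object.0
}

/// Index of the side-metadata bit for `object`, for a bitmap covering the heap from `heap_base`.
pub fn side_metadata_bit_index(heap_base: usize, object: ObjectReference) -> usize {
    (ref_to_address(object) - heap_base) >> LOG_BYTES_PER_BIT
}

/// Space membership test using the in-object address and the bounds `[start, end)`.
pub fn in_space(object: ObjectReference, start: usize, end: usize) -> bool {
    let addr = ref_to_address(object);
    addr >= start && addr < end
}
```

Both conversions are constant-time arithmetic, which is the kind of cost the requirements have in mind.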
It must be efficient to test ObjectReference for equality.
Currently we do equality tests between ObjectReference values in a few places:
- After trace_object, we test if the object has been forwarded by comparing new_object == object.
  - We may change trace_object so that it returns Option<ObjectReference> so that we know if an object is forwarded without equality tests.
- In various assertions.
- In the "treadmill" where it uses a HashSet (Bug. See: Proper implementation of the treadmill algorithm #517)
- In ReferenceProcessor where it uses HashSet to de-duplicate ObjectReference instances.
  - Reference processing can be moved to the binding so that mmtk-core won't require ObjectReference to implement Eq.
- In sanity GC where we use a HashSet to record visited objects.
  - Can be replaced with an auxiliary bitmap, but that kind of defeats the purpose of sanity GC because we can use it to test if the metadata is implemented properly.
We may refactor them to make Eq unnecessary, but it will be counterintuitive if we can't compare ObjectReference for equality.
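The idea of making trace_object return Option<ObjectReference> can be sketched like this. It is a toy model, not mmtk-core code: `ToyCopySpace`, `process_slot`, the explicit `size` parameter and the bump-pointer "copy" are all assumptions for illustration. The reference type here deliberately does not implement `PartialEq`, to show that the caller needs no equality test.

```rust
use std::collections::HashMap;

/// Hypothetical reference type that deliberately does NOT implement `PartialEq`.
#[derive(Clone, Copy)]
pub struct ObjectReference(pub usize);

/// Toy copying space: objects in `[from_start, from_end)` get moved to a bump-allocated to-space.
pub struct ToyCopySpace {
    from_start: usize,
    from_end: usize,
    next_free: usize,
    /// Stand-in for forwarding pointers, keyed by raw from-space address.
    forwarded: HashMap<usize, usize>,
}

impl ToyCopySpace {
    pub fn new(from_start: usize, from_end: usize, to_start: usize) -> Self {
        ToyCopySpace { from_start, from_end, next_free: to_start, forwarded: HashMap::new() }
    }

    /// Returns `Some(new_reference)` if the object was (or had already been) moved,
    /// and `None` if it stays where it is.
    pub fn trace_object(&mut self, object: ObjectReference, size: usize) -> Option<ObjectReference> {
        let addr = object.0;
        if addr < self.from_start || addr >= self.from_end {
            return None; // Not in from-space: nothing to forward.
        }
        if let Some(&new_addr) = self.forwarded.get(&addr) {
            return Some(ObjectReference(new_addr)); // Already copied.
        }
        let new_addr = self.next_free; // "Copy" by reserving to-space memory.
        self.next_free += size;
        self.forwarded.insert(addr, new_addr);
        Some(ObjectReference(new_addr))
    }
}

/// The caller updates the slot only when told to, without comparing references.
pub fn process_slot(space: &mut ToyCopySpace, slot: &mut ObjectReference, size: usize) {
    if let Some(new_object) = space.trace_object(*slot, size) {
        *slot = new_object;
    }
}
```

The trade-off is that every caller must handle the `Option`, whereas the `new_object == object` comparison hides the "not moved" case behind an equality test.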
It must be hashable
As mentioned above, we sometimes put ObjectReference inside hash sets.
When copied, the from-space copy and the to-space copy are considered different objects.
This means that when copying an object, the original ObjectReference refers to the from-space copy of the object, the to-space copy of the object will have a different ObjectReference, and the two must not compare equal. The process of "forwarding a reference in a slot" means replacing the old ObjectReference in the slot with the new ObjectReference so that it now points to the to-space copy.
At the language level, "a reference to an object" does not change even if the GC moves the object. In other words, the high-level language is oblivious to object movement as a result of GC (unless object pinning is performed, which allows the user to reveal the address of an object safely). The high-level language is also oblivious to duplicated copies of objects in concurrent copying GCs, such as Shenandoah, ZGC and Sapphire. That's why the VM (or the GC?) must implement a kind of equality operator that compares them as equal at the language level during concurrent copying, when the object has two copies simultaneously. This means language-level identities (such as unique IDs of language-level objects) are not good definitions of ObjectReference.
Other statements that should be true
ObjectReference doesn't have to be the content of slots.
An object field (slot) can hold a handle, a fat pointer, an interior pointer, a tagged pointer, etc.
- If a slot holds a handle, we can define ObjectReference as the address in the indirection table entry.
- If a slot holds a fat pointer (a tuple of (pointer, offset)), we can define ObjectReference as the pointer part of the fat pointer.
- If a slot holds an interior pointer, we can define ObjectReference as the highest address that (1) is not higher than the interior pointer, and (2) has the VO bit set at that address.
- If a slot holds a tagged pointer, we can define ObjectReference as the address without the tag bits.
In all cases, we can update the slot if an object is forwarded.
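As a rough illustration of two of the cases above, the sketch below separates what a slot holds from what counts as the ObjectReference. `TaggedSlot`, `FatPointerSlot`, the 3-bit `TAG_MASK` and the `load`/`store` method names are hypothetical and chosen only for this example.

```rust
/// Hypothetical reference type: a plain address with no tag bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(pub usize);

/// Assumed tag layout: the low three bits of a tagged slot are VM-private tags.
pub const TAG_MASK: usize = 0b111;

/// A slot that holds a tagged pointer.
pub struct TaggedSlot(pub usize);

impl TaggedSlot {
    /// The reference is the slot content with the tag bits removed.
    pub fn load(&self) -> ObjectReference {
        ObjectReference(self.0 & !TAG_MASK)
    }

    /// On forwarding, write the new address back and keep the existing tag bits.
    pub fn store(&mut self, new_object: ObjectReference) {
        self.0 = new_object.0 | (self.0 & TAG_MASK);
    }
}

/// A slot that holds a fat pointer: a `(pointer, offset)` pair.
pub struct FatPointerSlot {
    pub pointer: usize,
    pub offset: usize,
}

impl FatPointerSlot {
    /// The reference is just the pointer part.
    pub fn load(&self) -> ObjectReference {
        ObjectReference(self.pointer)
    }

    /// On forwarding, only the pointer part changes; the offset is preserved.
    pub fn store(&mut self, new_object: ObjectReference) {
        self.pointer = new_object.0;
    }
}
```

In both cases the GC side only ever sees plain addresses, and the binding's slot type is responsible for re-encoding the slot content on store.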
Examples of valid definitions
Starting address
Obviously valid. OpenJDK uses the starting addresses of objects as ObjectReference.
Address at an offset from the object start.
JikesRVM does this.
Potential definitions
Tagged union of pointer and non-pointer value
Ruby does this. If a Ruby VALUE points to an object, its last three bits are all 0. The pointer will not have any tag bits. Other values (true, false, nil, small integers, etc.) are not references to objects. So we can simply define ObjectReference as "starting address" (or "address at an offset" if we add additional data in the front).
Tagged pointer without type info
V8 does this. The last bit is 1 if a slot holds a reference. The second last bit is 0 if it is a strong reference, and 1 if it is a weak reference. We may define ObjectReference as the address without the tag bits. MMTk won't be aware of those bits, and the binding is still able to update fields for forwarding.
Alternatively, we may define "the address with tag bits" (i.e. the slot content) as ObjectReference. MMTk will be able to generate such references from addresses, but always with 0b01 as the last two bits. It is trivial to get the starting address and an in-object address by removing the tags. However, the VM binding will need to implement the Eq and the Hash traits manually and ignore the tag bits. This may not be the most efficient way to do it.
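A minimal sketch of what "implement Eq and Hash manually and ignore the tag bits" could look like. `TaggedRef`, its methods and `count_distinct` are hypothetical names; the two-bit tag layout follows the description above (last bit 1 for a reference, second last bit 1 for weak).

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical reference type that keeps the tag bits: the raw slot content.
#[derive(Clone, Copy, Debug)]
pub struct TaggedRef(pub usize);

/// The low two bits are tags: bit 0 marks a reference, bit 1 marks a weak reference.
const TAG_MASK: usize = 0b11;

impl TaggedRef {
    /// The object address with the tag bits removed.
    pub fn address(self) -> usize {
        self.0 & !TAG_MASK
    }

    pub fn is_weak(self) -> bool {
        self.0 & 0b10 != 0
    }
}

/// Two references are equal if they name the same object, whatever their tags.
impl PartialEq for TaggedRef {
    fn eq(&self, other: &Self) -> bool {
        self.address() == other.address()
    }
}

impl Eq for TaggedRef {}

/// Hash only the untagged address so that `Hash` stays consistent with `Eq`.
impl Hash for TaggedRef {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.address().hash(state);
    }
}

/// Example use: de-duplicate references in a `HashSet`.
pub fn count_distinct(refs: &[TaggedRef]) -> usize {
    refs.iter().copied().collect::<HashSet<_>>().len()
}
```

Every comparison and hash pays for a mask operation here, and the hand-written impls must be kept consistent with each other, which is part of why stripping the tags up front looks simpler.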
Tagged pointer or fat pointer with embedded type info
Some VMs may embed type information inside the pointer or fat pointer. I heard JRockit did this, but never saw its implementation. This probably will not work because MMTk will have a hard time getting the type info when constructing an ObjectReference from an Address. It's not completely impossible, but it will need to load the type information from the object body, which may be inefficient. As I mentioned above, for such VMs, we can define the address part of the tagged pointer or fat pointer as ObjectReference.
Interior pointer
Probably not a good idea, because every time mmtk-core needs to get the object start or the unique "in-object address", it needs to scan the VO bit bitmap backwards. We may introduce an InteriorPointer type in mmtk-core, but as I mentioned above, it is not necessary.
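To show where the cost comes from, here is a toy version of the backward scan. `VoBitmap`, `BYTES_PER_BIT` and `find_object` are hypothetical names, the bitmap is a plain `Vec<bool>` rather than real side metadata, and the sketch does not check that the interior pointer actually falls within the found object's size.

```rust
/// Assumed granularity: one VO bit per 8 bytes of heap.
pub const BYTES_PER_BIT: usize = 8;

/// Toy valid-object (VO) bitmap covering a heap that starts at `heap_base`.
pub struct VoBitmap {
    heap_base: usize,
    bits: Vec<bool>,
}

impl VoBitmap {
    pub fn new(heap_base: usize, heap_bytes: usize) -> Self {
        VoBitmap { heap_base, bits: vec![false; heap_bytes / BYTES_PER_BIT] }
    }

    /// Mark `addr` as the reference address of an object.
    pub fn set(&mut self, addr: usize) {
        let index = (addr - self.heap_base) / BYTES_PER_BIT;
        self.bits[index] = true;
    }

    /// Find the highest address that is not higher than `interior` and has its VO bit set.
    /// The loop walks backwards one bit at a time, so the cost grows with the
    /// distance between the interior pointer and the object.
    pub fn find_object(&self, interior: usize) -> Option<usize> {
        if interior < self.heap_base {
            return None;
        }
        let mut index = (interior - self.heap_base) / BYTES_PER_BIT;
        if index >= self.bits.len() {
            return None;
        }
        loop {
            if self.bits[index] {
                return Some(self.heap_base + index * BYTES_PER_BIT);
            }
            if index == 0 {
                return None;
            }
            index -= 1;
        }
    }
}
```

If the ObjectReference itself were an interior pointer, a scan like this would sit on the path of every metadata access, rather than happening once when the slot is decoded.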