Skip to content

Conversation

oli-obk
Copy link
Contributor

@oli-obk oli-obk commented Aug 1, 2020

@clarfonthey
Copy link

I am definitely going to need some time to digest this before providing actual feedback, but I guess that one main source of confusion I have is how exactly structs with unsized fields will work. i.e. if I make:

struct Thing {
    data: u32,
    more_data, [u32],
}

How exactly will that work? The wide pointer for Thing will just inherit from Slice<u32> but it's not clear how you'd actually construct it.

Like, this is actually another disconnect I see -- you can construct the wide pointer and make it work, but there's no clear way to convert from an owned value to a wide pointer. Or maybe there is and I didn't read carefully enough.

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 4, 2020

TLDR: this is not actually relevant to the RFC, because the unsizing of aggregates with trailing unsized types delegates to the unsizing of the built-in unsized types.

I'll address slices further down, let me first start with trait objects.

If you have a

struct Foo<T: ?Sized + MyTrait> {
    a: i32,
    b: T,
}

and you're using it to convert from a pointer to the sized version to an unsized version (Side note: I'm assuming you are talking about sized vs unsized, even though you mentioned "owned". Box<dyn Trait> is also owned, just owning an unsized thing).

let x = Foo { a: 42, b: SomeStruct };
let y: &Foo<SomeStruct> = &x;
let z: &Foo<dyn MyTrait> = y;

Then two steps happen, the first one is trivial, you get a pointer to x and store it in y. This is a thin pointer, simply to the address of x. Then you do the unsizing, which invokes CustomUnsize::from::<Foo<SomeStruct, false>(y), giving you essentially

Pointer {
    ptr: &x,
    vtable,
}

where vtable is the same vtable you'd get for &SomeStruct as &dyn MyTrait. Since you can't invoke MyTrait methods on Foo<dyn MyTrait>, there are no pointer indirection problems or anything. This is also how it works without this RFC. If you want to invoke methods on the b field, you have to do z.b.foo(), which will give you a &dyn MyTrait reference via &z.b and then everything works out just like with direct references (well, because now you have a direct reference).

For slices it's a bit weird, as only arrays unsize to slices in a native manner. Everything else (like Vec) actually uses a Deref impl to manually create a slice and return that, there's no unsizing going on. So, you're right, my scheme has no way to specify how to unsize from arrays to slices, but if it had (pondering on this now), we'd get aggregates with trailing slices working automatically, just like with dyn Trait.

@clarfonthey
Copy link

Oh, that makes sense.

I guess that one main weirdness for me is that the unsizing can only happen for references and not for values. The story around unsized locals is incomplete but I'm not 100% sure how this existing definition would allow e.g. let w: Foo<dyn MyTrait> = Foo { a: 42, b: MyStruct }.

And not to mention that technically, your definition doesn't explicitly require these types to have a dynamic size. They certainly can and the trait makes it easy for this to happen, but maybe the fat pointer just adds extra detail and doesn't change the size.

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 5, 2020

Unsized locals also happen independently of this RFC. There are no wide pointers involved during the "allocation" of the unsized local. This is the reason I mentioned in the unresolved questions that I need to add a generic parameter for the allocator, because schemes that reallocate need to know about the allocator. In the case of unsized locals, the allocator would just always panic, because you can't reallocate things on the stack.

Going from a sized local to an unsized local does need a vtable genation somewhere, I need to look up how unsized locals work exactly, thanks!

@matthieu-m
Copy link

I find both v-table generation and custom wide-pointers to be interesting, in general, however even after reading the RFC I'm not sure why they are bundled together -- could we have one without the other?

I see two interesting features here:

  • The ability to specify, for a trait, the usage of a custom v-table generator. This could be quite beneficial to more easily bind to C++, for example.
  • The ability to specify the layout of a "wide" pointer... and maybe not making the pointer wide, at all?

I do wonder if the latter feature should be tied to a trait, or tied to an instance. For example, I'd like to be able to write ThinPtr<Trait> with any trait and have that be a pointer to { v-ptr, <struct data> }.

Is there a specific reason to tie the layout of the pointer to the trait?

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 6, 2020

For example, I'd like to be able to write ThinPtr<Trait> with any trait and have that be a pointer to { v-ptr, <struct data> }.

You don't need any additional features for that. In fact, the Rust compiler already uses this trick for slices and iirc there's a crate that exposes such behaviour for traits.

Essentially you'd implement (not exactly like this, but logically that's what would happen)

impl<T: ?Sized> Deref for ThinPtr<T> {
    type Target = T;
    fn deref(&self) -> &T {
        transmute((&(*self.ptr).struct_data, &(*self.ptr).v_ptr))
    }
}

Although for that to be sound we'd need to expose some code for building &dyn Trait from raw pointers similar to how we have this for slices. I believe this is entirely orthogonal to this RFC though (beyond the fact that exposing the creation of &dyn Trait would need to support arbitrary wide pointers, so we'd need another method on the Unsize trait).

The ability to specify, for a trait, the usage of a custom v-table generator. This could be quite beneficial to more easily bind to C++, for example.

I gave an example for this use case specifically. You could even set it up in a way that you still get a wide pointer, but both the data and vtable ptr of the wide pointer point into the same allocation. Can you elaborate on what part you're missing here?

@matthieu-m
Copy link

Can you elaborate on what part you're missing here?

I don't think anything is missing for this usecase, the benefits seem clear.


I also think I misunderstood the purpose of MyWidePointer; re-reading your answers and the RFC, it's clearer now why the representation of the wide-pointer has to be tied to the representation of the v-table: it's acting as a proxy for the v-table, and a given v-table will necessarily require a dedicated proxy which knows how to fetch the various tidbits (drop pointer, function pointer, etc...).

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 6, 2020

it's acting as a proxy for the v-table,

Yes, it's not a publicly-visible type, it is the type you're allowed to transmute to and from a &dyn Trait or *const dyn Trait.

@programmerjake
Copy link
Member

change pull request title to spell metadata correctly.

@oli-obk oli-obk changed the title Procedural vtables and wide ptr metdata Procedural vtables and wide ptr metadata Aug 7, 2020
@samsartor
Copy link
Contributor

I'm wondering if it is possible to do away with method_id_to_fn_ptr. The way I see it, the main purpose of that function is to enable automatic impl Trait for dyn Trait but there are lots of reasons why that impl should be written manually. Doing so was a key motivation for RFC 2027. I'd even guess that whenever users pull the custom_wide_pointer escape hatch, they will almost always want a custom Trait for dyn Trait implementation as well.

For example, this RFC could completely solve complication 3 currently blocking async traits by letting the async-trait crate write custom Trait for dyn Trait implementations like:

use std::ptr::DynPointer; // data + vtable
use std::future::FutureVTable;

#[custom_wide_pointer = "__WriterPtr"]
trait Writer {
    type Write<'a>: Future<Output=usize> + 'a;

    fn write(&mut self, bytes: &[u8]) -> Self::Write;
}

type __WriterPtr = DynPointer<__WriterVTable>;

#[repr(C)] // C code can implement Writer!
struct __WriterVTable {
    write: unsafe fn(*mut (), bytes: &[u8]) -> *mut (),
    write_vtable: &'static FutureVTable,
    drop: unsafe fn(*mut ()),
    size_of: usize,
    align_of: usize,
}

impl __WriterVTable {
    fn new<T: Writer>() -> &'static Self {
        static TABLE: Self = __WriterVTable {
            write: |ptr, bytes| {
                Box::into_raw(Box::new(
                    T::write(unsafe { ptr as &mut T })
                )) as *mut ()
            },
            write_vtable: FutureVTable::new::<T::Write<'static>>(),
            drop: |ptr| std::ptr::drop_in_place(ptr as *mut T),
            size_of: std::mem::size_of::<T>(),
            align_of: std::mem::align_of::<T>(),
        };

        &TABLE
    }
}

// Now for the whole point!
impl Writer for dyn Writer {
    type Write<'a> = Box<dyn Future<usize> + 'a>;

    fn write(&mut self, bytes: &[u8]) -> Self::Write {
        let this = self as &mut __WriterPtr; // cast allowed because of custom_wide_pointer
        unsafe {
            Box::from_raw(DynPointer {
                data: (this.vtable.write)(this.data),
                vtable: this.vtable.write_vtable,
            } as *mut _)
        }
    }
}

unsafe impl CustomUnsized for dyn Writer {
    fn size_of(&self) -> usize {
        (self as &__WriterPtr).vtable.size_of
    }

    fn align_of(&self) -> usize {
        (self as &__WriterPtr).vtable.align_of
    }

    // `CustomUnsized: Drop` rather than a `fn drop(&mut self)` here.
}

impl Drop for dyn Writer {
    fn drop(&mut self) -> usize {
        let this = self as &mut __WriterPtr;
        unsafe {
            (this.vtable.drop)(this.data)
        }
    }
}

// This is the hard part. I'm not sure the proposed CustomUnsize makes sense without
// Rust providing some sort of vtable itself. I think it could work like this instead?
unsafe impl<T: Writer> CustomUnsize<__WriterPtr> for T {
    fn from_pointer(ptr: *mut T) -> __WriterPtr {
        __WriterPtr {
            data: unsafe { ptr as *mut () },
            vtable: __WriterVTable::<T>::new(),
        }
    }
}

The key idea there is that #[custom_wide_pointer = P] trait Trait would be able to implement a *mut dyn Trait -> P cast by requiring Trait: CustomUnsize<P>. Or in other words, to set a custom wide pointer, users would always need to impl<T: Trait> CustomUnsize<P> for T.

I think we'd benefit a lot from having CustomUnsize implemented directly on trait implementations instead of on the wide pointer because:

  • It finally gives us an escape hatch for the object safety rules! I bet serde could even use this to provide real dyn Serialize/Deserialize types.
  • It lets us ditch std::vtable::TraitDescription in favor of pulling function pointers off T directly. (see the point above)
  • It lets us specialize method implementations for the dyn case (see write's future-boxing above).
  • It removes the (admittedly cool) fancy const generics.
  • It matches std::marker::Unsize more closely. In fact, we may not need both it and CustomUnsize.

The big question I have is around *mut () vs *const (). In the example above, I used *mut because my trait exclusivly uses &mut but I'm worried we'll need a CustomUnsize and CustomUnsizeMut. But I think that applies to this RFC with or without my version of CustomUnsize?

Anyway, does this alteration seem at all reasonable or am I off my rocker?

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 8, 2020

The way I see it, the main purpose of that function is to enable automatic impl Trait for dyn Trait

Oh... wow! Let's allow (or require?) this impl for all #[custom_wide_pointer] traits! I love it

@Lokathor
Copy link
Contributor

Lokathor commented Aug 9, 2020

@oli-obk hi, two things

  1. Can I interest you in writing up an issue in the lang-team repo for a change proposal? Because this is going to be a very-discussed RFC and that's what the new change proposal system is supposed to help with.

  2. you said that MMIO could be done using a ZST, but i keep looking at the "Zero sized references to MMIO" example and coming up empty on how it works.

  • First, it says CustomUnsized::WidePointer = ();, but size_of::<&mut ()>() == size_of::<usize>()
  • Next, CustomUnsized::size_of gives 4.
  • So where does the size of the reference itself somehow ever turn to becoming zero?

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 10, 2020

* First, it says `CustomUnsized::WidePointer = ();`, but `size_of::<&mut ()>() == size_of::<usize>()`

So where does the size of the reference itself somehow ever turn to becoming zero?

The wide pointer is the representation of &dyn ThatTrait, so size_of::<&mut dyn ThatTrait>() == 0 after my RFC, if <dyn ThatTrait as Unsized>::WidePointer = ()

Next, CustomUnsized::size_of gives 4.

That's the size of the thing behind the pointer, so the register bank. It's the value you get for std::mem::size_of_val

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 10, 2020

1. Because this is going to be a very-discussed RFC and that's what the new change proposal system is supposed to help with.

uhm... so... just like we do compiler MCPs before opening PRs I should have openend a lang MCP before opening an RFC? This should be better communicated (e.g. by modifying the PR template on this repo). There are new RFCs in this repo on a weekly basis and I think noone is aware of this procedure outside the lang team.

@Lokathor
Copy link
Contributor

It is a new process being tested out in just the past few weeks, so there has been no major announcement yet.

However, for a proposal with this large of a scope I expect that a project group will be requested, if this idea is to be developed. Similar to the "safe transmute project group" and so on.

@samsartor
Copy link
Contributor

samsartor commented Aug 11, 2020

This RFC gives custom wide pointers a pretty cool ability: they can encapsulate the data pointer or even remove it entirely. Unfortunately that prevents Rust from subscripting the data pointer like:

let wrapped: Wrapper<MyStruct> = ...;
let object: &Wrapper<dyn MyTrait> = &wrapped;
dbg!(
    object as *const Wrapper<dyn MyTrait>, // (address, custom stuff)
    &object.integer as *const i32, // address
    &object.data as *const dyn MyTrait, // (address + 4, custom stuff)
);

We can't get around this with some kind of manual impl Unsize<Wrapper<dyn MyTrait>> for Unsize<Wrapper<T>> because rust guarantees S<..., T>: Unsize<S<..., U>> for all wrapper structs S. No manual impl can have that sort of generality. And we can't back out of that guarantee because it is used all over the place. Notably for Rc<dyn MyTrait>. We have to give Rust control over the data pointer.

It might be possible to expose the data pointer through a method on CustomUnsized like:

trait CustomUnsized {
    type WithData;

    fn get_data(self) -> Option<*mut ()>;
    fn set_data(self, data: *mut ()) -> Self::WithData;
    ...
}

Even if something like that is realistic to implement, I think it is way overkill. If instead we are willing to give up zero-sized wide pointers we can simply separate the data pointer from the custom pointer "extension" like so:

unsafe impl Unsized for dyn MyTrait {
    // Custom rusty vtable => &'static VTable
    // C++ish vtable => ()
    // Slice => usize
    type Extension = &'static MyTraitVTable;

    fn size_of(&self) -> usize {
        unsafe { std::ptr::get_extension(self).size }
    }

    fn align_of(&self) -> usize {
        unsafe { std::ptr::get_extension(self).align }
    }
}

impl Drop for dyn MyTrait {
    fn drop(&mut self) {
        (std::ptr::get_extension(self).drop)(self as *mut ())
    }
}

unsafe impl<T: MyTrait> Unsize<&'static MyTraitVTable> for T {
    fn extension_of(_: *mut T) -> &'static MyTraitVTable {
        MyTraitVTable::new::<T>()
    }
}

// In std::ptr (or as a compiler intrinsic):
unsafe fn get_extension<T: Unsized>(ptr: *mut T) -> T::Extension {
    let raw: std::raw::WidePointer<T::Extension> = std::mem::transmute(ptr);
    raw.extension
}

I admit I don't understand the purpose of zero-sized wide pointers well enough to comment on the trade-offs here. But as far as I can tell, Rust fundamentally expects all *mut _ to point at a valid allocation whatever additional metadata gets tacked on.

Thoughts?

@tmccombs
Copy link

@samsartor With your proposal, how would one accomplish interoperability with c++, which doesn't use fat pointers for object pointers, but instead embeds a pointer to the vtable (or possible the vtable itself) inside the allocation of the object itself?

@samsartor
Copy link
Contributor

samsartor commented Aug 12, 2020

@tmccombs I imagine C++ vtables working like so:

// Here is a class with virtual method `foo`.
trait MyClass {
    fn new(self) -> Box<dyn MyClass> where Self: Sized {
        // This is an alternative to using impl -> dyn coercions. We can always
        // transmute `*ClassInner` into `*dyn MyClass` manually.
        let ptr = Box::into_raw(Box::new(ClassInner {
            vtable: MyClassVTable::new::<Self>(),
            fields: self,
        }));
        unsafe { Box::from_raw(ptr as *mut dyn MyClass) }
    }

    fn foo(&mut self);
}

// This example obviously doesn't match the Itanium C++ ABI but someday a crate
// should totally proc-macro out a vtable that does.
struct MyClassVTable {
    foo: unsafe fn(*mut ()),
    drop: unsafe fn(*mut ()),
    size: usize,
    align: usize,
}

impl MyClassVTable {
    fn new<T: MyClass>() -> &'static MyClassVTable {
        static TABLE: MyClassVTable = MyClassVTable {
            foo: |ptr| {
                let inner = unsafe { &mut *(ptr as *mut ClassInner<T>) });
                T::foo(&mut inner.fields)
            },
            drop |ptr| unsafe { drop_in_place(ptr as *mut ClassInner<T>) },
            size: size_of::<T>(),
            align: align_of::<T>(),
        };

        &TABLE
    }
}

// C++ stores classes (without inheritance) as vtable pointer, fields. We could
// easily inline the vtable too if we aren't aiming for C++'s exact ABI.
#[repr(C)]
struct ClassInner<T> {
    vtable: &'static MyClassVTable,
    fields: T,
}

impl dyn MyClass {
    // Helper function to access the vtable without knowing the concrete type.
    fn vtable(&self) -> &'static MyClassVTable {
        // This works because the vtable pointer is the first field and is thus
        // directly behind the self pointer. C++ inheritance complicates things
        // but could still work.
        unsafe {
            *(self as *const Self as *const &'static MyClassVTable)
        }
    }
}

// Specify our custom wide-pointer.
unsafe impl Unsized for dyn MyClass {
    // Not a wide pointer at all. Same size as *mut () since the vtable lives in
    // the data allocation.
    type Extension = ();

    fn size_of(&self) -> usize {
        self.vtable().size
    }

    fn align_of(&self) -> usize {
        self.vtable().align
    }
}

impl Drop for dyn MyClass {
    fn drop(&mut self) {
        let drop = self.vtable().drop;
        unsafe { drop(self) }
    }
}

impl MyClass for dyn MyClass {
    fn foo(&mut self) {
        let foo = self.vtable().foo;
        unsafe { foo(self) }
    }
}

// Notice there is no `impl<T: MyClass> Unsize<dyn MyClass> for T`. This is
// because the `dyn MyClass` impls rely on the ClassInner layout behind the data
// pointer, making any `*T` to `*dyn MyClass` coercion unsound. But if Rust is
// able to support arbitrary type -> dyn coercions, we could eliminate the
// manual transmute in `MyClass::new` by specifying:
unsafe impl<T: MyClass> Unsize<dyn MyClass> for ClassInner<T> {
    unsafe fn extension_of(_: *mut ClassInner<T>) -> () { }
}

// Obviously I can't actually run this but I hope it works!
#[test]
fn test_class_dispatch() {
    struct A(Rc<Cell<char>>);

    impl MyClass for A {
        fn foo(&mut self) { self.0.set('A'); }
    }

    struct B(Rc<Cell<char>>);

    impl MyClass for B {
        fn foo(&mut self) { self.0.set('B'); }
    }

    let shared = Rc::new(Cell::new(' '));
    let mut instances = vec![A(shared.clone()).new(), B(shared.clone()).new()];

    instances.iter_mut().for_each(MyClass::foo);
    assert_eq!(shared.get(), 'B');

    instances.reverse();
    instances.iter_mut().for_each(MyClass::foo);
    assert_eq!(shared.get(), 'A');
}

And while I'm spamming enormous amounts of code into a GitHub comment, I might as well demonstrate it for the slice example too:

// Implement only on types that are transmutable into a [Self::Item; len()]
unsafe trait Slice {
    type Item;

    fn len(&self) -> usize;
}

// The [T] primitive type
pub type slice<T> = dyn Slice<Item=T>;

unsafe impl<const N: usize, T> Slice for array<N, T> {
    type Item = T;

    fn len(&self) -> usize {
        N
    }
}

unsafe impl<T> Unsized for slice<T> {
    type Extension = usize;

    fn size_of(&self) -> usize {
        unsafe { get_extension(self) } * size_of::<T>()
    }

    fn align_of(&self) -> usize {
        align_of::<T>()
    }
}

impl<T> Slice for slice<T> {
    type Item = T;

    fn len(&self) -> usize {
        get_extension(self)
    }
}

impl<T> Drop for slice<T> {
    fn drop(&mut self) {
        for ptr in self.as_mut_ptr_range() {
            unsafe { drop_in_place(ptr); }
        }
    }
}

unsafe impl<T: Slice> Unsize<usize> for T {
    unsafe fn extension_of(data: *mut T) -> usize {
        Slice::len(&*data)
    }
}

pub unsafe fn from_raw_parts<'a>(data: *const T, len: usize) -> &'a slice<T> {
    let ptr: *const slice<T> = std::mem::transmute(std::raw::WidePointer {
        data,
        extension: len,
    });
    &*ptr
}

pub unsafe fn from_raw_parts_mut<'a>(data: *mut T, len: usize) -> &'a mut slice<T> {
    let ptr: *mut slice<T> = std::mem::transmute(std::raw::WidePointer {
        data,
        extension: len,
    });
    &mut *ptr
}

impl<T> Deref for Vec<T> {
    type Target = slice<T>;

    fn deref(&self) -> &slice<T> {
        unsafe {
            from_raw_parts(self.as_ptr(), self.len())
        }
    }
}

impl<T> DerefMut for Vec<T> {
    fn deref_mut(&mut self) -> &mut slice<T> {
        unsafe {
            from_raw_parts_mut(self.as_mut_ptr(), self.len())
        }
    }
}

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 12, 2020

We can't get around this with some kind of manual impl Unsize<Wrapper<dyn MyTrait>> for Unsize<Wrapper<T>> because rust guarantees S<..., T>: Unsize<S<..., U>> for all wrapper structs S. No manual impl can have that sort of generality. And we can't back out of that guarantee because it is used all over the place. Notably for Rc<dyn MyTrait>. We have to give Rust control over the data pointer.

This keeps coming up... This RFC does not affect the unsizing of unsized aggregates that are unsized because their last field is unsized. These will stay compiler magic that just invokes the actual unsized field's unsizing logic. I explained this in an earlier comment: #2967 (comment)

@samsartor
Copy link
Contributor

samsartor commented Aug 12, 2020

@oli-obk Do you mean that Foo<dyn MyTrait> would use the custom wide pointer but Bar<dyn MyTrait> would not (since it doesn't implement CustomUnsize)? I personally think that would potentially cause a lot of foot-gunning. But if the goal is to use the custom wide pointer for all aggregate types, Rust has to be able to fiddle with the data pointer.

And as far as I can tell, Rust also has to be able to re-write the data pointer in order to support the .b field access on Foo<dyn MyTrait>, even with the CustomUnsize impl. Am I missing something?

@oli-obk oli-obk changed the title Procedural vtables and wide ptr metadata Procedural vtables and fully customizeable wide pointers Aug 12, 2020
@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 12, 2020

Oh.. that is a huge oversight on my part... yes. Thanks! I'll update the RFC, we do need new methods similar to what you suggested. I'll try to come up with something.

Copy link
Member

@shepmaster shepmaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some passing thoughts, but I see you just updated... 🤞

All `impl`s of `MyTrait` will now use `MyWidePointer` for generating the wide pointer.

You are actually generating the wide pointer, not a description of it.
Since your `impl`'s `from` function is being interpreted in the target's environment, all target specific information will match up.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A from function hasn't yet been introduced at this point.

Comment on lines +71 to +73
/// in `Box<T> as Box<dyn Trait>`. This distinction is important, as
/// unsizing that creates a vtable in the same allocation as the object
/// (like C++ does), cannot work on non-owned conversions. You can't just
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// in `Box<T> as Box<dyn Trait>`. This distinction is important, as
/// unsizing that creates a vtable in the same allocation as the object
/// (like C++ does), cannot work on non-owned conversions. You can't just
/// in `Box<T> as Box<dyn Trait>`. This distinction is important, as
/// an unsizing that creates a vtable in the same allocation as the object
/// (like C++ does) cannot work on non-owned conversions because you can't

}
}

/// DISCLAIMER: this uses a `Vtable` struct which is just a part of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// DISCLAIMER: this uses a `Vtable` struct which is just a part of the
/// DISCLAIMER: `ptr.vtable` is a `Vtable` struct which is a part of the

}

/// DISCLAIMER: this uses a `Vtable` struct which is just a part of the
/// default trait objects. Your own trait objects can use any metadata and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow what "default trait objects" means in this context.

@tmccombs
Copy link

    fn new(self) -> Box<dyn MyClass> where Self: Sized {
        // This is an alternative to using impl -> dyn coercions. We can always
        // transmute `*ClassInner` into `*dyn MyClass` manually.
        let ptr = Box::into_raw(Box::new(ClassInner {
            vtable: MyClassVTable::new::<Self>(),
            fields: self,
        }));
        unsafe { Box::from_raw(ptr as *mut dyn MyClass) }
    }

What about Rc<dyn MyClass>, Arc<dyn MyClass>, and &dyn MyClass? I suppose you could make separate constructors for those, but that's a lot of boilerplate, and it doesn't work for any third-party pointer types. OTOH, I'm not sure how &dyn MyClass would work at all.

Notice there is no impl<T: MyClass> Unsize<dyn MyClass> for T

This is necessary with your formulation, but I'm not sure how the compiler would now that it can't use that (currently there is a impl< T: Trait> Unsize<dyn Trait> for T for all traits), and I think it would certainly be surprising to users that this trait wouldn't be able to be used like normal traits, and would be basically impossible to be used with a custom pointer type.

@samsartor
Copy link
Contributor

samsartor commented Aug 13, 2020

@tmccombs The whole reason for this feature is to escape-hatch out of the compiler's current behavior. So although I agree having trait objects which don't automatically coerce is unintuitive for existing Rust users, I don't think there is a fundamental problem with it so long as it can be justified.

The justification in this case is that the coercion is basically a transmute, and we can invent useful scenarios (like the C++-vtable example) were the obvious transmute from impl MyTrait to dyn MyTrait would be unsound due to differing internal layouts.

Edit: Re: how does the compiler know not to automatically write Unsize<dyn Trait> for T where T: Trait? That specific impl doesn't apply to customized wide pointers since for those the programmer must specify extension_of manually. Although the other automatic impls still do.

As for Rc and Arc, I wasn't particularly worried about them in this example because the whole point was to demo how this RFC can improve C++ FFI, and C++ has no idea what Rust's Rc type even is. However, in case we allow coercions beyond the obvious, my example also included an InnerClass<impl MyClass> to dyn MyClass coercion which is sound and can be used with Arc and Rc, even if it is a bit surprising.

Despite all that rationalizing, I am interested to see if oli can come up with something that would let users to customize coercion to do stuff besides a dumb transmute. It could potentially make this example less surprising.

@oli-obk
Copy link
Contributor Author

oli-obk commented Sep 10, 2020

Despite all that rationalizing, I am interested to see if oli can come up with something that would let users to customize coercion to do stuff besides a dumb transmute. It could potentially make this example less surprising.

I wrote up the plain transmute example, so the C++ vtable example can't unsize to the unsized field at all right now if you have a struct Foo { a: i32, b: dyn SomeCppVtableTrait }. This is because the b field is after the a field. Now that's not how c++ works if you consider classes and parents and treat the "take reference of dyn field" action as "c++ upcast".

I'm a bit torn. Introducing something for rewriting the struct layout is definitely sth I want to do, but not in this RFC 😆
The alternative is to figure out a way to allow projecting into the unsized field while still knowing about the vtable, but I don't see that happening in non-owned scenarios at all.

## Zero sized references to MMIO

Instead of having one type per MMIO register bank, we could have one
trait per bank and use a zero sized wide pointer format. There's no `Unsize`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this? Why do we want a trait per bank instead of simply

pub fn flip_important_bit() {
    std::mem::volatile_write::<bool>(REG_ADDR, !std::mem::volatile_read::<bool>(REG_ADDR));
}

or, if exclusivity is needed, an ordinary mutable reference to zero-sized struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just one of the design scopes. Right now there are already setups where ppl do this, but they do it with one ZST struct per bank and methods on those. I don't want to get into the merit of this at all, even if I have strong opinions on it. This is just to show that you can do it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's simply a design parameter that significantly complicates the design space, so I think the RFC would be improved if it was strongly justified.

/// unsizings by triggering a compile-time `panic` with an explanation
/// for the user.
unsafe impl<T: MyTrait> const CustomUnsize<Pointer> for T {
fn unsize<const owned: bool>(ptr: *mut T) -> Pointer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unconvinced that allowing arbitrary stuff (including reallocation) to happen during implicit coercion is a good idea. I know that, at minimum, some procedural action must be taken to construct the extension. But I'd like it if you mentioned in the drawbacks that implicit coercions are no longer simple transmutes/bitwise-copies unless we replace this function with an associated const.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I did not appreciate how quickly this can go crazy. I mean, my first design (unpublished) was to emit a value of a struct that describes the pointer layout and how to extract the pointer. Then I realized that I'm just inventing a bad way to write code and went full "just write your own pointer creation and management", but as noted in the unresolved question section: I don't know if we really want this level of control at all.

I mean we could reduce the capabilities by forcing unsize to be const fn, but that still means that in the future you'll be able to allocate memory and stuff..

So... one weird idea I had was to make it a const fn, and then invoke the function in CTFE with a symbolic pointer. The resulting struct is then analyzed at compile-time and a transformation function is generated that will replicate the result memory (of the wide pointer) but replace all occurences of the symbolic pointer with the pointer we're unsizing.

This would severely limit what kind of transformations you can do, while making it trivial for the compiler to figure out the replication strategy (raw copy all bits before the symbolic pointer, write the unsize pointer, raw copy all bits after the symbolic pointer). In case the symbolic pointer was offset, apply said offset.

Copy link
Contributor

@samsartor samsartor Sep 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. This is really interesting. Yah, you could use some totally opaque type as a placeholder for the data pointer (maybe an extern type?) to be swapped out at coercion time. I think that would also solve the issues with projection because Rust could to do whatever it wants with that particular field?

I'll have to think about it some more but I like that direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually need a type, we can create a dangling symbolic pointer (this is a value in mir interpreter!) and just use *const T for whatever T we desire. Any interaction with the pointer beyond copying it around will make CTFE complain to you at compile-time. Since this is a special case, we can actually not just tell you about "dangling pointer" but special case the diagnostic to be really helpful

vtable: &'static VTable,
}

/// For going from a `&SomeStruct<dyn Trait>` to a field of `SomeStruct`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These might be better as defaulted methods in Unsized?

This is how I see all extern types being handled.
There can be no impls of `CStr` for any type, because the `Unsize`
trait impl is missing. See the future extension section at the end of this RFC for
ideas that could support `CString` -> `CStr` unsizing by allowing `CString` to implement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this parallels std::String, it would actually do CString -> &CStr via Deref just like it does today. The reason is that a *CString doesn't point to the data, it points to a pointer+len+capacity which points to the data.

I think real advantage is that some sized, no-indirection type on the stack like CStrArray<10> could coerce to a CStr without slice-pointer-transmute shenanigans?


// This impl must be in the `vec` module, to give it access to the `vec`
// internals instead of going through `&mut Vec<T>` or `&Vec<T>`.
impl<T> CustomUnsize<SlicePtr> for Vec<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorta like the CStr example, Vec uses Deref. I think this should be implemented on [T; N] instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what the advantage is beyond having a simpler implementation. We can't reduce the complexity of the trait system described in this RFC if we limit ourselves to simple to convert types.

Copy link
Contributor

@samsartor samsartor Sep 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just the semantics of Vec. Vec is not a slice, it manages a slice. This is clear if you think about the implications of a &Struct<Vec<T>> to &Struct<[T]> coercion.
I suppose it isn't a big deal. I'd just like to see a use case where there is no simple work around to this RFC via Deref (e.g. [T; N]).

}
```

## Zero sized references to MMIO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this example, what happens if I do this?

// Reads the data behind `r` as bytes. Sound if `T` has no padding between fields.
unsafe fn read_memory_behind<T>(r: &T) -> Vec<u8> {
    let len = std::mem::size_of_val(r) as isize;
    let ptr = r as *const T as *const u8;
    (0..len).map(|o| *ptr.offset(o)).collect()
}

let bank: &dyn MyRegisterBank = ...;
read_memory_behind(bank); // WTF

The natural assumption is that this should output the 4 bytes in the register bank. But it doesn't necessarily because despite there being 4 bytes behind r according to the Unsize impl, r doesn't actually point to anything. What is the value of ptr? Is it undefined?

The best solution I can come up with is for as *const u8 to call project(0) but if that is the case, it should probably be documented here.

}
```

## Slicing into matrices via the `Index` trait
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such a cool example.

[rationale-and-alternatives]: #rationale-and-alternatives

This is a frequently requested feature and as a side effect obsoletes `extern type`s which have all kinds of problems (like the inability to invoke `size_of_val` on pointers to them).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider mentioning that we could instead let the language handle just the data pointer, ridding ourselves of the project and unsize functions at the cost of not controlling the entire pointer layout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we'd go with any of the other RFCs ;) Which is something I do list further down.

/// move away the owned object. The flag allows you to forbid such
/// unsizings by triggering a compile-time `panic` with an explanation
/// for the user.
unsafe impl<T: MyTrait> const CustomUnsize<Pointer> for T {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might just be the Unsize trait.

Comment on lines +223 to +227
let mut data_ptr = wide_ptr.ptr;
for i in 0..wide_ptr.len {
std::ptr::drop_in_place(data_ptr);
data_ptr = data_ptr.offset(1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut data_ptr = wide_ptr.ptr;
for i in 0..wide_ptr.len {
std::ptr::drop_in_place(data_ptr);
data_ptr = data_ptr.offset(1);
}
let mut data_ptr = wide_ptr.ptr as *mut T;
for i in 0..wide_ptr.len {
std::ptr::drop_in_place(data_ptr.offset(i));
}

@oli-obk
Copy link
Contributor Author

oli-obk commented Oct 9, 2020

So... this RFC is inherently incompatible with having multiple unsized types within a struct (instead of one). We'd need a riddicously complex scheme for combining different wide pointers into one, which is not worth it. Being able to unsize &[[T; N]; M] to &[[T]] is not something I want to give up

@oli-obk oli-obk closed this Oct 9, 2020
@samsartor
Copy link
Contributor

@oli-obk I'm a little confused by this. As far as I can tell, Rust doesn't support (or plan to support) having multiple unsized types within a struct. It is an error to unsize &[[T; N]; M] to &[[T]].

@tmccombs
Copy link

How would &[[T]] even work? How would you have a slice where each element can have a different size? Or is &[[T]] constrained to have to be a rectangular array, and is represented by a double-fat pointer (a pointer and two usizes, for the two dimensions). And if so, how far does it go? would you be able to have a &[[[[[[[[[[T]]]]]]]]]] that requires a *T and ten usizes to represent it?

@golddranks
Copy link

I would imagine that &[[T]] is a slice where each of the [T] has the same size, but that size is not known until runtime, and the [T]s would be laid out inline in the containing slice, which would have the metadata of the size of [T].

@tmccombs
Copy link

@oli-obk could you explain what the inherent incompatibility is? It isn't obvious to me.

@oli-obk
Copy link
Contributor Author

oli-obk commented Oct 14, 2020

A scheme that aggregates pointer metadata cannot mutate the pointer in arbitrary ways (e.g make it zero sized), as the other metadatas may require access to the pointer itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.