Skip to content

Add marker traits for table-specific metadata. #158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 29, 2021

Conversation

molpopgen
Copy link
Member

@molpopgen molpopgen commented Aug 27, 2021

Currently, metadata is a single trait. When one thinks of external tools exporting to tskit, one starts to worry that loads of Vec<T: tskit::metadata::MetadataRoundTrip> accumulate, leading to the possibility that the wrong metadata will be sent to the wrong table.

This PR is an attempt to add safety to the traits, meaning that a mutation table row can only get "mutation meta data", which client code must specify via a marker trait.

It seems that we can also provide little macros like register_mutation_metadata!(...) to streamline this a bit.

@molpopgen molpopgen linked an issue Aug 27, 2021 that may be closed by this pull request
@molpopgen
Copy link
Member Author

cc @MomoLangenstein -- it'd be nice to see your thoughts on this. It is a minimal-ish API change, and it may affect how you are exporting your data. But it seems that the next release is about "type safety", which led me to this.

@juntyr
Copy link
Contributor

juntyr commented Aug 28, 2021

cc @MomoLangenstein -- it'd be nice to see your thoughts on this. It is a minimal-ish API change, and it may affect how you are exporting your data. But it seems that the next release is about "type safety", which led me to this.

This looks like a good change to me as well. Do you plan on adding marker traits for the other metadata types as well?

I also noticed that you swapped out dynamic dispatch for generics in one location - what do you think about doing that for the entire API as well? Use generics is … well more generic and more expressive … and you can then still express the dynamic dispatch case in that.

@molpopgen
Copy link
Member Author

molpopgen commented Aug 28, 2021 via email

@molpopgen
Copy link
Member Author

The change to a generic for the current diff involves a private class, and was to enable absorbing all the new trait markers. But it won't break API to make all the other functions generics as well, so may as well do it in this pull request.

@molpopgen
Copy link
Member Author

@MomoLangenstein -- I now realized the issue with generics vs dyn. Having a function accept Option<& dyn Something> works, but Option<&M> does not, failing to compile when None is passed in. We really don't want metadata to be passed in as non-&, as it may be arbitrarily expensive to clone. It also feels hackish to supply a dummy type to satisfy the trait bounds to accommodate None.

@juntyr
Copy link
Contributor

juntyr commented Aug 28, 2021

@MomoLangenstein -- I now realized the issue with generics vs dyn. Having a function accept Option<& dyn Something> works, but Option<&M> does not, failing to compile when None is passed in. We really don't want metadata to be passed in as non-&, as it may be arbitrarily expensive to clone. It also feels hackish to supply a dummy type to satisfy the trait bounds to accommodate None.

Ah, I see where that compilation error comes from. Have you already merged the API changes to provide methods which accept no metadata or metadata unconditionally? In that case, there are methods that always work for the user and if they want to supply an option, then it’s their responsibility to type it (since they go down the explicit route, they should pass Some in some case, meaning the Option will be typed. In this case, you could then implement the optional API using the two explicit cases (I.e. switch it around) to avoid an internal compilation failure.

Another option would be to give M the default value () and implement the metadata trait for it (which would generate empty metadata).

The reason why I’d personally still prefer generics is that they are more Rusty and provide the compiler with better optimisation potential.

@molpopgen
Copy link
Member Author

Yes, the _with_some_metadata variant is merged. To make the generic version work, I need more code duplication. Right now, all variants of an "add row" function work by calling one of them. To have the different versions work as you suggest, 2 of the 3 at least would need to make unsafe calls to the C API.

I'll think more on this.

@molpopgen
Copy link
Member Author

In this case, you could then implement the optional API using the two explicit cases (I.e. switch it around) to avoid an internal compilation failure.

Aha--this is the key....

@juntyr
Copy link
Contributor

juntyr commented Aug 28, 2021

Is there a C++ version of the tskit API? In that case, you could use cxx to generate safe Rust calls to the API to avoid having to use unsafe code (though I’m not one of the people who scream in fear every time unsafe pops up - instead I just put a comment on top of every unsafe block to explain why it’s actually safe)

@molpopgen
Copy link
Member Author

Is there a C++ version of the tskit API? In that case, you could use cxx to generate safe Rust calls to the API to avoid having to use unsafe code (though I’m not one of the people who scream in fear every time unsafe pops up - instead I just put a comment on top of every unsafe block to explain why it’s actually safe)

No, the canonical API is C, and always will be. The unsafe bit here is that None for metadata means NULL/nullptr in C/C++. So, it is easy to deal with, and safe enough. More generally, the C API does a lot with bare pointers. It's all quite normal, and well-implemented on the C side. It just invokes fear in rustc, of course.

@molpopgen
Copy link
Member Author

molpopgen commented Aug 28, 2021

The most recent commit changes to generics (for one of the tables). It is now a bit funky, maybe, as the back-end private struct still expects an Option, but that can be fiddled with if necessary.

@juntyr
Copy link
Contributor

juntyr commented Aug 28, 2021

That change for one of the tables looks good to me

@molpopgen
Copy link
Member Author

That change for one of the tables looks good to me

Thanks. I'll sort the rest out on Monday, I hope.

@daniel-goldstein
Copy link
Member

Thanks for the ping on this, very interesting! I share the same sentiments about generics where possible, and don't see it as much of a bad thing to have two unsafe calls out to the C implementation. I like that you no longer have create an Option when you don't need one. Maybe it's since I haven't used this API, but it's unclear to me when you wouldn't statically know which of the explicit cases (no Option) you would use.

It is now a bit funky, maybe, as the back-end private struct still expects an Option, but that can be fiddled with if necessary.

Are you saying it would be feasible for EncodedMetadata to not hold an optional? That does seem simpler in this world.

@molpopgen
Copy link
Member Author

Thanks for the ping on this, very interesting! I share the same sentiments about generics where possible, and don't see it as much of a bad thing to have two unsafe calls out to the C implementation. I like that you no longer have create an Option when you don't need one. Maybe it's since I haven't used this API, but it's unclear to me when you wouldn't statically know which of the explicit cases (no Option) you would use.

It is now a bit funky, maybe, as the back-end private struct still expects an Option, but that can be fiddled with if necessary.

Are you saying it would be feasible for EncodedMetadata to not hold an optional? That does seem simpler in this world.

The option exists because metadata is optional. But you are right that you should know statically which function to call. The encoded metadata class is not visible to client code, so the option is harmless for the most part, but can (and probably should) be removed.

@molpopgen
Copy link
Member Author

I've eliminated the option entirely. I'm liking this mich more.

@juntyr
Copy link
Contributor

juntyr commented Aug 29, 2021

It looks great - apart from:

if self.encoded.is_empty() {
   0
} else {
   self.encoded.len() as tsk_size_t
}

@molpopgen
Copy link
Member Author

It looks great - apart from:

if self.encoded.is_empty() {
   0
} else {
   self.encoded.len() as tsk_size_t
}

Yeah, oops! I'll fix that.

* Add per-table metadata trait marker.
* Remove use of Option<metadata> from the API

Closes #157
@molpopgen molpopgen force-pushed the metadata_marker_traits branch from ecdcf1b to 734d93f Compare August 29, 2021 18:33
@molpopgen molpopgen merged commit fafd457 into main Aug 29, 2021
@molpopgen molpopgen deleted the metadata_marker_traits branch August 29, 2021 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need marker traits for metadata.
3 participants