
Commit b920703

doc: book sections on metadata (#391)
1 parent 244d3f6 commit b920703


6 files changed (+178, -0 lines)


book/src/SUMMARY.md

Lines changed: 6 additions & 0 deletions
@@ -15,6 +15,12 @@
- [Working with trees](./tree_sequence_tree.md)
- [Miscellaneous operations](./tree_sequence_miscellaneous.md)

* [Metadata](./metadata.md)
- [Defining metadata types in rust](./metadata_derive.md)
- [Metadata and tables](./metadata_tables.md)
- [Metadata schema](./metadata_schema.md)

[Crate prelude](./prelude.md)
[Changelog](./changelog.md)
[Migration Guide](./migration_guide.md)

book/src/metadata.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
# Metadata <img align="right" width="73" height="45" src="https://raw.githubusercontent.com/tskit-dev/administrative/main/logos/svg/tskit-rust/Tskit_rust_logo.eps.svg">

Tables may contain additional information about rows that is not part of the data model.
This metadata is optional:
tables are not required to have metadata,
and a table with metadata does not require every row to have it.

The next sections showcase the metadata API.
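
As a toy illustration of per-row optional metadata (this is not the `tskit` API), the sketch below assumes a layout similar to the one used by the tskit C tables: every row's metadata bytes are concatenated into one buffer, and an offsets array marks where each row's bytes start and stop. A row stored with zero bytes has no metadata, and asking for a row that does not exist also yields nothing. `ToyTable`, `add_row`, and `row_metadata` are hypothetical names.

```rust
/// Toy table with a ragged metadata column (hypothetical; not the tskit API).
/// Row i's metadata lives in `metadata[offsets[i]..offsets[i + 1]]`.
struct ToyTable {
    metadata: Vec<u8>,
    offsets: Vec<usize>,
}

impl ToyTable {
    fn new() -> Self {
        // `offsets` always holds one more entry than there are rows.
        ToyTable {
            metadata: Vec::new(),
            offsets: vec![0],
        }
    }

    /// Append a row; `None` stores zero bytes for that row.
    fn add_row(&mut self, md: Option<&[u8]>) -> usize {
        if let Some(bytes) = md {
            self.metadata.extend_from_slice(bytes);
        }
        self.offsets.push(self.metadata.len());
        self.offsets.len() - 2 // id of the row just added
    }

    /// `None` if the row does not exist or if it has no metadata.
    fn row_metadata(&self, row: usize) -> Option<&[u8]> {
        let start = *self.offsets.get(row)?;
        let stop = *self.offsets.get(row + 1)?;
        if start == stop {
            None
        } else {
            Some(&self.metadata[start..stop])
        }
    }
}
```

Encoding "no metadata" as an empty byte range is what lets a row lack metadata even when its neighbours have some.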

book/src/metadata_derive.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# Defining metadata types in rust

A key feature of the API is that metadata is specified on a per-table basis.
In other words, a type to be used as node metadata implements the `tskit::metadata::NodeMetadata` trait.

With the `tskit` cargo feature `derive` enabled, we can use procedural macros to define metadata types.
Here, we define a metadata type for a mutation table:

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:metadata_derive}}
```

You must also add the `serde` derive macros yourself, because the metadata API
itself does not depend on `serde`.
Rather, it works with raw bytes, and `serde` happens to be a convenient way to get them from your data types.
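
Because the API only needs bytes, `serde` is one option among many. As a hedged sketch, the hypothetical `encode`/`decode` functions below hand-roll a fixed layout for the two `f64` fields of the mutation metadata example: 16 bytes, `effect_size` first, then `dominance`, both little-endian. How such functions would be wired into `tskit`'s traits is not shown and should not be read as the crate's API. This layout is what Python's `struct.unpack("<dd", data)` reads, which is relevant to the binary-format question in the schema section.

```rust
use std::convert::TryInto;

/// Mirrors the fields of the book's `MutationMetadata` example.
#[derive(Debug, PartialEq)]
struct MutationMetadata {
    effect_size: f64,
    dominance: f64,
}

/// Hypothetical hand-written encoder: 16 bytes, two little-endian f64s.
fn encode(md: &MutationMetadata) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(16);
    bytes.extend_from_slice(&md.effect_size.to_le_bytes());
    bytes.extend_from_slice(&md.dominance.to_le_bytes());
    bytes
}

/// Hypothetical decoder; fails unless the input is exactly 16 bytes.
fn decode(bytes: &[u8]) -> Result<MutationMetadata, String> {
    if bytes.len() != 16 {
        return Err(format!("expected 16 bytes, got {}", bytes.len()));
    }
    // Both slices are exactly 8 bytes long, so `try_into` cannot fail here.
    let effect_size = f64::from_le_bytes(bytes[0..8].try_into().unwrap());
    let dominance = f64::from_le_bytes(bytes[8..16].try_into().unwrap());
    Ok(MutationMetadata {
        effect_size,
        dominance,
    })
}
```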

The derive macro also enforces some helpful behavior at compile time.
You will get a compile-time error if you try to derive two different metadata types for the same rust type.
The error is due to conflicting implementations of a [supertrait](https://doc.rust-lang.org/rust-by-example/trait/supertraits.html).
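
The per-table traits and the supertrait can be modeled with plain Rust traits. The sketch below is not `tskit`'s code: `Roundtrip`, `NodeMd`, `MutationMd`, `Effect`, and `add_mutation_row` are hypothetical stand-ins. Each per-table trait is a marker whose supertrait carries the encode/decode logic, so a function that adds a mutation row only accepts types implementing the mutation trait. If a derive emits both the supertrait impl and one marker impl, deriving a second table trait for the same type produces a second `impl Roundtrip for T`, which Rust rejects as a conflicting implementation (error E0119).

```rust
/// Hypothetical supertrait holding the encode/decode logic.
trait Roundtrip: Sized {
    fn encode(&self) -> Vec<u8>;
    fn decode(bytes: &[u8]) -> Option<Self>;
}

/// Hypothetical per-table marker traits.
trait NodeMd: Roundtrip {}
trait MutationMd: Roundtrip {}

struct Effect(u8);

// Roughly what a derive for the mutation table might expand to:
impl Roundtrip for Effect {
    fn encode(&self) -> Vec<u8> {
        vec![self.0]
    }
    fn decode(bytes: &[u8]) -> Option<Self> {
        match bytes {
            [b] => Some(Effect(*b)),
            _ => None,
        }
    }
}
impl MutationMd for Effect {}

// A second derive (say, for the node table) would also emit
// `impl Roundtrip for Effect { ... }` and fail with
// error[E0119]: conflicting implementations of trait `Roundtrip`.

/// Hypothetical table method: only mutation metadata is accepted, so passing
/// a type that implements only `NodeMd` is a compile-time error.
fn add_mutation_row<M: MutationMd>(rows: &mut Vec<Vec<u8>>, md: &M) -> usize {
    rows.push(md.encode());
    rows.len() - 1
}
```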

book/src/metadata_schema.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Metadata schema

For useful data interchange with `tskit-python`, we need to define [metadata schemas](https://tskit.dev/tskit/docs/stable/metadata.html).

Several issues currently hold back a rust API for schemas:

* It is not clear which `serde` formats are compatible with metadata on the Python side.
* Experiments have shown that `serde_json` works with `tskit-python`.
* Ideally, we would also like a binary format compatible with the Python `struct`
  module.
* However, we have not found a solution that eliminates the need to write the
  schema manually as a string and add it to the tables.
  Various crates that generate JSON schemas from rust structs return schemas that are over-specified
  and fail to validate in `tskit-python`.
* We also need to add some Python to our CI to prove to ourselves
  that some reasonable tests can pass.
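
For now, a schema has to be written by hand. As a hedged illustration, the hypothetical helper below builds a deliberately minimal JSON-codec schema string for a flat struct whose fields are all numbers, following the `codec`/`type`/`properties` layout described in the tskit metadata documentation. It has not been tested against `tskit-python`, and attaching the string to a table is not shown.

```rust
/// Hypothetical helper: build a minimal JSON-codec metadata schema for a flat
/// struct whose fields are all numbers. Field names are supplied by hand.
fn numeric_json_schema(fields: &[&str]) -> String {
    let properties: Vec<String> = fields
        .iter()
        .map(|name| format!("\"{}\": {{\"type\": \"number\"}}", name))
        .collect();
    // `{{` and `}}` are escaped braces; `{}` is the joined property list.
    format!(
        "{{\"codec\": \"json\", \"type\": \"object\", \"properties\": {{{}}}}}",
        properties.join(", ")
    )
}
```

The field names must be kept in sync with the struct manually, which is exactly the maintenance burden described in the list above.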

book/src/metadata_tables.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# Metadata and tables

Let us create a table collection and add a mutation row with our metadata:

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:add_mutation_table_row_with_metadata}}
```

Metadata is optional on a per-row basis:

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:add_mutation_table_row_without_metadata}}
```

We can confirm that we have one row with, and one without, metadata:

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:validate_metadata_row_contents}}
```

Fetching our metadata from the table requires specifying the metadata type.
The result of a metadata retrieval is `Option<Result<T, TskitError>>`, where `T` is the metadata type.
The `None` variant occurs if a row does not have metadata or if the row id does not exist.
The error variant occurs if decoding the raw bytes into the metadata type fails.
The details of the error variant are [here](https://docs.rs/tskit/latest/tskit/error/enum.TskitError.html#variant.MetadataError).
The error holds a `Box<dyn Error>` because the API is very general:
we assume nothing about how metadata is encoded and decoded,
so the error could be anything.
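
Because the result nests a `Result` inside an `Option`, the standard library's `Option::transpose` can turn it into `Result<Option<T>, E>`, so that `?` forwards decoding errors while a missing row or missing metadata stays `None`. This works for any `Option<Result<T, E>>`. The sketch below uses hypothetical stand-ins (`fetch_metadata`, a `Vec<Option<Vec<u8>>>` in place of a table, and a text decoder) rather than the `tskit` API. It also shows two unrelated error types, `Utf8Error` and `ParseFloatError`, both fitting into a `Box<dyn Error>`.

```rust
use std::error::Error;

/// Hypothetical stand-in for a table lookup: `None` if the row does not
/// exist or has no metadata, `Some(Err(_))` if decoding fails.
fn fetch_metadata(
    rows: &[Option<Vec<u8>>],
    row: usize,
) -> Option<Result<f64, Box<dyn Error>>> {
    let bytes = rows.get(row)?.as_ref()?;
    Some(decode_text(bytes))
}

/// Hypothetical decoder: UTF-8 text holding one float.
/// `?` converts both unrelated error types into `Box<dyn Error>`.
fn decode_text(bytes: &[u8]) -> Result<f64, Box<dyn Error>> {
    let text = std::str::from_utf8(bytes)?; // may fail with Utf8Error
    let value = text.trim().parse::<f64>()?; // may fail with ParseFloatError
    Ok(value)
}

/// `transpose` turns Option<Result<T, E>> into Result<Option<T>, E>,
/// so decoding errors propagate and absence becomes `Ok(None)`.
fn metadata_or_none(
    rows: &[Option<Vec<u8>>],
    row: usize,
) -> Result<Option<f64>, Box<dyn Error>> {
    let md = fetch_metadata(rows, row).transpose()?;
    Ok(md)
}
```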

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:metadata_retrieval}}
```

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:metadata_retrieval_none}}
```

tests/book_metadata.rs

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
#[cfg(feature = "derive")]
#[test]
fn book_mutation_metadata() {
    // ANCHOR: metadata_derive
    #[derive(serde::Serialize, serde::Deserialize, tskit::metadata::MutationMetadata)]
    #[serializer("serde_json")]
    struct MutationMetadata {
        effect_size: f64,
        dominance: f64,
    }
    // ANCHOR_END: metadata_derive

    // ANCHOR: add_mutation_table_row_with_metadata
    let mut tables = tskit::TableCollection::new(50.0).unwrap();

    let md = MutationMetadata {
        effect_size: 1e-3,
        dominance: 1.0,
    };

    let mut_id_0 = tables
        .add_mutation_with_metadata(
            0, // site id
            0, // node id
            -1, // mutation parent id
            0.0, // time
            None, // derived state is Option<&[u8]>
            &md, // metadata for this row
        )
        .unwrap();
    // ANCHOR_END: add_mutation_table_row_with_metadata

    // ANCHOR: add_mutation_table_row_without_metadata
    let mut_id_1 = tables
        .add_mutation(
            0, // site id
            0, // node id
            -1, // mutation parent id
            0.0, // time
            None, // derived state is Option<&[u8]>
        )
        .unwrap();
    // ANCHOR_END: add_mutation_table_row_without_metadata

    // ANCHOR: validate_metadata_row_contents
    assert_eq!(
        tables
            .mutations_iter()
            .filter(|m| m.metadata.is_some())
            .count(),
        1
    );
    assert_eq!(
        tables
            .mutations_iter()
            .filter(|m| m.metadata.is_none())
            .count(),
        1
    );
    // ANCHOR_END: validate_metadata_row_contents

    // ANCHOR: metadata_retrieval
    let fetched_md = match tables.mutations().metadata::<MutationMetadata>(mut_id_0) {
        Some(Ok(m)) => m,
        Some(Err(e)) => panic!("metadata decoding failed: {:?}", e),
        None => panic!(
            "hmmm...row {} should have been a valid row with metadata...",
            mut_id_0
        ),
    };

    assert_eq!(md.effect_size, fetched_md.effect_size);
    assert_eq!(md.dominance, fetched_md.dominance);
    // ANCHOR_END: metadata_retrieval

    // ANCHOR: metadata_retrieval_none
    // There is no metadata at row 1, so
    // you get None back
    assert!(tables
        .mutations()
        .metadata::<MutationMetadata>(mut_id_1)
        .is_none());

    // There is also no metadata at row 2,
    // because that row does not exist, so
    // you get None back
    assert!(tables
        .mutations()
        .metadata::<MutationMetadata>(2.into())
        .is_none());
    // ANCHOR_END: metadata_retrieval_none
}
