We recently added the concept of "table data" in 82a56e7 with the addition of the `skip_tables` flag to `tskit.load()` and the `ignore_tables` flag to `TableCollection.equals()` (and the corresponding flags to the C API). Since then we have also, in parallel, added basic support for reference sequence data. As @bhaller points out (#1854 (comment)), the `skip_tables` option still loads the reference sequence data.
The `skip_tables` option was initially motivated by the desire to get access to the top-level metadata only (#1854). Providing access only to the metadata is a non-starter I think, because it's much simpler to skip loading stuff into the table collection than it is to provide separate APIs for accessing the metadata. So, there will always be some extra info that comes with the metadata, and this is correct I think: what if I was going through a bunch of files just to read their `uuid` values? This isn't metadata, and I wouldn't want to read the whole file just to get them either.
The question then is what we do from this point. Since we want the option of not loading reference sequence data, the options as I see them are:
- Add similar flags like `skip_reference_sequence` and `ignore_reference_sequence` to `load` and `equals`
- Regard `reference_sequence` as `table` data, and document as such
- Rename the `skip_tables` and `ignore_tables` flags to something like `top_level_only` and be clear that we don't consider `reference_sequence` as top level data.
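
To make the difference between the three options concrete, here is a rough sketch using a toy, pure-Python stand-in for a stored file. Everything in it (`STORED`, `ToyCollection`, the `load_option*` functions) is hypothetical and for illustration only; none of it is the tskit API or file format, and the flag names are just the ones floated above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for a file on disk. NOT the tskit file format.
STORED = {
    "uuid": "1234",
    "metadata": {"note": "top-level"},
    "tables": {"nodes": [0, 1, 2], "edges": [(0, 2), (1, 2)]},
    "reference_sequence": "ACGT",
}


@dataclass
class ToyCollection:
    """Toy model of a table collection; top-level fields are always loaded."""
    uuid: str
    metadata: dict
    tables: dict = field(default_factory=dict)
    reference_sequence: Optional[str] = None


def load_option1(stored, skip_tables=False, skip_reference_sequence=False):
    """Option 1: two independent flags."""
    return ToyCollection(
        uuid=stored["uuid"],
        metadata=stored["metadata"],
        tables={} if skip_tables else dict(stored["tables"]),
        reference_sequence=(
            None if skip_reference_sequence else stored["reference_sequence"]
        ),
    )


def load_option2(stored, skip_tables=False):
    """Option 2: the reference sequence counts as table data."""
    return load_option1(
        stored, skip_tables=skip_tables, skip_reference_sequence=skip_tables
    )


def load_option3(stored, top_level_only=False):
    """Option 3: renamed flag; the reference sequence is not top-level data."""
    return load_option1(
        stored, skip_tables=top_level_only, skip_reference_sequence=top_level_only
    )
```

In this toy model options 2 and 3 behave identically and differ only in the name of the flag and in how the behaviour would be documented. Option 1 is the only one that lets a caller skip the tables while still loading the reference sequence, or the other way round.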