Add the skip_reference_sequence and ignore_reference_sequence options #1971

@jeromekelleher

We recently added the concept of "table data" in 82a56e7, along with the skip_tables flag to tskit.load() and the ignore_tables flag to TableCollection.equals() (and the corresponding flags in the C API). In parallel, we also added basic support for reference sequence data. As @bhaller points out (#1854 (comment)), the skip_tables option still loads the reference sequence data.

The skip_tables option was initially motivated by the desire to get access to the top-level metadata only (#1854). Providing access only to the metadata is a non-starter I think, because it's much simpler to skip loading stuff into the table collection than it is to provide separate APIs for accessing the metadata. So there will always be some extra info that comes with the metadata, and this is correct I think: what if I were going through a bunch of files just to read their uuid values? The uuid isn't metadata, and I wouldn't want to read the whole file just to get it either.

The question then is what we do from this point. Since we want the option of not loading reference sequence data, the options as I see them are:

  1. Add similar flags, such as skip_reference_sequence and ignore_reference_sequence, to load and equals
  2. Regard reference_sequence as table data, and document it as such
  3. Rename the skip_tables and ignore_tables flags to something like top_level_only and be clear that we don't consider reference_sequence as top level data.
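
To make option 1 concrete, here is a rough sketch of how the two pairs of flags could interact. It is a toy model in plain Python, not tskit code: ToyFile, ToyCollection and toy_load are hypothetical stand-ins invented for illustration, and the skip_reference_sequence and ignore_reference_sequence keywords are only the names proposed above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """Hypothetical stand-in for a tree sequence file on disk."""
    uuid: str
    metadata: dict
    tables: dict
    reference_sequence: str


@dataclass
class ToyCollection:
    """Hypothetical stand-in for a loaded table collection."""
    uuid: str
    metadata: dict
    tables: dict = field(default_factory=dict)
    reference_sequence: str = ""

    def equals(self, other, *, ignore_tables=False,
               ignore_reference_sequence=False):
        # Top-level metadata is always compared in this toy model.
        if self.metadata != other.metadata:
            return False
        if not ignore_tables and self.tables != other.tables:
            return False
        if (not ignore_reference_sequence
                and self.reference_sequence != other.reference_sequence):
            return False
        return True


def toy_load(f, *, skip_tables=False, skip_reference_sequence=False):
    # Each flag independently controls one block of (potentially large) data;
    # the uuid and top-level metadata are always loaded.
    return ToyCollection(
        uuid=f.uuid,
        metadata=dict(f.metadata),
        tables={} if skip_tables else dict(f.tables),
        reference_sequence="" if skip_reference_sequence else f.reference_sequence,
    )


f = ToyFile("abc", {"k": 1}, {"nodes": [0, 1]}, "ACGT")
full = toy_load(f)
light = toy_load(f, skip_tables=True, skip_reference_sequence=True)
```

In this sketch, toy_load(f, skip_tables=True) on its own still carries the reference sequence, which mirrors the behaviour described above; passing both flags gives the lightweight "uuid and metadata only" load, and the matching ignore_* flags let equals compare such a lightweight load against a full one.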

Any thoughts @bhaller @clwgg?

Labels: enhancement (New feature or request)
