Convert tsk_id_t/tsk_size_t to 64 bit.  #343

@jeromekelleher

It seems that requiring metadata columns to be < 4G is an unwelcome limitation in forward simulation applications, so we should consider upgrading the metadata_offset columns (and other offset columns) to 64 bit integers. Here is what it would entail:

  1. Create a tsk_offset_t typedef and go through the C tables API making sure this is used for all offset columns (metadata_offset, ancestral_state_offset, etc). Typedef this to uint64_t.
  2. Increment the file-format minor version. In the file writing code, look at the last value in each offset array. If it's < UINT32_MAX, store it as a 32 bit value; if not, store it as a 64 bit value. In the reading code, check which type was used for storage and update accordingly. Note that this means we won't be able to use the current zero-copy behaviour where we use the memory in the kastore to back the arrays. Figure out the best approach here (note: using the memory in the kastore was motivated by using mmap for IO, which we've now dropped as it's inherently dangerous and hard to do properly cross-platform).
  3. Clean up _tskitmodule.c to use the correct new sizes for numpy arrays (possibly also needing to backport this over to msprime where we use the LightweightTableCollection for interchange).
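
To make step 2 concrete, here is a rough sketch of the width selection on write and the widening copy on read. It is only an illustration: `tsk_offset_t` is the typedef proposed in step 1, while `offset_store_t`, `choose_offset_store`, `pack_offsets` and `unpack_offsets` are hypothetical names that do not exist in tskit or kastore, and the actual kastore read/write calls are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Typedef proposed in step 1. */
typedef uint64_t tsk_offset_t;

/* Hypothetical tag for the on-disk width of an offset column. */
typedef enum { OFFSET_STORE_UINT32, OFFSET_STORE_UINT64 } offset_store_t;

/* Offsets are non-decreasing, so the last entry is the largest.
 * An offset column has num_rows + 1 entries, so len is never zero. */
static offset_store_t
choose_offset_store(const tsk_offset_t *offsets, size_t len)
{
    assert(len > 0);
    return offsets[len - 1] < UINT32_MAX ? OFFSET_STORE_UINT32
                                         : OFFSET_STORE_UINT64;
}

/* Writing: build the buffer that would be handed to the file writer.
 * Returns a malloc'd array (caller frees) or NULL on allocation failure. */
static void *
pack_offsets(const tsk_offset_t *offsets, size_t len, offset_store_t *store)
{
    size_t j;

    *store = choose_offset_store(offsets, len);
    if (*store == OFFSET_STORE_UINT64) {
        uint64_t *out = malloc(len * sizeof(*out));
        if (out != NULL) {
            memcpy(out, offsets, len * sizeof(*out));
        }
        return out;
    } else {
        uint32_t *out = malloc(len * sizeof(*out));
        if (out != NULL) {
            for (j = 0; j < len; j++) {
                out[j] = (uint32_t) offsets[j]; /* safe: all values < UINT32_MAX */
            }
        }
        return out;
    }
}

/* Reading: always copy into a fresh 64 bit array. This copy is why the
 * stored memory can no longer back the column directly (no zero-copy). */
static tsk_offset_t *
unpack_offsets(const void *stored, size_t len, offset_store_t store)
{
    size_t j;
    tsk_offset_t *out = malloc(len * sizeof(*out));

    if (out == NULL) {
        return NULL;
    }
    if (store == OFFSET_STORE_UINT64) {
        memcpy(out, stored, len * sizeof(*out));
    } else {
        const uint32_t *in = (const uint32_t *) stored;
        for (j = 0; j < len; j++) {
            out[j] = in[j];
        }
    }
    return out;
}
```

Copying on read keeps the in-memory type uniform at the cost of one extra allocation per offset column, which is the trade-off against the current zero-copy behaviour mentioned in step 2.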

From a user perspective, this will mean that old versions of tskit won't be able to read newer files. New versions of tskit will continue to read older files without issues, as we're making the code more flexible in terms of expected types.

pinging @petrelharp and @molpopgen for opinions.

Labels: C API (Issue is about the C API)
