Outputting fasta files from tree sequences #338

@maz-mckellar

From msprime created by marianne-aspbury: tskit-dev/msprime#802

FASTA is a pretty basic file format to output. I have something that works for me, but I don't know how to integrate it into tskit. It would be nice to have, so I'm noting it here for the future. (I'm also happy to do this myself, with some direction.)
This is what I'm doing for my personal use at the moment:

# ts is an existing tree sequence (e.g. the result of an msprime simulation)

# the actual haplotype strings, one per sample, in the order of ts.samples()
haps = list(ts.haplotypes())

# The ">ID" parts of the fasta format. Don't know what best info to include is,
# I'm just using sample node ID and the population because that's relevant for me.
# Possibly people could choose what to include for ID strings in a fasta output function call
sequence_IDs = []
for u in ts.samples():
    # look up the node by its sample node ID, not by its position in the list
    sequence_IDs.append(f'sample_{u}_pop_{ts.node(u).population}')

# saving the file
with open('Sim_fasta_file.txt', 'w') as f:
    for seq_id, hap in zip(sequence_IDs, haps):
        f.write(f'>{seq_id}\n{hap}\n')
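
To make the "choose what to include for ID strings" idea concrete, here is a minimal sketch of what a reusable writer could look like. The name `write_fasta`, its parameters, and the optional `line_width` wrapping are assumptions for illustration only, not an existing tskit API. It takes plain (ID, sequence) pairs so the caller decides how the IDs are built, and the records below are stand-in data rather than real simulation output.

```python
import io


def write_fasta(records, output, line_width=None):
    """Write (sequence_id, sequence) pairs to a text file object as FASTA.

    Hypothetical helper for illustration, not part of tskit. If line_width
    is a positive integer, each sequence is wrapped to that many characters
    per line; otherwise each sequence is written on a single line.
    """
    for sequence_id, sequence in records:
        output.write(f">{sequence_id}\n")
        if line_width is None:
            output.write(f"{sequence}\n")
        else:
            for start in range(0, len(sequence), line_width):
                output.write(sequence[start:start + line_width] + "\n")


# Stand-in data; with a tree sequence the IDs would be built from
# ts.samples() and the sequences would come from ts.haplotypes().
records = [("sample_0_pop_0", "0110100"), ("sample_1_pop_1", "1000111")]
buffer = io.StringIO()
write_fasta(records, buffer, line_width=4)
fasta_text = buffer.getvalue()
```

With a tree sequence `ts`, the records could be built as `zip((f"sample_{u}_pop_{ts.node(u).population}" for u in ts.samples()), ts.haplotypes())` and written to an open file instead of the in-memory buffer. Keeping ID construction outside the writer is one possible design; an ID-formatting callback would be another.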
