Skip to content

Writing fasta files from tree sequences #350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 2, 2019

Conversation

maz-mckellar
Copy link
Member

I've written some code to cover integrating fasta output from tree sequences into tskit, following from issue #338 and have included various tests for it.
Happy to hear feedback on this (e.g. @jeromekelleher, @hyanwong).

@jeromekelleher
Copy link
Member

Great stuff, thanks @marianne-aspbury. How about we go through this in person tomorrow?

@maz-mckellar
Copy link
Member Author

Yep, sounds good, thanks

@codecov
Copy link

codecov bot commented Aug 29, 2019

Codecov Report

Merging #350 into master will increase coverage by 1.11%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #350      +/-   ##
==========================================
+ Coverage   86.48%   87.59%   +1.11%     
==========================================
  Files          20       19       -1     
  Lines       14024    10275    -3749     
  Branches     2740     1890     -850     
==========================================
- Hits        12128     9000    -3128     
+ Misses        977      756     -221     
+ Partials      919      519     -400
Flag Coverage Δ
#c_tests 87.59% <100%> (+0.02%) ⬆️
#python_c_tests ?
#python_tests 99.24% <100%> (ø) ⬆️
Impacted Files Coverage Δ
python/tskit/cli.py 97.84% <100%> (+0.11%) ⬆️
python/tskit/trees.py 98.68% <100%> (+0.01%) ⬆️
python/_tskitmodule.c

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74461f4...a8bddcc. Read the comment docs.

@maz-mckellar maz-mckellar force-pushed the fasta_file_output branch 2 times, most recently from 102e896 to e102e28 Compare August 30, 2019 12:26
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @marianne-aspbury. We're basically there I think.

It's probably as well to look at the CLI as well here We basically want to duplicate the VCF option in tskit/cli.py and tests/test_cli.py; give me a shout if you have questions.

@maz-mckellar
Copy link
Member Author

Added functionality for printing fastas on command line, pretty much copied directly from the vcf CLI code and all its tests.

@maz-mckellar maz-mckellar force-pushed the fasta_file_output branch 5 times, most recently from 8db52b8 to d596ea1 Compare August 30, 2019 14:22
@hyanwong
Copy link
Member

Sorry - I've not been following this. Note my reservations in #326 (comment) about making sure that FASTA files (and haplotypes in general) output non-variable sites, at a minimum with a dot, but possibly filling in via a reference genome.

@petrelharp
Copy link
Contributor

I don't think we're ready to deal with invariant sites yet, @hyanwong - to do that we need to distinguish tree sequences with integer positions and a reference sequence from the current infinite-sites notions. I'm in favor of doing this, but it shouldn't get in the way of this effort.

@jeromekelleher
Copy link
Member

I don't think we're ready to deal with invariant sites yet, @hyanwong - to do that we need to distinguish tree sequences with integer positions and a reference sequence from the current infinite-sites notions. I'm in favor of doing this, but it shouldn't get in the way of this effort.

Agreed. Let's consider what we can do as a stopgap for finite sites as part of #326 (without changing the default behaviour; we don't break people's code). We can then decide what the final semantics of the fasta function should be in #353 (tagged as a 0.2.3 release issue so we don't forget about it).

Let's get this PR finished up and merge first though.

@hyanwong
Copy link
Member

hyanwong commented Sep 1, 2019

I don't think we're ready to deal with invariant sites yet, @hyanwong

We've constructed tree sequences with a site at each integer position, but many sites without mutations. Isn't that the same as an invariant site? But I guess this might be though of as a bit of a hack.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @marianne-aspbury! Some very minor comments above --- we're ready to merge after these are addressed and the branch is rebased to bring it up to date.

@maz-mckellar
Copy link
Member Author

Looks great, thanks @marianne-aspbury! Some very minor comments above --- we're ready to merge after these are addressed and the branch is rebased to bring it up to date.

Thanks @jeromekelleher, I've fixed these now

@jeromekelleher jeromekelleher merged commit 9a266de into tskit-dev:master Sep 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants