Skip to content

Support migrations in sort. #1131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 5, 2021
Merged

Conversation

jeromekelleher
Copy link
Member

@jeromekelleher jeromekelleher commented Jan 23, 2021

Closes #22
Closes #117

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

(I've not done any docs because I can't see anywhere we've said this stuff wasn't supported)

@jeromekelleher
Copy link
Member Author

@petrelharp I've added sort support for migrations here. What do you think of the sort key? Should we make the sort requirement for migrations stricter now? We'd need to update msprime to make sure things are output in this order, I think.

@codecov
Copy link

codecov bot commented Jan 23, 2021

Codecov Report

Merging #1131 (3382b83) into main (e87e40a) will decrease coverage by 0.00%.
The diff coverage is 95.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1131      +/-   ##
==========================================
- Coverage   93.70%   93.69%   -0.01%     
==========================================
  Files          26       26              
  Lines       21404    21460      +56     
  Branches      902      902              
==========================================
+ Hits        20056    20108      +52     
- Misses       1310     1314       +4     
  Partials       38       38              
Flag Coverage Δ
c-tests 92.46% <95.23%> (+<0.01%) ⬆️
lwt-tests 92.97% <ø> (ø)
python-c-tests 94.90% <ø> (ø)
python-tests 98.63% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/tskit/tables.py 99.60% <ø> (ø)
c/tskit/tables.c 90.75% <95.23%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e87e40a...3382b83. Read the comment docs.

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one question on coverage.

@petrelharp
Copy link
Contributor

I vote to not add more sortedness requirements if we don't have a good algorithmic reason to. Plus, it's not obvious to me that source and dest are the natural next things to sort on - I would have thought node would come next? Or maybe left? So, I'd say we should just sort on time here, and add some more requirements if it turns out to be helpful.

@jeromekelleher
Copy link
Member Author

Well, we need to choose some keys to sort on to make things deterministic (which we definitely want, we had loads of problems in the past with different C implementations having slightly different sort orders). I chose the current ordering because that's what comes out of msprime. So, my vote would be to formalise the order that pops out naturally from msprime (starting here - sort by (source, dest, left). Unless there's a good reason otherwise, we might as well go with an existing ordering.

Agreed on not adding any more sortedness requirements to the input, that seems like it's more trouble that it's worth.

@petrelharp
Copy link
Contributor

I'm a bit confused still: by "deterministic" you mean specifying a total order, right? Don't we need some more keys for that, since you could have more than one migration event with the same time, source, dest?

@jeromekelleher
Copy link
Member Author

I'm a bit confused still: by "deterministic" you mean specifying a total order, right? Don't we need some more keys for that, since you could have more than one migration event with the same time, source, dest

Yeah, I mean a fully specified ordering, because the output of sort isn't deterministic otherwise. Ah, you're right (time, source, dest, left) isn't fully specified, we'd need (time, source, dest, left, node) because we could have multiple nodes migrating at once.

@petrelharp
Copy link
Contributor

Sure, sounds good. I guess that sorting of mutations can be skipped with the bookmark if it's a performance problem. So, sort will make migrations sorted on all these things, but "valid tree sequence requirements" will still only require sortedness on time, right?

@jeromekelleher
Copy link
Member Author

Yep, agreed

@jeromekelleher
Copy link
Member Author

We should probably try to get #1108 in first, since @petrelharp has had to do a lot of messing around with it already. Unless you'd prefer to get this in first Peter? The updates I need to do are fairly small.

@petrelharp
Copy link
Contributor

I would love to not have to do more mucking around with #1108, yes.

@benjeffery
Copy link
Member

Is this ready for me to have a look over?

@jeromekelleher
Copy link
Member Author

No, not yet. Thanks for the ping though, will do ASAP. (Writing slides for a talk tomorrow atm!)

@jeromekelleher
Copy link
Member Author

Ready for final review and merge here @benjeffery

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff, annoyingly I can't find anything to comment on!

@mergify mergify bot merged commit d0fbaa7 into tskit-dev:main Feb 5, 2021
@jeromekelleher jeromekelleher deleted the sort-migrations branch February 5, 2021 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sort should support sorting migrations with empty suffix sort_tables should sort migrations
4 participants