IBD result struct #1637

Merged 1 commit on Sep 6, 2021

jeromekelleher
Member

This is stacked on #1618, so the diff will be very noisy until that's merged.

This is some initial work on refactoring the IBD code as we discussed @gtsambos. The idea is that we encapsulate all the IBD segment result information in this tsk_ibd_result_t struct, which we make available to the caller. All the details about how we store the segments for a pair of samples are hidden away, which gives us flexibility to do various things in the future.

Still some work to be done to really make use of this, though.
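
To make the encapsulation idea concrete, here is a minimal sketch of a result struct that hides how per-pair segments are stored. Everything in it is an assumption for illustration: the `demo_` names, the fields, and the per-pair growable array are hypothetical and are not the actual `tsk_ibd_result_t` layout or the tskit API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types and names; NOT the tskit API. */
typedef struct {
    double left;
    double right;
    int node;
} demo_segment_t;

typedef struct {
    size_t num_pairs;          /* the only field callers are meant to read */
    size_t *num_segments;      /* private: number of segments per pair */
    demo_segment_t **segments; /* private: one growable array per pair */
} demo_ibd_result_t;

int demo_ibd_result_init(demo_ibd_result_t *self, size_t num_pairs)
{
    assert(num_pairs > 0);
    self->num_pairs = num_pairs;
    self->num_segments = calloc(num_pairs, sizeof(*self->num_segments));
    self->segments = calloc(num_pairs, sizeof(*self->segments));
    if (self->num_segments == NULL || self->segments == NULL) {
        free(self->num_segments);
        free(self->segments);
        self->num_segments = NULL;
        self->segments = NULL;
        return -1;
    }
    return 0;
}

/* Append one segment to the given pair; returns -1 on a bad index or OOM. */
int demo_ibd_result_add(demo_ibd_result_t *self, size_t pair_index,
    double left, double right, int node)
{
    demo_segment_t *grown;
    size_t n;

    if (pair_index >= self->num_pairs) {
        return -1;
    }
    n = self->num_segments[pair_index];
    grown = realloc(self->segments[pair_index], (n + 1) * sizeof(*grown));
    if (grown == NULL) {
        return -1;
    }
    grown[n].left = left;
    grown[n].right = right;
    grown[n].node = node;
    self->segments[pair_index] = grown;
    self->num_segments[pair_index] = n + 1;
    return 0;
}

size_t demo_ibd_result_count(const demo_ibd_result_t *self, size_t pair_index)
{
    return pair_index < self->num_pairs ? self->num_segments[pair_index] : 0;
}

/* Sum of (right - left) over all segments stored for the pair. */
double demo_ibd_result_total_length(
    const demo_ibd_result_t *self, size_t pair_index)
{
    double total = 0.0;
    size_t k;

    for (k = 0; k < demo_ibd_result_count(self, pair_index); k++) {
        total += self->segments[pair_index][k].right
                 - self->segments[pair_index][k].left;
    }
    return total;
}

void demo_ibd_result_free(demo_ibd_result_t *self)
{
    size_t j;

    for (j = 0; j < self->num_pairs; j++) {
        free(self->segments[j]);
    }
    free(self->segments);
    free(self->num_segments);
}
```

Because callers only go through the accessor functions, the per-pair arrays in this sketch could later be swapped for a different container without touching calling code, which is the kind of flexibility described above.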

@codecov

codecov bot commented Aug 19, 2021

Codecov Report

Merging #1637 (20a5328) into main (31797f6) will decrease coverage by 0.15%.
The diff coverage is 91.51%.

❗ Current head 20a5328 differs from pull request most recent head ecdb685. Consider uploading reports for the commit ecdb685 to get more accurate results

@@            Coverage Diff             @@
##             main    #1637      +/-   ##
==========================================
- Coverage   93.72%   93.57%   -0.16%     
==========================================
  Files          27       27              
  Lines       23326    23401      +75     
  Branches     1084     1084              
==========================================
+ Hits        21862    21897      +35     
- Misses       1430     1470      +40     
  Partials       34       34              
Flag            Coverage Δ
c-tests         91.65% <93.72%> (-0.23%) ⬇️
lwt-tests       93.49% <100.00%> (ø)
python-c-tests  95.40% <88.31%> (-0.08%) ⬇️
python-tests    98.78% <92.85%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                              Coverage Δ
c/tskit/core.h                              100.00% <ø> (ø)
python/_tskitmodule.c                       92.18% <86.06%> (-0.16%) ⬇️
c/tskit/tables.c                            89.55% <92.63%> (-0.37%) ⬇️
python/tskit/tables.py                      98.78% <92.85%> (-0.11%) ⬇️
c/tskit/core.c                              97.21% <100.00%> (-0.03%) ⬇️
c/tskit/genotypes.c                         93.91% <100.00%> (ø)
c/tskit/haplotype_matching.c                95.05% <100.00%> (ø)
c/tskit/trees.c                             94.86% <100.00%> (ø)
python/lwt_interface/tskit_lwt_interface.h  95.41% <100.00%> (ø)
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 31797f6...ecdb685.

@gtsambos
Member

Wow. Very cool! Let me know if there's anything in particular you'd like me to look at or test out here.

@jeromekelleher
Member Author

Yeah, this is looking promising @gtsambos, I think we'll be able to make some significant gains with this approach. I'll ping you when it's ready for a look.

@jeromekelleher
Member Author

This is ready for a look after #1618 goes in I think.

@jeromekelleher
Member Author

I think this is ready for review and merge now. There are still a number of open questions about the interface (particularly how we deal with unknown sample pairs), but I think it's best to merge this much now as a good step in the right direction. We'll need to resolve #1640 and #1639 before we can really finalise the semantics of the find_ibd operations.

@benjeffery
Member

@Mergifyio rebase

@mergify
Contributor

mergify bot commented Sep 1, 2021

Command rebase: success

Branch already up to date

Member

@benjeffery benjeffery left a comment

I've had a quick look over this and it looks good. We probably need to write up issues for the few TODOs here.

@gtsambos
Member

gtsambos commented Sep 2, 2021

Thanks @jeromekelleher. Today was a day off for me, so I'll review this tomorrow.

Member

@gtsambos gtsambos left a comment

Looks good @jeromekelleher, thanks for doing this! Some of this is a bit beyond my C comprehension, but what I can understand looks good. The comments are minor, and are just about things I don't fully get.

@@ -3404,6 +3395,12 @@ tsk_id_t tsk_table_collection_check_integrity(

/* Undocumented methods */

/* TODO be systematic about where "result" should be in the params
Member

I'll make an issue for this

-    for (j = 0; j < (int) ibd_finder.num_pairs; j++) {
-        ret = tsk_ibd_finder_get_ibd_segments(&ibd_finder, j, &seg);
+    for (j = 0; j < (int) result.num_pairs; j++) {
+        pair = samples + 2 * j;
Member

I'm a bit confused about what this pair is supposed to be?

Member Author

It's accessing the jth pair, which starts at offset 2 * j since each pair takes up two slots in the array. This is the same as writing pair = &samples[2 * j], because array indexing in C is just pointer arithmetic.
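
As a standalone illustration of that equivalence, here is a small sketch. It assumes the sample IDs are stored flat as [a0, b0, a1, b1, ...]; the `get_pair` helper and the use of plain `int` for IDs are hypothetical and not taken from the test code.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the j-th pair in a flat array of sample IDs laid out
 * as [a0, b0, a1, b1, ...]. Hypothetical helper, for illustration only. */
const int *get_pair(const int *samples, size_t num_pairs, size_t j)
{
    assert(j < num_pairs);
    /* Pointer arithmetic: skip j pairs of two elements each.
     * This yields the same address as &samples[2 * j]. */
    return samples + 2 * j;
}
```

With `int samples[] = {0, 1, 0, 2, 1, 2}`, calling `get_pair(samples, 3, 1)` returns a pointer to the third element, so `pair[0]` and `pair[1]` read the two members of the second pair.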

-    for (j = 0; j < (int) ibd_finder.num_pairs; j++) {
-        ret = tsk_ibd_finder_get_ibd_segments(&ibd_finder, j, &seg);
+    for (j = 0; j < (int) result.num_pairs; j++) {
+        pair = samples + 2 * j;
Member

Same as previous comment

@jeromekelleher
Member Author

Are we OK to merge this before the C release @benjeffery? I'd like to crack on and get the AVL tree code integrated ASAP so we can start finalising the semantics.

This is a transitional implementation as we figure out the right
interface for specifying how to work with large sets of IBD segments.
@mergify mergify bot merged commit daf964d into tskit-dev:main Sep 6, 2021