Ofekirsh

Summary

This PR delivers four changes:

  1. Correct GVH/HVG and mismatch computation.
  2. Fix a missing-donor bug in the candidate list.
  3. Generalize to k loci (no longer limited to 5).
  4. Support additional allele formats (beyond xy:wz), while keeping the genotype representation compact.

What changed

(1) GVH/HVG and mismatches

  • For each locus, patient/donor alleles are handled as sets (no duplicates).

  • We now compute sizes (cardinalities):

    • GVH := | Patient \ Donor | (patient alleles absent from the donor)
    • HVG := | Donor \ Patient | (donor alleles absent from the patient)
  • The number of mismatches is max(GVH, HVG).

File: grma/match/donors_matching.py.
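
A minimal sketch of the per-locus computation described above, assuming alleles are plain strings. The function name `count_mismatches` is hypothetical and is not taken from `donors_matching.py`.

```python
def count_mismatches(patient, donor):
    """Return (gvh, hvg, mismatches) for a single locus.

    `patient` and `donor` are iterables of allele identifiers.
    Hypothetical helper for illustration only.
    """
    # Treat each side as a set so a homozygous pair counts once.
    patient_set = set(patient)
    donor_set = set(donor)

    gvh = len(patient_set - donor_set)  # | Patient \ Donor |
    hvg = len(donor_set - patient_set)  # | Donor \ Patient |
    return gvh, hvg, max(gvh, hvg)


# Homozygous donor: the patient carries one allele the donor lacks.
print(count_mismatches(["A*01:01", "A*02:01"], ["A*01:01", "A*01:01"]))  # (1, 0, 1)
```

Because both sides are sets, a homozygous locus contributes one element, so the two directions can differ and the reported mismatch count is the larger of the two.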


(2) Missing donor in candidates

  • cpdef tuple neighbors_2nd(self, UINT node) contained a duplicate -1 placeholder. The extra placeholder caused a donor to be dropped from the results, so it was removed.

File: grma/match/lol_graph.pyx.


(3) k-loci generalization

  • Removed hard-coded assumptions (e.g., the magic number 10) and replaced them with configurable values.
  • Verified for 4/5 loci. Full 6-locus coverage was not targeted here because of edge cases (e.g., DRB3/4/5, DRBX as the last locus).
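
One way to picture replacing a hard-coded length with a configurable one is sketched below. It assumes that the old constant 10 stood for 2 alleles at each of 5 loci and that a genotype is stored as a flat sequence. Both are assumptions, and `split_by_locus` is a hypothetical name rather than a function from this PR.

```python
ALLELES_PER_LOCUS = 2  # diploid genotype: two alleles per locus


def split_by_locus(genotype, num_loci):
    """Split a flat genotype into per-locus allele pairs.

    `num_loci` is passed in rather than fixed, so the same code handles
    4 or 5 loci. Hypothetical helper for illustration only.
    """
    expected = ALLELES_PER_LOCUS * num_loci
    if len(genotype) != expected:
        raise ValueError(f"expected {expected} alleles, got {len(genotype)}")
    # Walk the flat sequence two alleles at a time.
    return [
        tuple(genotype[i:i + ALLELES_PER_LOCUS])
        for i in range(0, expected, ALLELES_PER_LOCUS)
    ]
```

Passing the locus count explicitly keeps the length check in one place instead of scattering a literal through the code.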

(4) Additional allele formats + compact storage

  • Broadened parsing/handling beyond xy:wz (varying digit widths/fields).
  • To avoid donor-tree bloat, we use UIDs per allele via a bidirectional map (bidict).
  • Classes/subclasses use hash keys for compact indices; any hash collisions are harmless since exact mismatch checks operate on the UID layer.

Affected: many files (tree maintenance, matching, LOL/SLUG building, utilities).
New dependency: bidict.
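
The UID idea can be sketched with two plain dictionaries standing in for the bidirectional map. The PR itself uses `bidict`, which exposes the reverse direction through `.inverse`. The class below is a stdlib-only illustration with hypothetical names, not the PR's code.

```python
class AlleleRegistry:
    """Assign a small integer UID to each distinct allele string.

    Stand-in for a bidirectional map; hypothetical, for illustration only.
    """

    def __init__(self):
        self._uid_by_allele = {}
        self._allele_by_uid = {}

    def uid_for(self, allele):
        """Return the UID for `allele`, allocating a new one on first use."""
        uid = self._uid_by_allele.get(allele)
        if uid is None:
            uid = len(self._uid_by_allele)  # next unused integer
            self._uid_by_allele[allele] = uid
            self._allele_by_uid[uid] = allele
        return uid

    def allele_for(self, uid):
        """Return the allele string for a previously issued UID."""
        return self._allele_by_uid[uid]


registry = AlleleRegistry()
uid_short = registry.uid_for("A*01:01")
uid_long = registry.uid_for("A*02:01:01")  # extra field, still a single UID
```

Storing integers instead of variable-length allele strings keeps the genotype representation compact regardless of how many fields an allele name has, and the reverse lookup recovers the original string when needed.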

