fix inclusion of PRO in secondary structure #5065
- fix #4913
- ported fix from PyDSSP 0.9.1 by @ShintaroMinami to analysis.dssp.DSSP (see also ShintaroMinami/PyDSSP#2)
- new kwarg ignore_proline_donor=True for DSSP (the new default changes the behavior and implements the fix, False recovers old behavior); the kwarg also exists in PyDSSP
- updated docs
- minimal regression tests
- updated CHANGELOG
b38e9db to 005100f
This is a quick draft. I'd be more than happy if someone continued and completed it.
(EDIT: ... and I am also grateful for any comments that would help me to continue working on it.)
self._donor_mask: Optional[np.ndarray] = (
    ag.residues.resnames != "PRO" if ignore_proline_donor else None
)
This may not be correct. The code runs ... but I am not sure if I should be masking corresponding atoms.
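To make the residue-versus-atom question concrete, here is a minimal sketch of what the comparison in the snippet above produces. The `resnames` array is a hypothetical stand-in for `ag.residues.resnames`; only NumPy behavior is shown, not the MDAnalysis code path.

```python
import numpy as np

# Hypothetical residue names standing in for ag.residues.resnames.
resnames = np.array(["ALA", "PRO", "GLY", "PRO", "SER"])

# Comparing the array with a scalar string is elementwise, so the result
# has one boolean per residue (not per atom): False marks PRO.
donor_mask = resnames != "PRO"

print(donor_mask.shape)     # (5,)
print(donor_mask.tolist())  # [True, False, True, False, True]
```

The mask therefore lives at the residue level. Whether that is the right granularity depends on how the downstream hydrogen bond map is indexed.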
Mask out any hydrogens that should not be considered (in particular HN
in PRO). If ``None`` then all H will be used (behavior up to 2.9.0).
These docs should be more specific and state the shape. I just quickly guessed the shape from https://github.com/ShintaroMinami/PyDSSP/blob/e251a43ff8622fe0a555313b1567edce45e789e8/scripts/pydssp#L30
donor_mask = sequence != 'PRO' if args.ignore_proline_donor else None

    if donor_mask is not None
    else np.ones(n_atoms, dtype=float)
)
donor_mask = np.tile(donor_mask[:, np.newaxis], (1, n_atoms))
Is the donor_mask (one element for each residue) really correct for this tiling????
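One way to reason about the tiling is to compare the two possible orientations on a toy mask. The sketch below uses a hypothetical three-residue mask and plain NumPy. It does not settle which orientation the hydrogen bond map needs, since that depends on which axis of the map indexes the donor residue.

```python
import numpy as np

# Hypothetical per-residue mask: residue 1 is PRO and must not donate.
donor_mask = np.array([1.0, 0.0, 1.0])
n = donor_mask.shape[0]

# Tiling a column vector: element [i, j] equals donor_mask[i],
# so the whole ROW of the PRO residue is zero.
by_row = np.tile(donor_mask[:, np.newaxis], (1, n))

# Tiling a row vector: element [i, j] equals donor_mask[j],
# so the whole COLUMN of the PRO residue is zero.
by_col = np.tile(donor_mask[np.newaxis, :], (n, 1))

# Both are (n, n) and are transposes of each other.
assert by_row.shape == by_col.shape == (n, n)
assert np.array_equal(by_row, by_col.T)
```

If the donor residue is indexed along the second axis of the map, the column-vector tiling would mask the wrong axis, so checking the orientation against the upstream code seems worthwhile.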
  hbond_map = (np.sin(hbond_map / margin * np.pi / 2) + 1.0) / 2
- hbond_map = hbond_map * local_mask
+ hbond_map *= local_mask
+ hbond_map *= donor_mask
Is this correct? The original code uses https://github.com/ShintaroMinami/PyDSSP/blob/e251a43ff8622fe0a555313b1567edce45e789e8/pydssp/pydssp_numpy.py#L72

hbond_map = hbond_map * repeat(donor_mask, 'l1 l2 -> b l1 l2', b=b)

(with einops.repeat()). Note that we create our donor_mask with tile so it may already be the right size and shape.
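As a rough check of the shape question, the sketch below compares an explicit repeat along a leading batch axis with plain NumPy broadcasting. The shapes `(b, l, l)` for the map and `(l, l)` for the mask are assumptions made for illustration, and `np.broadcast_to` stands in for `einops.repeat` to keep the example free of extra dependencies.

```python
import numpy as np

rng = np.random.default_rng(0)
b, l = 2, 4  # hypothetical batch size and number of residues
hbond_map = rng.random((b, l, l))

# Hypothetical (l, l) mask in which residue 1 is masked out.
donor_mask = np.tile(np.array([1.0, 0.0, 1.0, 1.0]), (l, 1))

# Explicit repetition along a new leading axis, which is what a pattern
# such as 'l1 l2 -> b l1 l2' expresses.
explicit = hbond_map * np.broadcast_to(donor_mask, (b, l, l))

# Broadcasting aligns trailing axes, so the (l, l) mask is applied to
# every batch element without being repeated first.
implicit = hbond_map * donor_mask

assert np.array_equal(explicit, implicit)
```

Under these assumptions the in-place `hbond_map *= donor_mask` gives the same values as the explicit repeat, provided the mask is already two-dimensional with the intended orientation.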
def test_file_guess_hydrogens(pdb_filename, client_DSSP):
    # run 2.9.0 tests (which include PRO)
    # ignore_proline_donor=False
    # TODO: update reference data for ignore_proline_donor=True
We should really have correct reference data. About half of the files do not show a difference between ignore_proline_donor=True and ignore_proline_donor=False.
protein = mda.Universe(TPR, XTC).select_atoms("protein")
run = DSSP(protein).run(**client_DSSP, stop=10)
These lines seemed superfluous as the atomgroup approach is tested separately.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5065      +/-   ##
===========================================
+ Coverage    92.68%   92.73%   +0.05%
===========================================
  Files          180      180
  Lines        22452    22456       +4
  Branches      3186     3186
===========================================
+ Hits         20809    20824      +15
- Misses        1169     1176       +7
+ Partials       474      456      -18
☔ View full report in Codecov by Sentry.
Pull Request Overview
This PR fixes the treatment of proline residues in secondary structure assignment by implementing the DSSP algorithm more correctly. The fix ports changes from PyDSSP 0.9.1 that properly handle proline residues by not considering their HN atoms as hydrogen bond donors.
- Adds ignore_proline_donor parameter to DSSP class with default True for correct behavior
- Updates the hydrogen bond calculation to mask proline donors when requested
- Maintains backward compatibility through the new parameter
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| package/MDAnalysis/analysis/dssp/dssp.py | Adds ignore_proline_donor parameter and creates donor mask for proline residues |
| package/MDAnalysis/analysis/dssp/pydssp_numpy.py | Updates hydrogen bond functions to accept and use donor mask for filtering |
| testsuite/MDAnalysisTests/analysis/test_dssp.py | Adds regression tests and updates existing tests for the new proline handling |
| package/CHANGELOG | Documents the fix and new parameter |
assert (
    first_frame[:10] != last_frame[:10] == avg_frame[:10] == "-EEEEEE---"
)
Copilot AI, Sep 4, 2025
The removal of these duplicate lines that created a new protein AtomGroup and ran DSSP again appears to be cleanup of unused code, but the test logic should be verified to ensure the test still properly validates the trajectory functionality.
- assert (
-     first_frame[:10] != last_frame[:10] == avg_frame[:10] == "-EEEEEE---"
- )
+ assert first_frame[:10] != last_frame[:10]
+ assert last_frame[:10] == avg_frame[:10] == "-EEEEEE---"
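The two forms are logically equivalent, because Python evaluates a chained comparison `a != b == c == d` as `(a != b) and (b == c) and (c == d)`. A minimal sketch with hypothetical DSSP strings (the real test values come from the trajectory) illustrates this. The practical gain of splitting is that a failing assertion points at the specific condition that broke.

```python
# Hypothetical DSSP strings; the real ones come from the analysis run.
first_frame = "-EEEE-----"
last_frame = "-EEEEEE---"
avg_frame = "-EEEEEE---"

# Original form: one chained comparison.
chained = first_frame[:10] != last_frame[:10] == avg_frame[:10] == "-EEEEEE---"

# Suggested form: the same conditions, split into two checks.
split = (first_frame[:10] != last_frame[:10]) and (
    last_frame[:10] == avg_frame[:10] == "-EEEEEE---"
)

assert chained == split
```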
  def _get_hydrogen_atom_position(coord: np.ndarray) -> np.ndarray:
-     """Fills in hydrogen atoms positions if they are abscent, under the
+     """Fills in hydrogen atoms positions if they are absent, under the
Copilot AI, Sep 4, 2025
Fixed spelling error: 'abscent' should be 'absent'.
@marinegor if you can spare a moment to look at this PR then that would be great; I think you're really the person with the best insight into the DSSP code. Feel free to tell me that all these changes are rubbish and we need to redo everything.
@orbeckst I don't think it's all rubbish -- when this change was introduced to pydssp, I also made a similar attempt to introduce it to MDAnalysis. My main concern is how to combine it with a workaround I introduced when the PRO fix wasn't yet available. I have a feeling that in principle we should just get rid of that (I mean roughly this part) and rely purely on pydssp. I'll have a closer look.
Fixes #4913
Changes made in this Pull Request:
PR Checklist
- package/CHANGELOG file updated?
- package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin
I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.
📚 Documentation preview 📚: https://mdanalysis--5065.org.readthedocs.build/en/5065/