Conversation

@orbeckst
Member

@orbeckst orbeckst commented Jun 12, 2025

Fixes #4913

Changes made in this Pull Request:

  • ported fix from PyDSSP 0.9.1 by @ShintaroMinami to analysis.dssp.DSSP (see also "Wrong assignment or prolines?", ShintaroMinami/PyDSSP#2)
  • new kwarg ignore_proline_donor=True for DSSP (the new default changes the behavior and implements the fix, False recovers old behavior); the kwarg also exists in PyDSSP
  • updated docs
  • minimal regression tests
  • updated CHANGELOG

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5065.org.readthedocs.build/en/5065/

@orbeckst orbeckst force-pushed the update-dssp-proline-fix branch from b38e9db to 005100f Compare June 12, 2025 00:23
Member Author

@orbeckst orbeckst left a comment

This is a quick draft. I'd be more than happy if someone continued and completed it.

(EDIT: ... and I am also grateful for any comments that would help me to continue working on it.)

Comment on lines +332 to +334
self._donor_mask: Optional[np.ndarray] = (
    ag.residues.resnames != "PRO" if ignore_proline_donor else None
)
Member Author

This may not be correct. The code runs ... but I am not sure if I should be masking corresponding atoms.
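
The residue-versus-atom question can be made concrete with a small NumPy sketch. It assumes a hypothetical array of residue names standing in for `ag.residues.resnames`, and only shows that the comparison against "PRO" yields one boolean per residue rather than one per atom.

```python
import numpy as np

# Hypothetical residue names, standing in for ag.residues.resnames
resnames = np.array(["MET", "PRO", "GLY", "PRO", "ALA"])

# Elementwise comparison: True where the residue keeps its backbone
# amide hydrogen as a potential donor, False for proline
donor_mask = resnames != "PRO"

print(donor_mask.shape)  # one entry per residue, not per atom
print(donor_mask)
```

If the downstream hydrogen bond map is indexed by residue, a per-residue mask of this length is what would be needed; a per-atom mask would only be required if the map were indexed by atom.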

Comment on lines +135 to +136
Mask out any hydrogens that should not be considered (in particular HN
in PRO). If ``None`` then all H will be used (behavior up to 2.9.0).
Member Author

These docs should be more specific and state the shape. I just quickly guessed the shape from https://github.com/ShintaroMinami/PyDSSP/blob/e251a43ff8622fe0a555313b1567edce45e789e8/scripts/pydssp#L30

donor_mask = sequence != 'PRO' if args.ignore_proline_donor else None

if donor_mask is not None
else np.ones(n_atoms, dtype=float)
)
donor_mask = np.tile(donor_mask[:, np.newaxis], (1, n_atoms))
Member Author

Is the donor_mask (one element per residue) really the right shape for this tiling?
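
One way to check the tiling is a minimal NumPy sketch with made-up values. It assumes the mask has one entry per residue and that the second tile count equals the number of residues; which axis of the hydrogen bond map indexes the donor is an assumption here and would need to be confirmed against the actual code.

```python
import numpy as np

n_residues = 4
# Hypothetical per-residue mask: 1.0 = may donate, 0.0 = proline
donor_mask = np.array([1.0, 0.0, 1.0, 1.0])

# Turn the mask into a column and repeat it along the second axis
tiled = np.tile(donor_mask[:, np.newaxis], (1, n_residues))

# Row i is filled with donor_mask[i], so the proline row is all zeros
print(tiled.shape)
print(tiled)

# The same matrix is obtained by plain broadcasting of the column vector
broadcast = np.broadcast_to(donor_mask[:, np.newaxis], (n_residues, n_residues))
```

The result is square only when the mask length matches the tile count, so a mismatch between the number of residues and the value passed as the tile count would surface as a shape error in the later multiplication.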

  hbond_map = (np.sin(hbond_map / margin * np.pi / 2) + 1.0) / 2
- hbond_map = hbond_map * local_mask
+ hbond_map *= local_mask
+ hbond_map *= donor_mask
Member Author

Is this correct? The original code uses https://github.com/ShintaroMinami/PyDSSP/blob/e251a43ff8622fe0a555313b1567edce45e789e8/pydssp/pydssp_numpy.py#L72

hbond_map = hbond_map * repeat(donor_mask, 'l1 l2 -> b l1 l2', b=b)

(with einops.repeat()). Note that we create our donor_mask with tile so it may already be the right size and shape.
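
Whether the explicit repeat along the batch axis is needed can be tested with a short NumPy sketch. The array sizes and values are hypothetical; the sketch assumes a hydrogen bond map of shape (batch, n, n) and a tiled mask of shape (n, n), and compares an explicit expansion to a plain in-place multiply that relies on broadcasting.

```python
import numpy as np

batch, n = 3, 4
rng = np.random.default_rng(0)
hbond_map = rng.random((batch, n, n))  # hypothetical map values in [0, 1)

donor_mask = np.array([1.0, 0.0, 1.0, 1.0])  # hypothetical, 0.0 marks proline
donor_mask_2d = np.tile(donor_mask[:, np.newaxis], (1, n))  # shape (n, n)

# Explicit expansion to (batch, n, n), the shape that a
# 'l1 l2 -> b l1 l2' repeat would produce
explicit = hbond_map * np.broadcast_to(donor_mask_2d, (batch, n, n))

# In-place multiply; NumPy broadcasts (n, n) against (batch, n, n)
masked = hbond_map.copy()
masked *= donor_mask_2d

print(np.allclose(explicit, masked))
```

Under these assumptions the two forms agree, which suggests the in-place multiply is sufficient as long as the tiled mask is (n, n) and the map carries the batch dimension first.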

def test_file_guess_hydrogens(pdb_filename, client_DSSP):
    # run 2.9.0 tests (which include PRO)
    # ignore_proline_donor=False
    # TODO: update reference data for ignore_proline_donor=True
Member Author

We should really have correct reference data. About half of the files do not show a difference between ignore_proline_donor=True and ignore_proline_donor=False.
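
To find which files would need new reference data, the assignments from both settings could be compared position by position. The sketch below uses a hypothetical helper, `differing_positions`, and invented assignment strings; it does not call MDAnalysis and the strings are not real reference data.

```python
from typing import Dict, List


def differing_positions(a: str, b: str) -> List[int]:
    """Return the indices at which two equal-length assignment strings differ."""
    if len(a) != len(b):
        raise ValueError("assignment strings must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hypothetical assignments keyed by file name (illustration only)
with_fix: Dict[str, str] = {"a.pdb": "-HHHH-EEE-", "b.pdb": "--EEEE----"}
without_fix: Dict[str, str] = {"a.pdb": "-HHHHHEEE-", "b.pdb": "--EEEE----"}

changed = {
    name: differing_positions(with_fix[name], without_fix[name])
    for name in with_fix
}
# Files whose assignment changes are the ones that need new reference data
needs_new_reference = [name for name, idx in changed.items() if idx]
print(needs_new_reference)
```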

Comment on lines -35 to -36
protein = mda.Universe(TPR, XTC).select_atoms("protein")
run = DSSP(protein).run(**client_DSSP, stop=10)
Member Author

These lines seemed superfluous as the atomgroup approach is tested separately.

@codecov

codecov bot commented Jun 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.73%. Comparing base (5c7c480) to head (e9df025).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5065      +/-   ##
===========================================
+ Coverage    92.68%   92.73%   +0.05%     
===========================================
  Files          180      180              
  Lines        22452    22456       +4     
  Branches      3186     3186              
===========================================
+ Hits         20809    20824      +15     
- Misses        1169     1176       +7     
+ Partials       474      456      -18     


@orbeckst orbeckst requested review from RMeli and Copilot September 4, 2025 00:49
Copilot AI left a comment

Pull Request Overview

This PR fixes the treatment of proline residues in secondary structure assignment by implementing the DSSP algorithm more correctly. The fix ports changes from PyDSSP 0.9.1 that properly handle proline residues by not considering their HN atoms as hydrogen bond donors.

  • Adds ignore_proline_donor parameter to DSSP class with default True for correct behavior
  • Updates the hydrogen bond calculation to mask proline donors when requested
  • Maintains backward compatibility through the new parameter

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| package/MDAnalysis/analysis/dssp/dssp.py | Adds ignore_proline_donor parameter and creates donor mask for proline residues |
| package/MDAnalysis/analysis/dssp/pydssp_numpy.py | Updates hydrogen bond functions to accept and use donor mask for filtering |
| testsuite/MDAnalysisTests/analysis/test_dssp.py | Adds regression tests and updates existing tests for the new proline handling |
| package/CHANGELOG | Documents the fix and new parameter |

Comment on lines 69 to 71
assert (
    first_frame[:10] != last_frame[:10] == avg_frame[:10] == "-EEEEEE---"
)
Copilot AI Sep 4, 2025

The removal of these duplicate lines that created a new protein AtomGroup and ran DSSP again appears to be cleanup of unused code, but the test logic should be verified to ensure the test still properly validates the trajectory functionality.

Suggested change
- assert (
-     first_frame[:10] != last_frame[:10] == avg_frame[:10] == "-EEEEEE---"
- )
+ assert first_frame[:10] != last_frame[:10]
+ assert last_frame[:10] == avg_frame[:10] == "-EEEEEE---"
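
For context on this suggestion: Python evaluates a chained comparison as the conjunction of its pairwise comparisons, so the single assertion and the split assertions check the same conditions. The short sketch below illustrates this with hypothetical strings, not the actual test data; the practical benefit of splitting is a more specific failure message.

```python
# Hypothetical assignment strings (not the values produced by the test)
first_frame = "-EEEEE----"
last_frame = "-EEEEEE---"
avg_frame = "-EEEEEE---"
expected = "-EEEEEE---"

# Chained form, as in the original assertion
chained = first_frame != last_frame == avg_frame == expected

# Equivalent expanded form: each adjacent pair is compared and the results are and-ed
expanded = (
    (first_frame != last_frame)
    and (last_frame == avg_frame)
    and (avg_frame == expected)
)

print(chained, expanded)
```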

  def _get_hydrogen_atom_position(coord: np.ndarray) -> np.ndarray:
-     """Fills in hydrogen atoms positions if they are abscent, under the
+     """Fills in hydrogen atoms positions if they are absent, under the
Copilot AI Sep 4, 2025

Fixed spelling error: 'abscent' should be 'absent'.

@orbeckst orbeckst requested a review from marinegor October 30, 2025 23:02
@orbeckst
Member Author

@marinegor if you can spare a moment to look at this PR then that would be great; I think you're really the person with the best insight into the DSSP code. Feel free to tell me that all these changes are rubbish and we need to redo everything.

@marinegor
Contributor

@orbeckst I don't think it's all rubbish -- when this change was introduced to pydssp, I also made a similar attempt at introducing it to MDAnalysis.

My main concern is how to combine it with a workaround I introduced when the PRO fix wasn't yet available. I have a feeling that in principle we should just get rid of that (I mean roughly this part) and rely purely on pydssp. I'll have a closer look.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update secondary structure assignment in DSSP after an upstream fix

3 participants