Skip to content

Conversation

@araujoms
Copy link
Collaborator

I've added an implementation of syrk/herk for generic types, in order to avoid falling back to _generic_matmatmul!, as it's rather slow. I didn't do anything fancy, no multithreading or anything, but this gives a 1.5x to 2x speedup for Int and BigFloat, for example.

The generic version of syrk only works for real and complex numbers, but funnily enough herk works for anything that respects conj(a*b) == conj(b)*conj(a), which as far as I can tell is any subtype of Number, including quaternions and octonions.

I've ran the tests locally by reverting to the commit before the lazy JLLs one, and they pass.

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, better performance is always welcome. I've got some comments to explore the opportunity to make it even more generic.

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to a small suggestion for the test, this LGTM.

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should have a Quaternion test case?

@araujoms
Copy link
Collaborator Author

Ok, I've added a Quaternion test. I also added a pseudo docstring since the non-commutativity is not obvious. I assume a pseudo docstring is more appropriate than a real one since this function is not going to be exported.

@jishnub
Copy link
Member

jishnub commented Mar 31, 2025

Btw there's no reason to not have docstrings anymore for internal functions, now that we have public.

@araujoms
Copy link
Collaborator Author

Ok, added a docstring then.

@araujoms araujoms changed the title add generic sykr/herk add generic syrk/herk Mar 31, 2025
@codecov
Copy link

codecov bot commented Apr 1, 2025

Codecov Report

Attention: Patch coverage is 98.61111% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.02%. Comparing base (e53b50c) to head (ada8598).
Report is 42 commits behind head on master.

Files with missing lines Patch % Lines
src/matmul.jl 98.57% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1249      +/-   ##
==========================================
- Coverage   92.03%   92.02%   -0.01%     
==========================================
  Files          34       34              
  Lines       15459    15528      +69     
==========================================
+ Hits        14227    14289      +62     
- Misses       1232     1239       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice! I only wonder if we could "hang the generic code lower" in the dispatch hierarchy, follow the existing dispatch for longer and thereby avoid some code duplication. But this could be explored in another PR once this is in.

@araujoms
Copy link
Collaborator Author

araujoms commented Apr 2, 2025

It can be done. I'd remove the restriction T<:BlasFloat from generic_matmatmul_wrapper!, and add methods to syrk_wrapper!, herk_wrapper!, and gemm_wrapper! for generic types.

I'd also like to remove the line matmul2x2or3x3_nonzeroalpha!(C, tA, tB, A, B, α, β) && return C from generic_matmatmul_wrapper!, because that breaks the feature of syrk/herk that the result is always symmetric/Hermitian.

@dkarrasch
Copy link
Member

So what do you think, should we merge and then you pick it up as time permits, or do you feel like you wanna work on it now?

@araujoms
Copy link
Collaborator Author

araujoms commented Apr 2, 2025

I'd rather do it now.

@jishnub
Copy link
Member

jishnub commented Apr 2, 2025

I'd also like to remove the line matmul2x2or3x3_nonzeroalpha!(C, tA, tB, A, B, α, β) && return C from generic_matmatmul_wrapper!

This line dispatches to the optimized methods for 2x2 and 3x3 matrices. Removing this would lead to dispatching to BLAS calls, which are suboptimal for small matrices.

@dkarrasch
Copy link
Member

It can be done. I'd remove the restriction T<:BlasFloat from generic_matmatmul_wrapper!, and add methods to syrk_wrapper!, herk_wrapper!, and gemm_wrapper! for generic types.

That's what I had in mind.

This line dispatches to the optimized methods for 2x2 and 3x3 matrices. Removing this would lead to dispatching to BLAS calls, which are suboptimal for small matrices.

If we move the generic code further down the call path, this might be handled by methods up in the call path?

@araujoms
Copy link
Collaborator Author

araujoms commented Apr 2, 2025

What about calling my generic code instead of BLAS for small matrices?

@araujoms
Copy link
Collaborator Author

araujoms commented Apr 2, 2025

Done. Moved matmul2x2or3x3_nonzeroalpha! down for gemm_wrapper!, and replaced it with my generic code for syrk_wrapper! and herk_wrapper!.

araujoms and others added 2 commits April 14, 2025 20:12
@jishnub
Copy link
Member

jishnub commented Apr 21, 2025

Looks fine to me, so let's merge barring any final comments

@araujoms
Copy link
Collaborator Author

Thank you and @dkarrasch for all the feedback. Any chance of backporting this into 1.12? It could be considered as fixing a performance bug.

@jishnub
Copy link
Member

jishnub commented Apr 21, 2025

Performance enhancements are typically not backported.

@jishnub jishnub merged commit a32a281 into JuliaLang:master Apr 22, 2025
3 of 4 checks passed
@dkarrasch
Copy link
Member

This has been open and under review for quite some time. When did we branch off? Could this be viewed as a late contribution to v1.12? @ViralBShah

@jishnub
Copy link
Member

jishnub commented Apr 22, 2025

cc @KristofferC

@jishnub
Copy link
Member

jishnub commented Apr 22, 2025

Given that #1297 is a bugfix to this PR, I would suggest that we don't backport this. Adding features often introduces bugs inadvertently that may then delay releases.

@araujoms
Copy link
Collaborator Author

That's very embarrassing. I realized when writing that PR that it was a strong argument against backporting, but I couldn't leave the bug unfixed.

For what it's worth I found the bug when doing a final review of the code to make sure it was safe to backport. I'm sure everything's fine now.

I'd be glad if at least #1247 could be backported, though, that is a straightforward bugfix.

@jishnub
Copy link
Member

jishnub commented Apr 22, 2025

I've marked it for backport.

jishnub added a commit that referenced this pull request Apr 22, 2025
Fixes a bug that I introduced in #1249. Sorry about that.

---------

Co-authored-by: Jishnu Bhattacharya <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants