Skip to content

Conversation

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Dec 30, 2021

@serhiy-storchaka serhiy-storchaka marked this pull request as ready for review December 31, 2021 10:09
@mdickinson
Copy link
Member

Code changes look good to me (modulo the compiler warnings on Windows). I'm running some timings.

@mdickinson
Copy link
Member

mdickinson commented Dec 31, 2021

I'm running some timings.

Running the same timings scripts that are posted to the issue, I see roughly a 5% slowdown for small comb(n, k) computations (0 <= k <= n <= 67) on this branch, compared with master. But that's compensated for by the wider applicability of the fast path, so I'm fine with that. (EDIT: Fixed the upper limit for n.)

Copy link
Member

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tim-one
Copy link
Member

tim-one commented Dec 31, 2021

Running the same timings scripts that are posted to the issue, I see roughly a 5% slowdown for small comb(n, k) computations (0 <= k <= n <= 67)

That's really hard to swallow, Mark. At. e.g., comb(50, 15), the pre-existing fast path did 14 64-bit multiplies and - more significantly - also 14 64-bit divides. How on Earth could that be faster than doing instead 2 multiplies and a shift, pretty much no matter how slow popcount is? A similar argument applies for almost all cases taken over by the newer fast path, the more astonishing to see a slowdown the larger k. For k == 1 the new fast path is clearly slower, and k == 2 is unclear without timing, but the new path seems a clear win for k > 2.

Related: it may or may not be a timing win to set fast_comb_limits1[1] to 0, so k==1 takes the second fast path, which just returns n without any arithmetic. Possibly also for fast_comb_limits1[2], depending on how slow division by 2 is.

@mdickinson
Copy link
Member

That's really hard to swallow, Mark.

I'm comparing this branch with the main branch. (Sorry, I said "master" above, but I meant "main".) The main branch already has the fast mod-2**64 + popcount code in it. This branch has the same, but with some extra up-front indirection, so it's a bit slower.

@tim-one
Copy link
Member

tim-one commented Dec 31, 2021

I'm comparing this branch with the main branch.

Ah, that explains it - sorry for the noise! 😃

Comment on lines 3391 to 3394
/* Maps k to the maximal n so that 2*k-1 <= n <= 127 and C(n, k)*k
* fits into a long long (which is at least 64 bit). Only contains
* items larger than in fast_comb_limits1. */
static const unsigned long long fast_comb_limits2[] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is at least 64 bit

C99 provides types like uint_least64_t that express such comments explicitly. So I think it will be better to #include <stdint.h> and use it:

Suggested change
/* Maps k to the maximal n so that 2*k-1 <= n <= 127 and C(n, k)*k
* fits into a long long (which is at least 64 bit). Only contains
* items larger than in fast_comb_limits1. */
static const unsigned long long fast_comb_limits2[] = {
/* Maps k to the maximal n so that 2*k-1 <= n <= 127 and C(n, k)*k
* fits into a long long. Only contains
* items larger than in fast_comb_limits1. */
static const uint_least64_t fast_comb_limits2[] = {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should contain at least LLONG_MAX. There is no guarantee that uint_least64_t is at least so large as positive long long.

Copy link
Member

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM

@serhiy-storchaka serhiy-storchaka merged commit 2d78797 into python:main Jan 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance or resource usage skip news

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants