Conversation

@jasontedor
Member

Today Elasticsearch limits the number of processors used in computing
thread counts to 32. This dates from a time when Elasticsearch created
more threads than it does now and users would run into out-of-memory
errors. It appears the real cause of these errors was not well
understood (it's often due to ulimit settings), so users were left
hitting them on boxes with high core counts. Today Elasticsearch creates
fewer threads (but still a lot), and we have a bootstrap check in place
to ensure that the relevant ulimit is not too low.

There are still some caveats to having too many concurrent indexing
threads, as it can lead to too many small segments, and it's not a
magical go-faster knob if indexing is already bottlenecked by disk. But
this limitation is artificial and surprising to users, so it should be
removed.

Closes #20828
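
For context, a minimal sketch of the kind of cap being removed (names are illustrative, not the exact Elasticsearch source):

```java
// Illustrative sketch only: thread-pool sizing previously derived from a
// processor count capped at 32; after this change it uses the machine's
// true logical core count.
public class ProcessorsCap {
    // old behavior: artificially bounded at 32
    static int boundedProcessors() {
        return Math.min(32, Runtime.getRuntime().availableProcessors());
    }

    // new behavior: the true number of available processors
    static int processors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println(boundedProcessors() + " -> " + processors());
    }
}
```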

This commit increases the lower bound of the max processes ulimit, to
prepare for a world where Elasticsearch instances might be running with
more than the previous cap of 32 processors. With the current settings,
Elasticsearch wants to create roughly 576 + 25 * p / 2 threads, where p
is the number of processors. Add in roughly 7 * p / 8 threads for the GC
threads and a fudge factor, and 4096 should cover us pretty well up to
256 cores.
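
As a rough sanity check on those numbers (a sketch using only the constants quoted in the commit message above):

```java
// Rough estimate of peak thread demand at p = 256 cores, using the
// figures from the commit message: ~576 fixed threads, ~25p/2 scaling
// threads, and ~7p/8 GC threads.
public class ThreadCountEstimate {
    public static void main(String[] args) {
        int p = 256;
        int esThreads = 576 + 25 * p / 2; // 576 + 3200 = 3776
        int gcThreads = 7 * p / 8;        // 224
        System.out.println(esThreads + gcThreads); // 4000, within the 4096 bound
    }
}
```
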
@jasontedor
Member Author

I've targeted this for 5.1.0, but I think that da92de7 should be backported to 5.0.0 or users will have a rude awakening when they upgrade to 5.1.0 if they've already adjusted their ulimit to 2048.

@jasontedor
Member Author

@mikemccand Are you interested in reviewing?

@bleskes bleskes changed the title Remove artificial processors limit Remove artificial default processors limit Oct 12, 2016
@bleskes
Contributor

bleskes commented Oct 12, 2016

Tracing through history, it seems the original issue was about the transport client. Maybe it makes sense to set a bounded default in the client alone?

@mikemccand
Contributor

LGTM, thanks @jasontedor!

Contributor

@jpountz jpountz left a comment

LGTM

@jpountz
Contributor

jpountz commented Oct 12, 2016

I've targeted this for 5.1.0, but I think that da92de7 should be backported to 5.0.0 or users will have a rude awakening when they upgrade to 5.1.0 if they've already adjusted their ulimit to 2048.

How bad would it be to only push this commit to 6.0?

@jasontedor
Member Author

How bad would it be to only push this commit to 6.0?

It just means another year of living with this limitation.

@jpountz
Contributor

jpountz commented Oct 12, 2016

I'm not sure I was clear, but I was commenting specifically about the change to the BootstrapCheck. Put differently: how about removing the 32 limit in 5.1 and requiring an increase to the ulimit only in 6.0? We could potentially revisit the idea of better sharing thread pools in 5.1 to make the number of threads less of an issue.

@bleskes
Contributor

bleskes commented Oct 12, 2016

I think that at this stage (the RC is out), we need to ask a different question: does it really need to go into 5.0? I think if we're honest, the answer is no.

@jasontedor
Member Author

jasontedor commented Oct 12, 2016

I'm not sure I was clear, but I was commenting specifically about the change to the BootstrapCheck.

@jpountz It definitely was not clear; I spoke with @s1monw via another channel, and I think he interpreted it the same way I did (keep the limitation until 6.0.0).

Put differently: how about removing the 32 limit in 5.1 and requiring an increase to the ulimit only in 6.0?

It's worth considering.

@jasontedor
Member Author

jasontedor commented Oct 12, 2016

@bleskes It's not clear what you're commenting on; the processor limitation removal is not targeted for 5.0.0, just the bootstrap check change, as otherwise users might have to raise the ulimit twice: once on upgrade to 5.0.0, and again on upgrade to 5.1.0.

@s1monw
Contributor

s1monw commented Oct 13, 2016

What about this: we wait with the bootstrap checks until 6.0 and make the hard limit a soft limit in 5.1. Then users that have such a high CPU count can set it higher, but it's bounded by default. To support users, we can also print a warning message that they should consider raising this limit. I am not a fan of changing something like this in a minor release.

@bleskes
Contributor

bleskes commented Oct 13, 2016

@bleskes It's not clear what you're commenting on; the processor limitation removal is not targeted for 5.0.0,

I see. Then it's good in that respect. I'm still hesitant about changing the ulimit as well; I'm not sure bootstrap checks should make everyone account for the largest possible use case. One other idea I had is to take Runtime.getRuntime().availableProcessors() into account when determining the lower bound. That said, we haven't released this yet, so I tend to prefer starting with the least restrictive and simplest option and seeing how it goes.

@jasontedor
Member Author

I'm not sure bootstrap checks should make everyone account for the largest possible use case. One other idea I had is to take Runtime.getRuntime().availableProcessors() into account when determining the lower bound. That said, we haven't released this yet, so I tend to prefer starting with the least restrictive and simplest option and seeing how it goes.

@bleskes Before opening this PR, I toyed with exactly this idea. It's fine, it works, but I opted for the simplest approach precisely because the default on most systems is already quite large (total number of pages / 64); any x86 system with more than 1 GB of memory will have its default above the limit enforced here.
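
To make that arithmetic concrete (a sketch assuming the common 4 KiB x86 page size):

```java
// Estimate of the kernel's default nproc limit (total pages / 64) for a
// machine with 1 GiB of RAM, assuming 4 KiB pages.
public class DefaultNprocEstimate {
    public static void main(String[] args) {
        long memoryBytes = 1L << 30;         // 1 GiB
        long pageSize = 4 * 1024;            // common x86 page size
        long pages = memoryBytes / pageSize; // 262,144 pages
        System.out.println(pages / 64);      // 4096, exactly the enforced lower bound
    }
}
```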

@bleskes
Contributor

bleskes commented Oct 13, 2016

the default on most systems is already quite large (total number of pages / 64); any x86 system with more than 1 GB of memory will have its default above the limit enforced here.

All good then!

@jasontedor jasontedor removed the v5.1.1 label Oct 14, 2016
@jasontedor
Member Author

What about this: we wait with the bootstrap checks until 6.0 and make the hard limit a soft limit in 5.1. Then users that have such a high CPU count can set it higher, but it's bounded by default.

It's a soft limit today.

We discussed this during Fix-it-Friday and agreed to push this to 6.0.0 only, since there is already a workaround for the limit in the 5.x series (see #20895).
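
The 5.x workaround referenced here is, as the thread suggests, configuring the processor count directly (the `processors` setting; see #20895 for specifics). A hedged sketch of the idea, with illustrative code that is not the actual Elasticsearch implementation:

```java
import java.util.Map;

// Illustrative only: how a "processors" setting can override the detected
// core count used for thread-pool sizing.
public class ProcessorsOverride {
    static int effectiveProcessors(Map<String, String> settings) {
        String configured = settings.get("processors");
        return configured != null
                ? Integer.parseInt(configured)
                : Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        // e.g. processors: 64 configured in elasticsearch.yml
        System.out.println(effectiveProcessors(Map.of("processors", "64")));
    }
}
```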

@jasontedor jasontedor merged commit 595ec8c into elastic:master Oct 14, 2016
@jasontedor jasontedor deleted the remove-artificial-processors-limit branch October 14, 2016 09:47
jasontedor added a commit that referenced this pull request Jul 9, 2017
This commit adjusts the expectation for the max number of threads in the
scaling thread pool configuration test. This expectation became
incorrect when we removed the limitation capping the number of
processors at 32, instead letting it be the true number of logical
processors on the machine. The test was never adjusted to reflect the
new reality, yet the mismatch never surfaced because our tests were not
running on machines with such high core counts.

Relates #20874
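
A hedged sketch of the shape of that fix (illustrative names, not the actual test code): the expected max pool size must derive from the true core count rather than the old capped value.

```java
// Illustrative only: the stale expectation used a processor count capped
// at 32, so it fails on machines with more cores; the fix derives the
// expectation from the true count.
public class ScalingPoolExpectation {
    static int staleExpectedMax(int processors) {
        return Math.min(32, processors); // pre-change cap, now wrong
    }

    static int expectedMax(int processors) {
        return processors; // matches the post-change sizing
    }

    public static void main(String[] args) {
        int p = Runtime.getRuntime().availableProcessors();
        System.out.println(staleExpectedMax(p) + " vs " + expectedMax(p));
    }
}
```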

Labels

:Core/Infra/Core >enhancement v6.0.0-alpha1
