-
Notifications
You must be signed in to change notification settings - Fork 246
DRIVERS-3282: Introduce optin setting to await for MinPoolSize population #1834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -2,7 +2,7 @@ | |||
|
|||
description: "timeoutMS can be configured on a MongoClient" | |||
|
|||
schemaVersion: "1.9" | |||
schemaVersion: "1.26" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prestonvasquez I've updated this file only with the new approach to pre-heat the pool. Should we apply similar settings to more tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest updating the runCursorCommand "Non-tailable cursor lifetime remaining timeoutMS applied to getMore if timeoutMode is unset" test, which would close DRIVERS-3134 (PR https://github.com/mongodb/specifications/pull/1769/files).
Otherwise, I will collect a list of candidates during Go Driver's second implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the test and list of candidates would be much appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add this functionality to any CSOT test that (1) doesn't require threads,(2) relies on a failpoint, and (3) uses timeoutMS
either as an operation argument or on the client. The following seem relevant:
- retryability-timeoutMS.yml
- command-execution.jyml
- non-tailable-cursors.yml
- error-transformations.yml
- convenient-transactions.yml
- sessions-inherit-timeoutMS.yml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -2,7 +2,7 @@ | |||
|
|||
description: "timeoutMS can be configured on a MongoClient" | |||
|
|||
schemaVersion: "1.9" | |||
schemaVersion: "1.26" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest updating the runCursorCommand "Non-tailable cursor lifetime remaining timeoutMS applied to getMore if timeoutMode is unset" test, which would close DRIVERS-3134 (PR https://github.com/mongodb/specifications/pull/1769/files).
Otherwise, I will collect a list of candidates during Go Driver's second implementation.
- `awaitMinPoolSize`: Optional boolean. If `true`, the unified spec runner must wait for the connection pool to be | ||
populated for all servers according to the `minPoolSize` option. If `false`, not specified, or if minPoolSize | ||
equals 0, there is no need to wait for any specific pool state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drivers that asynchronously construct the pool size with minPoolSize after client construction could make this check block indefinitely. We should prescribe an appropriate amount of time to await warming the connection, perhaps 500ms? From experimentation, actual connection time seems to be around 50-200ms on local networks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put limit to 1 second, we can tune it in future if needed. In case if this is not enough, should we throw and report test as failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a downside to setting a higher limit? Worst case, the tests takes a bit longer to fail. Best case, we avoid some flakiness and modifications of these tests if 1s proves to be too short.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could give a recommended timeoutMS based on the 20-500ms observation for 8.1 and let drivers update as needed. Does that seem reasonable? The important distinction is that we don't block indefinitely listening to the event monitor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly but I do think that if we're debating "is the timeout high enough" and there aren't downsides of setting it higher, we should just set it higher now to avoid test updates and test syncs in the future if the timeout value doesn't end up being high enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it resilient we might want to use formula like: waitLimit = minPoolSize * 10s
. This way we will automatically extent the limit if there should be more connection to establish.
@prestonvasquez @baileympearson WDYT? Should I add this to spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@baileympearson I've updated text to wait for minPoolSize * 10s
. I'm positive if connection could not be established during 10seconds - then we really have a problem, and there is no sense to wait any more. Does it sounds as reliable solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me. 10s seems plenty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could specify the value in the spec test. For example, instead of an awaitMinPoolSize
boolean, we could add an awaitMinPoolSizeMS
integer.
E.g. to wait for 10 seconds:
awaitMinPoolSizeMS: 10000
@@ -2,7 +2,7 @@ | |||
|
|||
description: "timeoutMS can be configured on a MongoClient" | |||
|
|||
schemaVersion: "1.9" | |||
schemaVersion: "1.26" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add this functionality to any CSOT test that (1) doesn't require threads,(2) relies on a failpoint, and (3) uses timeoutMS
either as an operation argument or on the client. The following seem relevant:
- retryability-timeoutMS.yml
- command-execution.jyml
- non-tailable-cursors.yml
- error-transformations.yml
- convenient-transactions.yml
- sessions-inherit-timeoutMS.yml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! 👍
connections. Only connection pools for data-bearing nodes should be considered. If any connection pool is not | ||
populated within the specified timeframe, the Unified Spec Runner must raise an error and mark the test as failed. | ||
If the parameter is omitted or if `minPoolSize` is set to 0, no waiting is required for a specific pool state | ||
prior to test execution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to mention the behavior of CMAP and SDAM events/logs that occur before the pool is populated? Is the intention that those events will show up or will they be ignored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignoring of any events mentioned in the changes.
Thanks
981fbec
to
f337745
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small suggestions on the spec file. I've not reviewed the test files as I'd be unable to POC those anyway.
<span id="entity_client_awaitMinPoolSizeMS"></span> | ||
|
||
- `awaitMinPoolSizeMS`: Optional, integer. When specified, this parameter defines the maximum duration (in | ||
milliseconds) that the Unified Spec Runner must wait for each connection pool to be populated with `minPoolSize` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall ever using this term as a proper noun. I think you can just write "test runner" for consistency with other text.
Likewise on the line below.
connections. Only connection pools for data-bearing nodes should be considered. If any connection pool is not | ||
populated within the specified timeframe, the Unified Spec Runner must raise an error and mark the test as failed. | ||
Any CMAP and SDAM events/logs listeners configured on the client should ignore any events that occur before the | ||
pool is being populated. If the parameter is omitted or if `minPoolSize` is set to 0, no waiting is required for a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can clarify that you're referring to the minPoolSize
URI option; lest readers think it's just another client entity option.
4c87976
to
b6a1b62
Compare
Some CSOT tests are require pre-heat connection pool to have a predictable test outcome. The suggestion is to use
minPoolSize
together with new settingawaitMinPoolSize
that makes unified test runner to await until pool will be populated before run the actual test case.Please complete the following before merging:
clusters).