-
Notifications
You must be signed in to change notification settings - Fork 246
DRIVERS-3218 Avoid clearing the connection pool when the server connection rate limiter triggers #1855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ction rate limiter triggers
| data: | ||
| failCommands: ["isMaster","hello"] | ||
| closeConnection: true | ||
| blockConnection: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we are discouraged from changing existing tests, these test still pass without the backpressure changes, and this avoids adding a new runOnRequirement.
| - **Backpressure-enabled** - The pool MUST add the error labels `SystemOverloadedError` and `RetryableError` to network | ||
| errors or network timeouts it encounters during the connection establishment or the `hello` message. These labels | ||
| are used by the | ||
| [server monitor](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#error-handling-pseudocode) to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest [server monitor] -> [SDAM error handling]
"server monitor" is the title of another document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| 1. Two concurrent writes begin on application threads A and B. | ||
| 2. The server restarts. | ||
| 3. Thread A receives the first non-timeout network error, and the client marks the server Unknown, and clears the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this section need to change? We aren't changing the behavior of network errors when executing a command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
source/server-discovery-and-monitoring/tests/unified/backpressure-network-error-fail.yml
Show resolved
Hide resolved
baileympearson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed some test issues with Steve on slack, here's some other comments I had. My POC in Node is passing locally with the changes Steve is planning to make; once I have updated tests I'll test my POC in CI and hopefully everything passes.
source/server-discovery-and-monitoring/server-discovery-and-monitoring.md
Outdated
Show resolved
Hide resolved
source/server-discovery-and-monitoring/tests/unified/backpressure-network-error-fail.yml
Outdated
Show resolved
Hide resolved
source/server-discovery-and-monitoring/tests/unified/backpressure-network-error-fail.yml
Show resolved
Hide resolved
source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md
Outdated
Show resolved
Hide resolved
source/server-discovery-and-monitoring/server-discovery-and-monitoring.md
Show resolved
Hide resolved
source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md
Outdated
Show resolved
Hide resolved
source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md
Outdated
Show resolved
Hide resolved
…nitoring.md Co-authored-by: Bailey Pearson <[email protected]>
|
Note: I haven't yet updated the prose test implementation in pymongo. I'm working on that next. |
baileympearson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment.
Second implementation in Node is passing: https://spruce.mongodb.com/version/691f7e221be6180007ecf081/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC
| of [Connections](#connection) (including available and in use) MUST NOT exceed **maxPoolSize** | ||
| - **Rate-limited:** A Pool MUST limit the number of [Connections](#connection) being | ||
| [established](#establishing-a-connection-internal-implementation) concurrently via the **maxConnecting** | ||
| [pool option](#connection-pool-options). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a rough pseudocode in the #### Establishing a Connection (Internal Implementation) section that we might consider updating. Do you think it's worth updating this to include attaching error labels?
| to avoid clearing the pool. The pool MUST NOT add the backpressure error labels during an authentication step | ||
| after the `hello` message. For errors that the driver can distinguish as never occurring due to server overload, | ||
| such as DNS lookup failures, TLS related errors, or errors encountered establishing a connection to a socks5 proxy, | ||
| the driver MUST NOT clear the connection pool and MUST NOT mark the server Unknown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we want to continue clearing the pool and marking the server unknown for definite non-overload errors, shouldn't this say "MUST" rather than "MUST NOT"?
Replaces #1852
Please complete the following before merging:
clusters).