fix(connections): disconnect when we encounter a non-retryable error code on an atlas connection COMPASS-9793 #7316
base: main
Conversation
// pass it down to telemetry and instance model. This is a relatively
// expensive dataService operation so we're trying to keep the usage
// very limited
const instanceInfo = await dataService.instance();
Either this should be before we store the dataService instance in a map, or we need to add explicit cleanup for it in the catch block below (a rough sketch of that cleanup option follows the suggestion).
Suggested change:
-DataServiceForConnection.set(connectionInfo.id, dataService);
 // We're trying to optimise the initial Compass loading times here: to
 // make sure that the driver connection pool doesn't immediately get
 // overwhelmed with requests, we fetch instance info only once and then
 // pass it down to telemetry and instance model. This is a relatively
 // expensive dataService operation so we're trying to keep the usage
 // very limited
 const instanceInfo = await dataService.instance();
+DataServiceForConnection.set(connectionInfo.id, dataService);
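For comparison, here is a minimal sketch of the other option mentioned above: keeping the current order but cleaning up explicitly when the instance fetch fails. The parameter shapes, the Map type of DataServiceForConnection, and the disconnect() call are assumptions based only on the identifiers visible in this diff, not the actual Compass code:

async function connectAndFetchInstanceInfo(
  connectionInfo: { id: string },
  dataService: { instance(): Promise<unknown>; disconnect(): Promise<void> },
  DataServiceForConnection: Map<string, unknown>
): Promise<unknown> {
  DataServiceForConnection.set(connectionInfo.id, dataService);
  try {
    // Fetched once and passed down to telemetry and the instance model.
    return await dataService.instance();
  } catch (err) {
    // Explicit cleanup so a failed instance fetch doesn't leave a stale
    // dataService entry behind in the map.
    DataServiceForConnection.delete(connectionInfo.id);
    await dataService.disconnect();
    throw err;
  }
}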
const instanceInfo = await dataService.instance();

let showedNonRetryableErrorToast = false;
// Listen for non-retry-able errors on failed server heartbeats. |
I know we're planning to add an e2e test for this, but it probably wouldn't hurt to add a comment also
Suggested change:
+// NB: Order of operations is important here. Make sure that all events
+// are attached BEFORE any other command is executed with dataService, as
+// the connect method doesn't really guarantee that the connection is fully
+// established and these event listeners are an important part of the
+// connection flow.
 // Listen for non-retry-able errors on failed server heartbeats.
Did you have a chance to try this out locally? I'm looking at the event listener logic and I now have doubts that just moving the instance call below is enough to make everything fully work: from how it looks right now, I'm guessing you'd see two error toasts, and one of them would have a very cryptic error message about "client not created" or something along those lines. I don't think that's expected behavior.
I actually wonder now if this ever worked properly for the initial connection, even before this change to instance fetching: dataService wouldn't resolve in connect until the driver had already run a bunch of operations, and only then did we attach the listeners. So for re-connect attempts this would work, but not for the initial connection. Maybe @Anemy knows better.
For example, the connect-with-fail-fast logic that Compass uses otherwise had to be integrated inside the data explorer via the shared devtools-connect logic in order to work during connect.
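To make the ordering concern concrete, here is a rough sketch using the plain MongoDB Node.js driver rather than Compass's dataService: the heartbeat-failure listener is attached before connect() is awaited and before any command (such as the instance info fetch) runs. The error-code set, the toast flag handling, and the helper name are illustrative assumptions, not the actual implementation:

import { MongoClient, type ServerHeartbeatFailedEvent } from 'mongodb';

// Hypothetical set of non-retryable error codes; the real list lives in Compass.
const NON_RETRYABLE_ERROR_CODES = new Set([8000]);

async function connectWithHeartbeatGuard(uri: string): Promise<MongoClient> {
  const client = new MongoClient(uri);
  let showedNonRetryableErrorToast = false;

  // Attach the listener BEFORE connect() and before any commands, so failures
  // during the initial connection attempt are observed as well.
  client.on('serverHeartbeatFailed', (event: ServerHeartbeatFailedEvent) => {
    const code = (event.failure as Error & { code?: number }).code;
    if (
      code !== undefined &&
      NON_RETRYABLE_ERROR_CODES.has(code) &&
      !showedNonRetryableErrorToast
    ) {
      showedNonRetryableErrorToast = true;
      // In Compass this is where the error toast would be shown and the
      // connection torn down; here we just log and close the client.
      console.error('Non-retryable heartbeat failure, disconnecting:', event.failure);
      void client.close();
    }
  });

  await client.connect();
  return client;
}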
@gribnoysup This never worked for initial connections. When writing the initial implementation, I was under the impression we were adding this in order to stop retries for databases the user was already connected to (that's what led to this coming back up): as in, a user is connected and then in the background deletes or pauses their cluster, or their role/session changes.
Yeah, this makes sense, but then I guess this ticket is not for a regression fix, but for a "feature request" 😄
COMPASS-9793
I'll add follow-up e2e tests as a separate PR, but I don't want to block the fix on figuring those out.