
Conversation

@johnjackweir (Collaborator) commented Sep 11, 2025

COMPASS-9793

I'll add follow-up e2e tests in a separate PR, but I don't want to block the fix on figuring those out.

@johnjackweir requested a review from a team as a code owner September 11, 2025 20:23
@johnjackweir requested a review from Anemy September 11, 2025 20:23
@johnjackweir changed the title from "COMPASS-9793: Fetch connection info after adding non-retryable error listener" to "bug(connections): Fetch connection info after adding non-retryable error listener COMPASS-9793" Sep 11, 2025
@johnjackweir changed the title from "bug(connections): Fetch connection info after adding non-retryable error listener COMPASS-9793" to "fix(connections): Fetch connection info after adding non-retryable error listener COMPASS-9793" Sep 11, 2025
@github-actions bot added the "fix" label Sep 11, 2025
@johnjackweir changed the title from "fix(connections): Fetch connection info after adding non-retryable error listener COMPASS-9793" to "fix(connections): disconnect when we encounter a non-retryable error code on an atlas connection COMPASS-9793" Sep 11, 2025
@johnjackweir added the "no release notes" (Fix or feature not for release notes) label Sep 11, 2025
Comment on lines 1774 to +1782
// pass it down to telemetry and instance model. This is a relatively
// expensive dataService operation so we're trying to keep the usage
// very limited
const instanceInfo = await dataService.instance();
@gribnoysup (Collaborator) commented Sep 12, 2025
Either this should be before we store the dataService instance in a map, or we need to add explicit cleanup for it in the catch block below.

Suggested change
-DataServiceForConnection.set(connectionInfo.id, dataService);
 // We're trying to optimise the initial Compass loading times here: to
 // make sure that the driver connection pool doesn't immediately get
 // overwhelmed with requests, we fetch instance info only once and then
 // pass it down to telemetry and instance model. This is a relatively
 // expensive dataService operation so we're trying to keep the usage
 // very limited
 const instanceInfo = await dataService.instance();
+DataServiceForConnection.set(connectionInfo.id, dataService);
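The ordering the suggestion asks for matters for cleanup as well. A minimal sketch (hypothetical names, not Compass's actual code) of the leak being pointed out: if the service is stored in the map before an awaited call that can throw, a failure leaves a stale entry behind unless the catch block explicitly removes it.

```javascript
const DataServiceForConnection = new Map();

async function connectLeaky(connectionInfo, dataService) {
  // The entry is stored BEFORE the awaited call...
  DataServiceForConnection.set(connectionInfo.id, dataService);
  // ...so if this throws, the map keeps a dataService that never
  // finished connecting (unless a catch block cleans it up).
  await dataService.instance();
}

async function connectSafe(connectionInfo, dataService) {
  // Await the expensive call first; only store the service after
  // it has succeeded, so a failure leaves no trace in the map.
  await dataService.instance();
  DataServiceForConnection.set(connectionInfo.id, dataService);
}

// A stand-in dataService whose instance() call always fails.
const failing = { instance: async () => { throw new Error('boom'); } };

const demoA = (async () => {
  await connectLeaky({ id: 'a' }, failing).catch(() => {});
  const leaked = DataServiceForConnection.has('a'); // true: stale entry

  DataServiceForConnection.clear();
  await connectSafe({ id: 'a' }, failing).catch(() => {});
  const clean = !DataServiceForConnection.has('a'); // true: nothing stored
  console.log({ leaked, clean });
  return { leaked, clean };
})();
```

Swapping the two statements, as the suggestion does, means a failed `instance()` call never pollutes the map in the first place.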

const instanceInfo = await dataService.instance();

let showedNonRetryableErrorToast = false;
// Listen for non-retry-able errors on failed server heartbeats.

I know we're planning to add an e2e test for this, but it probably wouldn't hurt to add a comment also

Suggested change
+// NB: Order of operations is important here. Make sure that all events
+// are attached BEFORE any other command is executed with dataService as
+// connect method doesn't really guarantee that connection is fully
+// established and these event listeners are important part of the
+// connection flow.
 // Listen for non-retry-able errors on failed server heartbeats.

@gribnoysup (Collaborator) left a comment

Did you have a chance to try this out locally? I'm looking at the event listener logic and now doubt that just moving the instance call below is enough to make everything fully work: from how it looks right now, I'm guessing you'd see two error toasts, and one of them would have a very cryptic error message about "client not created" or something along those lines. I don't think that's expected behavior.

@gribnoysup (Collaborator)

I actually wonder now if this ever worked properly for the initial connection, even before this change to instance fetching: dataService doesn't resolve in connect until the driver has already run a bunch of operations, and only then do we attach the listeners. So this would work for re-connect attempts, but not for the initial connection. Maybe @Anemy knows better.

@gribnoysup (Collaborator)

For example, the connect-with-fail-fast behavior that Compass uses elsewhere had to be integrated inside the Data Explorer via shared devtools-connect logic in order to work during connect.

@Anemy (Member) commented Sep 12, 2025

@gribnoysup This never worked for initial connections. When writing the initial implementation, I was under the impression we were adding this in order to stop retries on clusters the user was already connected to (that's what led to this coming back up): a user is connected, and then in the background their cluster is deleted or paused, or their role/session changes.

@gribnoysup (Collaborator)

Yeah, this makes sense, but then I guess this ticket is not for a regression fix, but for a "feature request" 😄

Labels
fix · no release notes (Fix or feature not for release notes)

3 participants