
Conversation

@johnjackweir (Collaborator) commented Sep 11, 2025

COMPASS-9793

I'll add follow-up e2e tests in a separate PR, but I don't want to block the fix on figuring those out.

@johnjackweir requested a review from a team as a code owner September 11, 2025 20:23
@johnjackweir requested a review from Anemy September 11, 2025 20:23
@johnjackweir changed the title from "COMPASS-9793: Fetch connection info after adding non-retryable error listener" to "bug(connections): Fetch connection info after adding non-retryable error listener COMPASS-9793" Sep 11, 2025
@johnjackweir changed the title from "bug(connections): Fetch connection info after adding non-retryable error listener COMPASS-9793" to "fix(connections): Fetch connection info after adding non-retryable error listener COMPASS-9793" Sep 11, 2025
@github-actions bot added the "fix" label Sep 11, 2025
@johnjackweir changed the title from "fix(connections): Fetch connection info after adding non-retryable error listener COMPASS-9793" to "fix(connections): disconnect when we encounter a non-retryable error code on an atlas connection COMPASS-9793" Sep 11, 2025
@johnjackweir added the "no release notes" (Fix or feature not for release notes) label Sep 11, 2025
Comment on lines 1774 to +1782
// pass it down to telemetry and instance model. This is a relatively
// expensive dataService operation so we're trying to keep the usage
// very limited
const instanceInfo = await dataService.instance();
@gribnoysup (Collaborator) commented Sep 12, 2025
Either this should be before we store the dataService instance in a map, or we need to add explicit cleanup for it in the catch block below.

Suggested change
-DataServiceForConnection.set(connectionInfo.id, dataService);
 // We're trying to optimise the initial Compass loading times here: to
 // make sure that the driver connection pool doesn't immediately get
 // overwhelmed with requests, we fetch instance info only once and then
 // pass it down to telemetry and instance model. This is a relatively
 // expensive dataService operation so we're trying to keep the usage
 // very limited
 const instanceInfo = await dataService.instance();
+DataServiceForConnection.set(connectionInfo.id, dataService);
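The ordering the suggestion asks for matters for cleanup as well. A minimal sketch (hypothetical names, not Compass's actual code) of the leak being pointed out: if the service is stored in the map before an awaited call that can throw, a failure leaves a stale entry behind unless the catch block explicitly removes it.

```javascript
const DataServiceForConnection = new Map();

async function connectLeaky(connectionInfo, dataService) {
  // The entry is stored BEFORE the awaited call...
  DataServiceForConnection.set(connectionInfo.id, dataService);
  // ...so if this throws, the map keeps a dataService that never
  // finished connecting (unless a catch block cleans it up).
  await dataService.instance();
}

async function connectSafe(connectionInfo, dataService) {
  // Await the expensive call first; only store the service after
  // it has succeeded, so a failure leaves no trace in the map.
  await dataService.instance();
  DataServiceForConnection.set(connectionInfo.id, dataService);
}

// A stand-in dataService whose instance() call always fails.
const failing = { instance: async () => { throw new Error('boom'); } };

const demoA = (async () => {
  await connectLeaky({ id: 'a' }, failing).catch(() => {});
  const leaked = DataServiceForConnection.has('a'); // true: stale entry

  DataServiceForConnection.clear();
  await connectSafe({ id: 'a' }, failing).catch(() => {});
  const clean = !DataServiceForConnection.has('a'); // true: nothing stored
  console.log({ leaked, clean });
  return { leaked, clean };
})();
```

Swapping the two statements, as the suggestion does, means a failed `instance()` call never pollutes the map in the first place.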

const instanceInfo = await dataService.instance();

let showedNonRetryableErrorToast = false;
// Listen for non-retry-able errors on failed server heartbeats.

I know we're planning to add an e2e test for this, but it probably wouldn't hurt to add a comment also

Suggested change
+// NB: Order of operations is important here. Make sure that all events
+// are attached BEFORE any other command is executed with dataService as
+// connect method doesn't really guarantee that connection is fully
+// established and these event listeners are important part of the
+// connection flow.
 // Listen for non-retry-able errors on failed server heartbeats.

@gribnoysup (Collaborator) left a comment

Did you have a chance to try this out locally? I'm looking at the event listener logic and now doubt that just moving the instance call below is enough to make everything fully work: from how it looks right now, I'm guessing you'd see two error toasts, and one of them would have a very cryptic error message about "client not created" or something along those lines. I don't think that's expected behavior.

@gribnoysup (Collaborator)

I actually wonder now if this ever worked properly for the initial connection, even before this change to instance fetching: dataService doesn't resolve in connect until the driver has already run a bunch of operations, and only then do we attach the listeners. So this would work for re-connect attempts, but not for the initial connection. Maybe @Anemy knows better.

@gribnoysup (Collaborator)

For example, the connect-with-fail-fast behavior that Compass uses elsewhere had to be integrated inside the Data Explorer via shared devtools-connect logic in order to work during connect.

@Anemy (Member) commented Sep 12, 2025

@gribnoysup This never worked for initial connections. When writing the initial implementation, I was under the impression we were adding this in order to stop retries on clusters the user was already connected to (that's what led to this coming back up): a user is connected, and then in the background their cluster is deleted or paused, or their role/session changes.

@gribnoysup (Collaborator)

Yeah, this makes sense, but then I guess this ticket is not for a regression fix, but for a "feature request" 😄

Labels
fix · no release notes (Fix or feature not for release notes)

3 participants