Conversation

@stefanosiano
Member

@stefanosiano stefanosiano commented Oct 23, 2025

📜 Description

The AndroidConnectionStatusProvider cache is now updated in the background, so the lock is no longer held for an unbounded time.

💡 Motivation and Context

Both our Play Console and our own SDKCD tool report many ANRs caused by a lock in AndroidConnectionStatusProvider.
We were making IPC calls while holding the lock, which may cause the ANR:

  • connectivityManager.getActiveNetwork()
  • connectivityManager.getNetworkCapabilities(activeNetwork)

The problem happens when the app goes to the foreground: AndroidConnectionStatusProvider's onForeground is called and runs updateCache in the background, while ReplayIntegration runs updateCache on the main thread at the same moment.
The ANR happens because the main thread is left waiting for the lock. That wait should be short, but the IPC calls made in updateCache while the lock is held may be the culprit.
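
A minimal sketch of the idea, assuming a plain-Java stand-in for the provider: the slow query (standing in for the two ConnectivityManager calls) runs on an executor with no lock held, and the lock is taken only to publish the result. `ConnectionCacheSketch`, `slowQuery`, and `getCached` are hypothetical names used for illustration, not the SDK's actual code.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical stand-in for the provider's cache; not the SDK's actual class. */
final class ConnectionCacheSketch<T> {
  private final ReentrantLock lock = new ReentrantLock();
  // Assumption: stands in for getActiveNetwork() + getNetworkCapabilities(...).
  private final Supplier<T> slowQuery;
  private final Executor executor;
  private T cachedValue; // guarded by lock
  private long lastUpdateMillis; // guarded by lock

  ConnectionCacheSketch(Supplier<T> slowQuery, Executor executor) {
    this.slowQuery = slowQuery;
    this.executor = executor;
  }

  /** Schedules a refresh; the calling thread never waits on the slow query. */
  void updateCache() {
    executor.execute(
        () -> {
          // The slow call runs with no lock held, so readers are not blocked by it.
          final T fresh = slowQuery.get();
          lock.lock();
          try {
            // The lock is held only for these two field writes.
            cachedValue = fresh;
            lastUpdateMillis = System.currentTimeMillis();
          } finally {
            lock.unlock();
          }
        });
  }

  T getCached() {
    lock.lock();
    try {
      return cachedValue;
    } finally {
      lock.unlock();
    }
  }

  long getLastUpdateMillis() {
    lock.lock();
    try {
      return lastUpdateMillis;
    } finally {
      lock.unlock();
    }
  }
}
```

With this shape, a main-thread reader can only ever wait for the duration of the two field writes, not for an IPC round trip.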

💚 How did you test it?

📝 Checklist

  • I added GH Issue ID & Linear ID
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

@stefanosiano stefanosiano marked this pull request as ready for review October 23, 2025 15:29
@cursor

cursor bot commented Oct 23, 2025

Bug: Unprotected Cache Update Causes Data Race

The cachedNetworkCapabilities and lastCacheUpdateTime fields are updated without lock protection in updateCacheAndNotifyObservers() and the error path of updateCache(). This creates a data race, as other methods read these shared fields under lock, potentially leading to inconsistent state.


@cursor

cursor bot commented Oct 23, 2025

Bug: Race Condition in Network Capability Cache

The updateCache() method clears the cached network capabilities and updates its timestamp, then releases its lock before the actual network query runs asynchronously. This creates a window where other threads read an empty cache that appears valid, leading to incorrect connection status. Additionally, concurrent calls can submit multiple background tasks that race to update the shared state.
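
One possible way to close that window, sketched under assumptions: keep the value and its timestamp in a single immutable snapshot and swap it in only after the fetch has finished, so a reader never sees a fresh timestamp paired with an empty value. `SnapshotCacheSketch`, `publish`, and the TTL parameter are hypothetical and are not taken from the PR.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: value and timestamp are published together, atomically. */
final class SnapshotCacheSketch<T> {
  /** Immutable pair, so readers always see a matching value and timestamp. */
  private static final class Snapshot<V> {
    final V value;
    final long fetchedAtMillis;

    Snapshot(V value, long fetchedAtMillis) {
      this.value = value;
      this.fetchedAtMillis = fetchedAtMillis;
    }
  }

  private final AtomicReference<Snapshot<T>> current = new AtomicReference<>();
  private final long ttlMillis;

  SnapshotCacheSketch(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  /** Called by the background task only once the fetch has completed. */
  void publish(T value, long nowMillis) {
    current.set(new Snapshot<>(value, nowMillis));
  }

  /** Valid only if a completed fetch exists and is younger than the TTL. */
  boolean isValid(long nowMillis) {
    final Snapshot<T> s = current.get();
    return s != null && nowMillis - s.fetchedAtMillis < ttlMillis;
  }

  /** Returns the cached value, or null when there is no valid snapshot. */
  T valueOrNull(long nowMillis) {
    final Snapshot<T> s = current.get();
    return (s != null && nowMillis - s.fetchedAtMillis < ttlMillis) ? s.value : null;
  }
}
```

Until the first `publish`, `isValid` returns false rather than reporting a cleared cache as valid.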


@github-actions
Contributor

github-actions bot commented Oct 23, 2025

Performance metrics 🚀

| | Plain | With Sentry | Diff |
|---|---|---|---|
| Startup time | 316.48 ms | 362.29 ms | 45.81 ms |
| Size | 1.58 MiB | 2.12 MiB | 549.40 KiB |

Baseline results on branch: main

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| d364ace | 382.77 ms | 443.21 ms | 60.44 ms |
| 3998a95 | 415.94 ms | 478.54 ms | 62.60 ms |
| 806307f | 357.85 ms | 424.64 ms | 66.79 ms |
| ee747ae | 357.79 ms | 421.84 ms | 64.05 ms |
| d217708 | 355.34 ms | 381.39 ms | 26.05 ms |
| ee747ae | 396.82 ms | 441.67 ms | 44.86 ms |
| 604a261 | 380.65 ms | 451.27 ms | 70.62 ms |
| 3699cd5 | 423.60 ms | 495.52 ms | 71.92 ms |
| b3d8889 | 371.84 ms | 447.49 ms | 75.65 ms |
| d5a29b6 | 298.62 ms | 391.78 ms | 93.16 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| d364ace | 1.58 MiB | 2.11 MiB | 539.75 KiB |
| 3998a95 | 1.58 MiB | 2.10 MiB | 532.96 KiB |
| 806307f | 1.58 MiB | 2.10 MiB | 533.42 KiB |
| ee747ae | 1.58 MiB | 2.10 MiB | 530.95 KiB |
| d217708 | 1.58 MiB | 2.10 MiB | 532.97 KiB |
| ee747ae | 1.58 MiB | 2.10 MiB | 530.95 KiB |
| 604a261 | 1.58 MiB | 2.10 MiB | 533.42 KiB |
| 3699cd5 | 1.58 MiB | 2.10 MiB | 533.45 KiB |
| b3d8889 | 1.58 MiB | 2.10 MiB | 535.07 KiB |
| d5a29b6 | 1.58 MiB | 2.12 MiB | 549.37 KiB |

Previous results on branch: stefanosiano/fix/anr-connection-status

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| f208591 | 320.37 ms | 351.59 ms | 31.22 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| f208591 | 1.58 MiB | 2.12 MiB | 549.41 KiB |

avoid concurrent cache updates
@cursor

cursor bot commented Oct 23, 2025

Bug: Cache Update Flag Stuck on Exception

The isUpdatingCache flag, set to true in the background cache update task, isn't reset if an exception occurs. This leaves the flag true, permanently blocking all subsequent cache updates.
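
A hedged sketch of the usual remedy, assuming the flag is an `AtomicBoolean`: claim it with `compareAndSet` and release it in a `finally` block so that a failing refresh cannot leave it stuck. `GuardedUpdaterSketch` and `tryUpdate` are hypothetical names, not the PR's code.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of an "only one update at a time" guard that always resets. */
final class GuardedUpdaterSketch {
  private final AtomicBoolean isUpdating = new AtomicBoolean(false);
  private final Executor executor;

  GuardedUpdaterSketch(Executor executor) {
    this.executor = executor;
  }

  /** Returns false if an update is already running or the task could not be scheduled. */
  boolean tryUpdate(Runnable refresh) {
    // Only one caller can flip the flag from false to true.
    if (!isUpdating.compareAndSet(false, true)) {
      return false;
    }
    try {
      executor.execute(
          () -> {
            try {
              refresh.run();
            } catch (Throwable t) {
              // A real implementation would log the failure here.
            } finally {
              // Runs on success and on failure, so the flag can never stay true.
              isUpdating.set(false);
            }
          });
      return true;
    } catch (RuntimeException e) {
      // The executor rejected the task (e.g. after shutdown): release the flag.
      isUpdating.set(false);
      return false;
    }
  }

  boolean isUpdating() {
    return isUpdating.get();
  }
}
```

The outer `catch` covers the case where the task is never started at all, which the inner `finally` alone would miss.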


@cursor

cursor bot commented Oct 23, 2025

Bug: Cache Update Timing Causes Stale Data

The updateCache() method clears the cache and updates its validity timestamp synchronously, but fetches network capabilities asynchronously. This causes getConnectionStatus() and getConnectionType() to return stale or null data, as the cache is marked valid before the background update completes.


@cursor

cursor bot commented Oct 23, 2025

Bug: Cache Update Race Condition

A race condition allows updateCacheAndNotifyObservers to directly update cachedNetworkCapabilities and lastCacheUpdateTime concurrently with a background task from updateCache. This leads to uncoordinated writes and an inconsistent cache state. The isUpdatingCache flag doesn't prevent this, and the direct update in updateCacheAndNotifyObservers defeats the PR's goal of moving IPC calls out of the lock.
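
One way to coordinate two writers, offered only as an illustration and not as what the PR does: hand each update a ticket when it starts, and let a result be applied only if no newer ticket has already been applied. A late background result then cannot overwrite a more recent direct update. All names below are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a stale result never overwrites a newer one. */
final class OrderedCacheSketch<T> {
  private final Object lock = new Object();
  private final AtomicLong tickets = new AtomicLong();
  private long appliedTicket; // guarded by lock
  private T value; // guarded by lock

  /** Call before starting the slow fetch; a larger ticket means a later start. */
  long beginUpdate() {
    return tickets.incrementAndGet();
  }

  /** Applies the result unless a newer update was already applied. */
  boolean completeUpdate(long ticket, T fresh) {
    synchronized (lock) {
      if (ticket <= appliedTicket) {
        return false; // an update that started later has already been applied
      }
      appliedTicket = ticket;
      value = fresh;
      return true;
    }
  }

  T get() {
    synchronized (lock) {
      return value;
    }
  }
}
```

Both the background task and a direct caller would go through `beginUpdate` and `completeUpdate`, so every write to the shared fields happens under the same lock, while the fetch itself still runs outside it.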


@cursor

cursor bot commented Oct 23, 2025

Bug: Cache Update Lock Exception Handling

The isUpdatingCache flag might not reset if an exception occurs during the cache update, such as when acquiring the lock. This can permanently block all future connection status cache updates.


@cursor

cursor bot commented Oct 23, 2025

Bug: Cache Update Timing and Error Handling

The updateCache() method synchronously updates the cache's timestamp and clears its contents, but the actual network capabilities are fetched asynchronously. This can cause getConnectionStatus() to return stale or incorrect data immediately after an update, including in onForeground(). If the asynchronous update fails, the isUpdatingCache flag isn't reset, permanently blocking future cache updates.


avoid concurrent cache updates
cursor[bot]

This comment was marked as outdated.

avoid concurrent cache updates
options.getLogger().log(SentryLevel.WARNING, "Failed to update connection status cache", t);
} catch (Throwable t) {
options.getLogger().log(SentryLevel.WARNING, "Failed to update connection status cache", t);
try (final @NotNull ISentryLifecycleToken ignored = lock.acquire()) {

Bug: Cache Update Timing Causes Stale Data

updateCache() now clears the cache and updates its timestamp before asynchronously fetching network capabilities. This creates a race condition where immediate reads (e.g., in onForeground(), getConnectionStatus()) get stale or null data, and isCacheValid() incorrectly reports the cleared cache as valid.


