Skip to content

Conversation

@ethan-kusters
Copy link
Contributor

@ethan-kusters ethan-kusters commented Apr 20, 2022

Bug/issue #, if applicable: rdar://92131596

Summary

The caching mechanism in ResolvedTopicReference never releases references to the ResolvedTopicReferences it holds.

This has likely always been a problem since the introduction of the cache, but is now appearing as a memory leak when profiling SwiftDocC because we recently added copy-on-write functionality to ResolvedTopicReference with #47 .

It’s important to release unused ResolvedTopicReferences because we can create potentially tens of thousands of them when compiling large documentation catalogs or loading large navigator indexes.

Testing

Profile DocC after converting a documentation catalog and confirm that there are no references to unused ResolvedTopicReferences.

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • Updated documentation if necessary

The caching mechanism in `ResolvedTopicReference` never releases
references to the `ResolvedTopicReference`s it holds.

This has likely always been a problem since the introduction of the
cache, but is just now appearing as a memory leak when profiling SwiftDocC
because we recently added copy-on-write functionality to
`ResolvedTopicReference`.

It’s important to release unused `ResolvedTopicReference`s because
We can create potentially tens of thousands of them when compiling large
documentation catalogs or loading large navigator indexes.

Resolves rdar://86035019.
@ethan-kusters
Copy link
Contributor Author

@swift-ci please test

#endif

// Verify there is no pool bucket for the bundle we're about to test
XCTAssertNil(ResolvedTopicReference.sharedPool.sync({ $0[#function] }))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just duplicates the below test but without the addition of holding onto the bundle and context since the below test started failing with this change.

@ethan-kusters
Copy link
Contributor Author

This may have some negative performance impact but I haven't had a chance to run performance tests yet.

*/

/// A wrapper that provides weak ownership of a value.
struct Weak<T: AnyObject> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any mechanism that drains the shared pool or otherwise removes unused elements? If not, I think we'll still end up with a shared pool full of empty weak boxes.

Copy link
Contributor Author

@ethan-kusters ethan-kusters Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried about that too, but the added logic in the deinit of the Storage should handle this.

The test that asserts on the nil cache pool after converting a bundle should cover this.

}

topicReference = nil
XCTAssertNil(topicReference)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will always hold true, regardless of the changes introduced by this PR. What are we actually trying to test here? That the underlying storage is no longer being referenced?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Contributor Author

@ethan-kusters ethan-kusters Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just there to add a use of topicReference so that Swift doesn't throw a warning and/or potentially optimize this all out. I'll add a comment to clarify that.

@franklinsch
Copy link
Contributor

franklinsch commented Apr 20, 2022

It would also be interesting to see whether the cache is still useful at all (performance wise), given that ResolvedTopicReference is now copy-on-write.

We need to use a unique bundle identifier when testing the shared cache pool.
Otherwise, when running test sequentially (in the same process, unlike what
what happens when tests are run in parallel), we end up with collisions in
the caching and flakey tests.
@ethan-kusters
Copy link
Contributor Author

@swift-ci please test

@ethan-kusters
Copy link
Contributor Author

So we do see a significant performance hit here:

+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Duration for 'bundle-registration' (sec) | +8.84%     | 6.21                 | 6.76                 |
| Duration for 'convert-action' (sec)      | +6.87%     | 8.79                 | 9.39                 |
| Duration for 'navigation-index' (sec)    | +3.57%     | 0.03                 | 0.03                 |
| Peak memory footprint (MB)               | +0.62%     | 358.24               | 360.47               |
| Data subdirectory size (MB)              | no change  | 114.85               | 114.85               |
| Index subdirectory size (MB)             | no change  | 0.95                 | 0.95                 |
| Total DocC archive size (MB)             | no change  | 150.61               | 150.61               |
| Topic Anchor Checksum                    | no change  | c8f03d4e81da8e3b6d7f | c8f03d4e81da8e3b6d7f |
| Topic Graph Checksum                     | no change  | edce22bce4f8de475a5e | edce22bce4f8de475a5e |
+-----------------------------------------------------------------------------------------------------+
TestFramework-25.json
+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Duration for 'bundle-registration' (sec) | +6.55%     | 16.12                | 17.17                |
| Duration for 'convert-action' (sec)      | +4.71%     | 22.82                | 23.89                |
| Duration for 'navigation-index' (sec)    | +4.17%     | 0.07                 | 0.07                 |
| Peak memory footprint (MB)               | -3.65%     | 856.14               | 824.87               |
| Data subdirectory size (MB)              | no change  | 287.95               | 287.95               |
| Index subdirectory size (MB)             | no change  | 2.41                 | 2.41                 |
| Total DocC archive size (MB)             | no change  | 376.17               | 376.17               |
| Topic Anchor Checksum                    | no change  | 9a675b9ad6d69f8b7f0c | 9a675b9ad6d69f8b7f0c |
| Topic Graph Checksum                     | no change  | 665713509084b5131a37 | 665713509084b5131a37 |
+-----------------------------------------------------------------------------------------------------+
TestFramework-5.json
+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Duration for 'bundle-registration' (sec) | +13.78%    | 2.99                 | 3.40                 |
| Duration for 'convert-action' (sec)      | +10.30%    | 4.26                 | 4.70                 |
| Duration for 'navigation-index' (sec)    | no change  | 0.01                 | 0.01                 |
| Peak memory footprint (MB)               | -6.60%     | 211.74               | 197.77               |
| Data subdirectory size (MB)              | no change  | 57.40                | 57.40                |
| Index subdirectory size (MB)             | no change  | 0.47                 | 0.47                 |
| Total DocC archive size (MB)             | no change  | 75.69                | 75.69                |
| Topic Anchor Checksum                    | no change  | 65d4ff3050cd1106a21b | 65d4ff3050cd1106a21b |
| Topic Graph Checksum                     | no change  | 0a64c740a2ed5a246836 | 0a64c740a2ed5a246836 |
+-----------------------------------------------------------------------------------------------------+
TestFramework-50.json
+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Duration for 'bundle-registration' (sec) | +15.23%    | 30.48                | 35.13                |
| Duration for 'convert-action' (sec)      | +10.51%    | 44.58                | 49.26                |
| Duration for 'navigation-index' (sec)    | +5.56%     | 0.14                 | 0.15                 |
| Peak memory footprint (MB)               | -4.37%     | 1612.16              | 1541.68              |
| Data subdirectory size (MB)              | no change  | 579.59               | 579.59               |
| Index subdirectory size (MB)             | no change  | 4.84                 | 4.84                 |
| Total DocC archive size (MB)             | no change  | 755.25               | 755.25               |
| Topic Anchor Checksum                    | no change  | 08d0a84bec913460905f | 08d0a84bec913460905f |
| Topic Graph Checksum                     | no change  | 74d25c4f1ee185ad5676 | 74d25c4f1ee185ad5676 |
+-----------------------------------------------------------------------------------------------------+

@franklinsch
Copy link
Contributor

Yes, with this change, we get about 50% more cache misses for these references, according to logs I added locally.

@daniel-grumberg
Copy link
Contributor

I assume we get increased cache misses because as soon as a topic reference goes out of scope, we can't use a cached value instead. I assume that topic references that are still in scope somewhere could be created by copying them directly with the same benefits as using the cache. Maybe the cache should just be a LRU cache?

@ethan-kusters
Copy link
Contributor Author

Closing this in favor of

which is a lower-risk and more targeted fix for this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants