@ethan-kusters ethan-kusters commented Dec 1, 2021

Bug/issue #, if applicable: rdar://84169407

Summary

Adds base support for converting DocC Catalogs that include multiple languages by expanding symbol semantic models to hold language-specific variants.

This builds on previous work landed in:

Dependencies

None.

Testing

Build documentation for the included test bundle at Tests/SwiftDocCTests/Test\ Bundles/MixedLanguageFramework.docc and confirm that the output is as expected.

Steps:

  1. export DOCC_HTML_DIR=/path/to/recent/swift-docc-render-artifact
  2. From the root of this repository:
    swift run docc preview Tests/SwiftDocCTests/Test\ Bundles/MixedLanguageFramework.docc \
        --enable-experimental-objective-c-support \
        --index
  3. Confirm that mixed-language content renders as expected.
Screenshots (not reproduced here): Swift rendering, Objective-C rendering.

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • Updated documentation if necessary

@ethan-kusters ethan-kusters marked this pull request as draft December 1, 2021 03:26
@ethan-kusters ethan-kusters self-assigned this Dec 1, 2021
@ethan-kusters ethan-kusters marked this pull request as ready for review December 3, 2021 00:14
@ethan-kusters
Contributor Author

@swift-ci please test

Contributor

@franklinsch franklinsch left a comment

This is looking great! Appreciate the refactoring along the way and testing!

} else {
    value = try! encoder.encode(CodableRenderReference.init(reference))
    referenceCache.sync({ $0[cacheKeyWithConformance] = value })
    encoderUserInfoVariantOverrides.values.removeAll()
Contributor

Not sure I follow this, why do we need to clear this property?

Contributor Author

Ah right, we only want the variant overrides that are collected while encoding that single reference, since the cache stores each reference together with the overrides for that reference.

But this is definitely confusing, so I'll add a comment.
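
To illustrate the point above, here is a minimal sketch of a cache whose entries hold an encoded reference together with only the overrides collected for it. All type and method names are hypothetical stand-ins, not the actual DocC types, and the "encoding" step is simulated.

```swift
import Foundation

// Hypothetical stand-in for a variant override collected during encoding.
struct VariantOverrideSketch: Equatable {
    let pointer: String
    let value: String
}

// Hypothetical cache entry: the encoded reference plus only its own overrides.
struct CachedReferenceSketch {
    let data: Data
    let overrides: [VariantOverrideSketch]
}

final class ReferenceCacheSketch {
    private var cache: [String: CachedReferenceSketch] = [:]
    // Shared accumulator that the (simulated) encoding step appends to.
    private var collectedOverrides: [VariantOverrideSketch] = []

    func encodedReference(key: String, title: String, objectiveCTitle: String?) -> CachedReferenceSketch {
        if let cached = cache[key] {
            return cached
        }
        // Simulated encoding: may append overrides to the shared accumulator.
        let data = Data(title.utf8)
        if let objectiveCTitle = objectiveCTitle {
            collectedOverrides.append(
                VariantOverrideSketch(pointer: "/references/\(key)/title", value: objectiveCTitle)
            )
        }
        let entry = CachedReferenceSketch(data: data, overrides: collectedOverrides)
        cache[key] = entry
        // Reset the accumulator so the next reference's cache entry
        // does not inherit overrides collected for this one.
        collectedOverrides.removeAll()
        return entry
    }
}
```

Without the `removeAll()` call, the entry for a second reference would also carry the first reference's overrides.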

Contributor Author

Addressed here: 5ef11e1.

Thanks!


// We only want to check for an objective-c variant
// if we're currently indexing a swift variant.
guard language == .swift else {
Contributor

Is this guard needed? In what cases do we have render nodes that have Objective-C variants that we don't want to index?

Contributor Author

It is. I'd definitely be open to suggestions for how to improve this, but without the guard we end up in an infinite loop, since an Objective-C symbol also has an Objective-C variant.
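
A rough sketch of why the guard matters, using hypothetical types rather than the real indexing code: if every node, including the Objective-C one, reports an Objective-C variant, then recursing into that variant unconditionally never terminates.

```swift
enum LanguageSketch: String {
    case swift
    case objectiveC
}

// Hypothetical render node: every node, including the Objective-C one,
// can report an Objective-C variant of itself.
struct RenderNodeSketch {
    let title: String
    let objectiveCTitle: String?

    var objectiveCVariant: RenderNodeSketch? {
        guard let objectiveCTitle = objectiveCTitle else { return nil }
        return RenderNodeSketch(title: objectiveCTitle, objectiveCTitle: objectiveCTitle)
    }
}

func indexNode(_ node: RenderNodeSketch, as language: LanguageSketch, into entries: inout [String]) {
    entries.append("\(language.rawValue): \(node.title)")

    // Only look for an Objective-C variant while indexing the Swift variant;
    // without this guard the Objective-C variant would index itself forever.
    guard language == .swift else { return }

    if let variant = node.objectiveCVariant {
        indexNode(variant, as: .objectiveC, into: &entries)
    }
}
```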

return directive
}
diagnosticEngine.emit(Problem(diagnostic: Diagnostic(source: found.source, severity: .warning, range: directive?.range, identifier: "org.swift.docc.DeprecationSummaryForAvailableSymbol", summary: "\(symbol.absolutePath.singleQuoted) isn't unconditionally deprecated"), possibleSolutions: []))
let symbol = updatedNode.symbol
Contributor

Should we be using the unifiedSymbol here instead?

Contributor Author

Addressed here: e03f383.

Thanks!

let moduleInterfaceLanguages: Set<SourceLanguage>
if FeatureFlags.current.isExperimentalObjectiveCSupportEnabled {
    // Infer the module's interface languages from the interface
    // languages of the first symbol in the symbol graph.
Contributor

Is this comment accurate? Also, do we have a ticket we can link to for SymbolKit to provide an API that gives this information?

Contributor Author

You're right, that's out of date. I've updated it and linked a ticket.

Resolved here: beb3a14.

///
/// - Parameters:
///   - pathComponents: The relative path components from the module or framework to the symbol.
///   - interfaceLanguage: The source language of the symbol.
Contributor

This needs updating

Contributor Author

Thanks!

Addressed here: 5c8b48d.

}

/// Returns the primary symbol to use as documentation source.
var documentedSymbol: SymbolGraph.Symbol? {
Contributor

This API seems a bit odd to me. If it's the doc comment we're interested in, should the API not just return the doc comment rather than the whole symbol?

Contributor Author

Generally this was just what caused the least code churn. But I think this API makes sense long-term as well. There are cases where we need all the metadata that comes with a symbol, but specifically for the documented one.

For example, we might want to attach a diagnostic to the documented version of a symbol and need its source location.
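
As a sketch of that use case: returning the whole documented symbol, rather than just its doc comment, keeps the source location available for diagnostics. The types below are hypothetical simplifications, and the selection rule (prefer a documented Swift variant, otherwise any documented variant) is an assumption made for illustration, not necessarily what DocC does.

```swift
struct SourceLocationSketch: Equatable {
    let file: String
    let line: Int
}

// Hypothetical per-language variant of a symbol.
struct SymbolVariantSketch {
    let language: String
    let docComment: String?
    let location: SourceLocationSketch
}

struct UnifiedSymbolSketch {
    let variants: [SymbolVariantSketch]

    /// Returns the variant to use as the documentation source.
    /// Assumed rule: prefer a documented Swift variant, else any documented variant.
    var documentedSymbol: SymbolVariantSketch? {
        let documented = variants.filter { $0.docComment != nil }
        return documented.first { $0.language == "swift" } ?? documented.first
    }
}
```

A caller that wants to emit a warning can then read `documentedSymbol?.location` directly instead of looking the symbol up a second time.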

"minor" : 5,
"patch" : 3
},
"generator" : "SymbolKit"
Contributor

Nit: this should probably be clang


/// A test case that enables the experimental Objective-C support feature flag
/// before running.
class ExperimentalObjectiveCTestCase: XCTestCase {
Contributor

Nice!

/// before running.
class ExperimentalObjectiveCTestCase: XCTestCase {
    override func setUp() {
        enableFeatureFlag(\.isExperimentalObjectiveCSupportEnabled)
Contributor

We should generally call super.setUp() here in case the base class has some set up behavior (although I doubt XCTestCase does)

Contributor Author

Good call, addressed in 92935cb.

"minor" : 5,
"patch" : 3
},
"generator" : "SymbolKit"
Contributor

clang

@ethan-kusters
Contributor Author

@swift-ci please test

@ethan-kusters
Contributor Author

ethan-kusters commented Dec 8, 2021

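We were initially seeing a significant performance regression in this PR, but @franklinsch and I have found some other areas of the codebase that could be optimized to negate the regression.

We're now seeing a significant performance improvement: 'bundle-registration' is about 20-21% faster, 'convert-action' is about 17% faster, and peak memory is down 4.5-8.4% across the benchmarks below.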

TestFramework-5.json
+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Compiled output size (MB)                | no change  | 57.37                | 57.37                |
| Duration for 'bundle-registration' (sec) | -21.40%    | 3.89                 | 3.06                 |
| Duration for 'convert-action' (sec)      | -17.24%    | 4.96                 | 4.11                 |
| Peak memory footprint (MB)               | -8.39%     | 221.56               | 202.97               |
| Topic Anchor Checksum                    | no change  | 65d4ff3050cd1106a21b | 65d4ff3050cd1106a21b |
| Topic Graph Checksum                     | no change  | 0a64c740a2ed5a246836 | 0a64c740a2ed5a246836 |
+-----------------------------------------------------------------------------------------------------+


TestFramework-10.json
+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Compiled output size (MB)                | no change  | 114.71               | 114.71               |
| Duration for 'bundle-registration' (sec) | -21.25%    | 7.81                 | 6.15                 |
| Duration for 'convert-action' (sec)      | -16.63%    | 9.96                 | 8.30                 |
| Peak memory footprint (MB)               | -8.02%     | 385.14               | 354.24               |
| Topic Anchor Checksum                    | no change  | c8f03d4e81da8e3b6d7f | c8f03d4e81da8e3b6d7f |
| Topic Graph Checksum                     | no change  | edce22bce4f8de475a5e | edce22bce4f8de475a5e |
+-----------------------------------------------------------------------------------------------------+

TestFramework-25.json
+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Compiled output size (MB)                | no change  | 287.64               | 287.64               |
| Duration for 'bundle-registration' (sec) | -20.37%    | 19.81                | 15.77                |
| Duration for 'convert-action' (sec)      | -17.11%    | 25.92                | 21.48                |
| Peak memory footprint (MB)               | -4.52%     | 880.44               | 840.62               |
| Topic Anchor Checksum                    | no change  | 9a675b9ad6d69f8b7f0c | 9a675b9ad6d69f8b7f0c |
| Topic Graph Checksum                     | no change  | 665713509084b5131a37 | 665713509084b5131a37 |
+-----------------------------------------------------------------------------------------------------+

TestFramework-50.json
+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Compiled output size (MB)                | no change  | 579.70               | 579.70               |
| Duration for 'bundle-registration' (sec) | -21.23%    | 39.75                | 31.31                |
| Duration for 'convert-action' (sec)      | -16.78%    | 51.46                | 42.82                |
| Peak memory footprint (MB)               | -7.88%     | 1678.05              | 1545.80              |
| Topic Anchor Checksum                    | no change  | 08d0a84bec913460905f | 08d0a84bec913460905f |
| Topic Graph Checksum                     | no change  | 74d25c4f1ee185ad5676 | 74d25c4f1ee185ad5676 |
+-----------------------------------------------------------------------------------------------------+

The majority of this benefit comes from implementing copy-on-write semantics for the ResolvedTopicReference type: 9d4106b.

Lazy initialization of the expensive URL property within ResolvedTopicReference added another slight performance gain: 3f1b105
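
For readers unfamiliar with the pattern, here is a minimal sketch of the two ideas, assuming a value type backed by shared reference-type storage. The names are hypothetical and the `doc` URL scheme is used only for illustration; this is not the actual `ResolvedTopicReference` implementation. Copies of the struct share one storage object, and the URL is built on first access only.

```swift
import Foundation

struct TopicReferenceSketch {
    // Reference-type storage shared by every copy of the struct.
    private final class Storage {
        let bundleIdentifier: String
        let path: String
        private(set) var urlComputationCount = 0

        // Computed on first access only, then cached for all copies.
        lazy var url: URL = {
            self.urlComputationCount += 1
            var components = URLComponents()
            components.scheme = "doc"
            components.host = self.bundleIdentifier
            components.path = self.path
            return components.url!
        }()

        init(bundleIdentifier: String, path: String) {
            self.bundleIdentifier = bundleIdentifier
            self.path = path
        }
    }

    private let storage: Storage

    init(bundleIdentifier: String, path: String) {
        storage = Storage(bundleIdentifier: bundleIdentifier, path: path)
    }

    var bundleIdentifier: String { storage.bundleIdentifier }
    var path: String { storage.path }
    var url: URL { storage.url }
    // Exposed only so the sketch can show how often the URL is built.
    var urlComputationCount: Int { storage.urlComputationCount }
}
```

Copying the struct copies a single pointer instead of every stored property, and a reference whose URL is never read never pays for building it.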

@ethan-kusters
Contributor Author

@swift-ci please test

Contributor

@franklinsch franklinsch left a comment

This looks great! I think we should reconsider making ResolvedTopicReference immutable though, and split out further changes in a separate PR.

/// The identifier of the bundle that owns this documentation topic.
public var bundleIdentifier: String {
didSet { updateURL() }
return _storage.bundleIdentifier
Contributor

What performance gain do we get by making this type immutable? Thinking about this more, I prefer the flexibility of keeping this type mutable. It allows clients to easily mutate properties of this type without calling the initializer and configuring all the properties again. If making it immutable provides a significant performance gain, I think we should look at alternatives to improve performance in the future. What do you think?

Also, as it stands, this type is not copy-on-write, because it cannot be written to.

Contributor Author

"Thinking about this more, I prefer the flexibility of keeping this type mutable. It allows clients to easily mutate properties of this type without calling the initializer and configuring all the properties again."

I think this is a good point and in general I agree. However, because ResolvedTopicReference is effectively a wrapper around URL, I think it's important to expose an API similar to URL's and thus communicate to clients that doing something like adding a path component is an expensive operation.

Prior to this, we exposed the pathComponents property directly and then monitored it to recalculate the URL when it was modified. This is likely surprising to clients, who would not expect the direct modification of an array of strings to be an expensive operation. Limiting clients to the addingPathComponent(_:) style API communicates better what's actually happening here.

I've added further documentation to ResolvedTopicReference in a commit here (5c78d85) to clarify this.
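
A minimal sketch of that API shape, with a hypothetical type and simplified path handling: the path components are read-only, and the only way to extend a reference is a method that returns a new value, much like `appendingPathComponent(_:)` on `URL`.

```swift
struct ImmutableReferenceSketch: Equatable {
    let bundleIdentifier: String
    // Read-only: clients cannot append to this array in place.
    let pathComponents: [String]

    var path: String { "/" + pathComponents.joined(separator: "/") }

    /// Returns a new reference with the component appended.
    /// The method call makes it visible that a new value is being built.
    func addingPathComponent(_ component: String) -> ImmutableReferenceSketch {
        ImmutableReferenceSketch(
            bundleIdentifier: bundleIdentifier,
            pathComponents: pathComponents + [component]
        )
    }
}
```

The original value is left untouched, so any cached data derived from it stays valid.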

Contributor

That's a great argument. Given that it's a wrapper around URL, it makes sense to convey similar characteristics to URL, including immutability. The alternative here would've been to keep it mutable and document the properties that incur a performance cost when written to, but I do prefer the immutable approach now. Thanks for adding the doc comment!

}

public init(bundleIdentifier: String, path: String, fragment: String? = nil, sourceLanguages: Set<SourceLanguage>) {
    self.init(
Contributor

Beyond the copy-on-write change, which by itself provides a net gain in performance, I would suggest moving the other changes to this type into a separate PR to manage risk (e.g., if we need to revert this PR for whatever reason, the additional improvements would not be reverted).

Contributor Author

This is a good point. I've rebased this PR into 5 isolated commits and plan to rebase and merge instead of squashing, so that we can revert individual parts of this PR if necessary.

ethan-kusters and others added 5 commits December 9, 2021 10:18
Adds base support for converting DocC Catalogs that include multiple languages by expanding symbol semantic models to hold language-specific variants.

Resolves rdar://84169407

Co-authored-by: Franklin Schrans <[email protected]>
We add ResolvedTopicReferences to the cache when initializing a new reference so we don't need these additional calls.
We don't always use the URL property on a `ResolvedTopicReference` and it's expensive to compute, so it makes sense to make this a lazy initialization.
Because `urlReadablePath` was always called in the topic reference's initializer, when adding and removing path components, we were performing duplicate work.

This adds a new private initializer that skips the `urlReadablePath` call when we know we already have an escaped path.
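
A sketch of the idea in that last commit message, with hypothetical names: the public initializer escapes its input once, while a private initializer accepts a path that is already known to be escaped, so appending a component only escapes the new component. The `urlReadable(_:)` helper is a simplified stand-in for `urlReadablePath`, not its real behavior.

```swift
import Foundation

struct EscapedPathReferenceSketch {
    let path: String

    // Simplified stand-in for `urlReadablePath`: replaces spaces with dashes.
    static func urlReadable(_ raw: String) -> String {
        raw.replacingOccurrences(of: " ", with: "-")
    }

    /// Public entry point: the input may be unescaped, so escape it once.
    init(path: String) {
        self.init(escapedPath: Self.urlReadable(path))
    }

    /// Private fast path: the caller guarantees the path is already escaped.
    private init(escapedPath: String) {
        self.path = escapedPath
    }

    func addingPathComponent(_ component: String) -> EscapedPathReferenceSketch {
        // Only the new component needs escaping; the existing path is reused as-is.
        EscapedPathReferenceSketch(escapedPath: path + "/" + Self.urlReadable(component))
    }
}
```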
@ethan-kusters
Contributor Author

@swift-ci please test
