[Java.Interop.Tools.JavaSource] Improve `<code>` parsing #1125
Fixes: #1071

The latest API docs update contained a couple dozen parsing issues caused by broken `<code></code>` elements, reserved inline characters in `<code>` elements, and other problems. These issues have been fixed by no longer attempting to parse `<code>` elements with Irony. Instead, an HTML processing step has been added that replaces, removes, or decodes well-known HTML tags after the javadoc is parsed.

Parsing for `<a/>` elements has also been updated to fix all 83 cases where `href` attribute parsing would fail. Now, when we encounter an `<a/>` element that points to code or a local path, we only include the element value in the javadoc, not the full `href` attribute.

Both changes should improve the readability of our generated docs, as there will be fewer encoded character entities in places where they are not necessary.
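The post-parse HTML processing step described above can be approximated as a string-level pass over the converted text. The following is a minimal, hypothetical sketch in Java (the generator itself is C#). The class name `HtmlCleanupSketch`, the tag mapping (`<code>` to `<c>`, `<em>` to `<i>`), and the rule that any `href` not starting with `http://` or `https://` counts as "code or a local path" are all assumptions for illustration. They are not the PR's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-parse cleanup pass; not the PR's actual implementation.
final class HtmlCleanupSketch {

    // <a href="...">text</a>; assumes double-quoted href values and no nested <a>.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\s+href=\"([^\"]*)\"\\s*>(.*?)</a>", Pattern.DOTALL);

    static String cleanup(String html) {
        // Replace well-known HTML tags with C# XML doc equivalents (assumed mapping).
        String s = html
            .replace("<code>", "<c>")
            .replace("</code>", "</c>")
            .replace("<em>", "<i>")
            .replace("</em>", "</i>");

        // Keep external links. For anything else (assumed to be code or a
        // local path), keep only the element value and drop the href.
        Matcher m = ANCHOR.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String href = m.group(1);
            boolean external = href.startsWith("http://") || href.startsWith("https://");
            String replacement = external ? m.group(0) : m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: See Activity and <c>onCreate()</c>.
        System.out.println(cleanup(
            "See <a href=\"../app/Activity.html\">Activity</a> and <code>onCreate()</code>."));
    }
}
```

Because a pass like this only rewrites tags it recognizes, unbalanced input such as an unclosed `<em>` passes through as an unclosed `<i>` rather than producing a parse error.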
Testing further with: https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=7913417&view=results
Two outdated, resolved review threads on `tools/generator/Java.Interop.Tools.Generator.ObjectModel/JavadocInfo.cs`.
From the original PR message:
This concerns me, in that it means that there is no way to ensure valid XML. Imagine we have this Javadoc fragment:

```java
/**
 * <em>this is never closed
 */
public static void foo() {
}
```

then (I think; haven't tested) this would result in

```csharp
/// <i>this is never closed
```

(A more common example would be

The good news is that this doesn't error out, but it would result in a CS1570 warning:

This would only be a problem for those building with

…but is it otherwise a problem? I philosophically don't like it, but this might be the better path forward anyway…
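One way to surface the invalid-XML concern is to try parsing each generated doc fragment. CS1570 is the C# compiler's warning for badly formed XML in a documentation comment. The sketch below is hypothetical and uses the JDK's DOM parser only for illustration. The class name `XmldocCheckSketch` and the idea of wrapping the fragment in a synthetic `<doc>` root element are assumptions, not something the generator does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical well-formedness check for generated xmldoc fragments.
final class XmldocCheckSketch {

    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            // Doc comment bodies have no single root element, so wrap them in
            // a synthetic one (assumption made for this sketch).
            builder.parse(new InputSource(new StringReader("<doc>" + fragment + "</doc>")));
            return true;
        } catch (SAXException e) {
            return false; // badly formed XML, the CS1570 case
        } catch (Exception e) {
            // ParserConfigurationException or IOException: not a content problem.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<i>this is never closed"));   // false
        System.out.println(isWellFormed("<i>this is closed</i>"));     // true
    }
}
```

A check like this would only report the problem. It cannot decide where the missing `</i>` belongs.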
Outdated, resolved review thread on `tools/generator/Java.Interop.Tools.Generator.ObjectModel/JavadocInfo.cs`.
From the PR message:
On the one hand, "yes", but on the other hand, whitespace is (normally) not significant, so if we look at xamarin/android-api-docs#49 and compare to what we have now, consider:
The Before:
After:
The "raw html" is absolute garbage, yes, but is the resulting text actually better? IMHO it is not better, as there is no longer a way to even know when the first sentence ends and the next sentence begins. ("Best", of course, would be adding Irony logic to look for
Resolved review threads on `tests/Java.Interop.Tools.JavaSource-Tests/SourceJavadocToXmldocParserTests.cs` and `....Tools.JavaSource/Java.Interop.Tools.JavaSource/SourceJavadocToXmldocGrammar.HtmlBnfTerms.cs`.
Overall, it may be better/simpler to separate the
I moved the
<code> parsing
It would be great to find a way to reduce the number of encoded HTML entities in our docs to help with readability, but the attempt to do so as part of this PR was lacking. I've updated this to only include an update for
```csharp
Assert.IsFalse (r.HasErrors (), DumpMessages (r, p));
Assert.AreEqual ("<c>android:label=\"@string/resolve_title\"</c>", r.Root.AstNode.ToString ());

r = p.Parse ("<code>Activity.RESULT_OK<code>");
```
I would like to extend this test to see what happens with the content after the `<code>`, e.g. from `android/telephony/gsm/SmsManager.java`:

```java
 * The result code will be <code>Activity.RESULT_OK<code> for success,
 * or one of these errors:
 * <code>RESULT_ERROR_GENERIC_FAILURE</code>
```

or shorter: `<code>should be code<code> but what about this <code>and this?</code>`. Yes, `<code>should be code<code>` results in `<c>should be code</c>`, but what about the rest? Would we get:

```xml
<c>should be code</c><c> but what about this </c><c>and this?</c>
```

which is entirely defensible and reasonable, but is worth "calling out" in the unit tests.
We don't appear to create an additional code element for the intermediate content, presumably because we capture the incorrectly placed opening `<code>` tag while parsing the previous code element. I've added another test for this.
WIP commit message:

Fixes: https://github.com/xamarin/java.interop/issues/1071
The latest API docs update contained a couple dozen parsing issues
due to `<code/>` parsing, including:
* Closing element doesn't match opening element: `<code>null</null>`
* Content including `@`: `<code>android:label="@string/resolve_title"</code>`
* Closing element is actually an opening element:
`<code>Activity.RESULT_OK<code>`
* Improper element nesting: `<code><pre><p>content</code></pre></p>`
* Use of attributes: `<code class=prettyprint>content<code>`
Fix this by updating `CodeElementDeclaration` to use a new
`CodeElementContentTerm` terminal, a "greedy regex" that
grabs `<code` until one of:
* `</code>`
* `</null>`
* `<code>`
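The `CodeElementContentTerm` behaviour described above can be approximated with a single regular expression. This is a hypothetical Java sketch, not the Irony terminal from the PR. The class name `CodeTermSketch` and the exact pattern are assumptions. "Until one of" is expressed here with a reluctant `.*?`, so matching stops at the first terminator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical approximation of the <code> terminal; not the PR's Irony code.
final class CodeTermSketch {

    // `<code` plus any attributes, then the shortest run of content up to the
    // first `</code>`, the mismatched `</null>`, or a stray `<code>`.
    private static final Pattern CODE = Pattern.compile(
        "<code[^>]*>(.*?)(?:</code>|</null>|<code>)", Pattern.DOTALL);

    static String convert(String javadoc) {
        Matcher m = CODE.matcher(javadoc);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps `$` or `\` in the content literal.
            m.appendReplacement(out, Matcher.quoteReplacement("<c>" + m.group(1) + "</c>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("<code>null</null>"));                      // <c>null</c>
        System.out.println(convert("<code class=prettyprint>content<code>"));  // <c>content</c>
        System.out.println(convert(
            "<code>should be code<code> but what about this <code>and this?</code>"));
        // <c>should be code</c> but what about this <c>and this?</c>
    }
}
```

In the third case, the stray `<code>` closes the first element. As a result, ` but what about this ` stays plain text instead of becoming its own `<c>` element. This matches the behaviour reported in the review reply about `<code>should be code<code>`.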
Fixes: dotnet/java-interop#1071

Context: dotnet/java-interop@d0231c5

Changes: dotnet/java-interop@e1121ea...151b03e

* dotnet/java-interop@151b03ee: [Java.Interop.Tools.JavaSource] Improve `<code>` parsing (dotnet/java-interop#1125)
* dotnet/java-interop@6a9f5cbb: [Java.Interop.Tools.JavaSource] Improve `<a>` parsing (dotnet/java-interop#1126)
* dotnet/java-interop@d0231c5c: [generator] Override methods should match base deprecated info (dotnet/java-interop#1130)

Commit dotnet/java-interop@d0231c5c updated `override` methods' "obsolete" data to match their base methods. This results in several `[ObsoleteOSPlatform]` attributes being switched to `[Obsolete]` attributes. Updated `acceptable-breakages` to allow this.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Pobst <[email protected]>
Fixes: #1071

The latest API docs update contained a couple dozen parsing issues due
to a handful of problems parsing `<code>` elements, such as:

These issues have been fixed by attempting to capture entire
`<code>` elements (and the known issues mentioned above) using a
regex-based terminal.