Skip to content

Conversation

@pjcollins
Copy link
Member

@pjcollins pjcollins commented Jun 13, 2023

Fixes: #1071

The latest API docs update contained a couple dozen parsing issues due
to a handful of issues parsing <code> elements such as:

<code>null</null> - wrong closing tag
<code>android:label="@string/resolve_title"</code> - @ tag rule seemingly taking precedence
<code>Activity.RESULT_OK<code> - trailing open tag rather than closing tag
<code><pre><p>content</code></pre></p> - additional inline html tags
<code class=prettyprint>content<code> - attributes on the code element

These issues have been fixed by attempting to capture entire <code>
elements (and the known issues mentioned above) using a Regex based
terminal.

Fixes: #1071

The latest API docs update contained a couple dozen parsing issues due
to broken `<code></code>` elements, reserved inline characters in
`<code>` elements, and other issues.  These issues have been fixed by
no longer attempting to parse `<code>` elements with Irony.  Instead, an
HTML processing step has been added which replaces, removes, or decodes
well known HTML tags after the javadoc is parsed.

Parsing for `<a/>` elements has also been updated to fix all 83 cases
where `href` attribute parsing would fail.  Now when we we encounter an
`<a/>` element that points to code or a local path we will only include
the element value in the javadoc, and not the full `href` attribute.

Readability of our generated docs should be improved by both of these
changes, as there will be fewer encoded character entities in places
where they are not necessary.
@pjcollins
Copy link
Member Author

Testing further with: https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=7913417&view=results
Sample diff for these changes can be found at xamarin/android-api-docs#49.

@pjcollins pjcollins marked this pull request as ready for review June 14, 2023 00:17
@pjcollins pjcollins requested a review from jonpryor June 14, 2023 18:18
@jonpryor
Copy link
Contributor

From the original PR message:

Instead, an HTML [post-]processing step has been added which replaces, removes, or decodes well known HTML tags after the javadoc is parsed.

This concerns me, in that it means that there is no way to ensure valid XML. Imagine we have this Javadoc fragment:

/**
 * <em>this is never closed
 */
public static void foo() {
}

then (I think; haven't tested) this would result in FixUpHtml() returning "<i>this is never closed", which could eventually result in:

/// <i>this is never closed

(A more common example would be <li> usage, for which there is almost never a corresponding </li>. See e.g. java/text/Collator.java and search for <LI>.)

The good news is that this doesn't error out, but it would result in a CS1570 warning:

warning CS1570: XML comment has badly formed XML -- 'Expected an end tag for element 'i'.'

This would only be a problem for those building with $(TreatWarningsAsErrors)=True (which is bananas for binding projects)…

…but is it otherwise a problem? I philosophically don't like it, but this might be the better path forward anyway

@jonpryor
Copy link
Contributor

From the PR message:

Readability of our generated docs should be improved by both of these changes, as there will be fewer encoded character entities in places where they are not necessary.

On the one hand, "yes", but on the other hand, whitespace is (normally) not significant, so if we look at xamarin/android-api-docs#49 and compare to what we have now, consider:

The &lt;ul&gt; & co is replaced, removing the HTML. In the diff, this looks nicer. In context?

Before:

This API can be called by the following to generate a new private/public key pair: <ul> <li>Device owner</li> <li>Profile owner</li> <li>Delegated certificate installer</li> <li>Credential management app</li> </ul> If the device supports key generation via secure hardware…

After:

This API can be called by the following to generate a new private/public key pair: Device owner Profile owner Delegated certificate installer Credential management app If the device supports key generation via secure hardware

The "raw html" is absolute garbage, yes, but is the resulting text actually better? IMHO it is not better, as there is no longer a way to even know when the first sentence ends and the next sentence begins. </ul> If the device… is ugly, certainly, but at least you know "If the device" begins a new sentence.

("Best", of course, would be adding Irony logic to look for <ul/> & <li/> and translate into mdoc(5) <list type="bullet"/>, but that's significantly more work. Something we should do eventually, but not in this commit.)

@jonpryor
Copy link
Contributor

Overall, it may be better/simpler to separate the <a/> parsing changes from the <code/> parsing changes.

@pjcollins pjcollins marked this pull request as draft June 16, 2023 00:33
@pjcollins
Copy link
Member Author

I moved the <a/> element parsing fixes into #1126. I'll need to rethink the rest of these changes. This does produce a more readable doc in many cases, but as you've pointed out this is not going to be better / more desirable for all cases.

@pjcollins pjcollins closed this Jul 6, 2023
@pjcollins pjcollins deleted the dev/pjc/api33-doc-fix branch July 6, 2023 15:33
@pjcollins pjcollins restored the dev/pjc/api33-doc-fix branch July 11, 2023 20:29
@pjcollins pjcollins reopened this Jul 11, 2023
@pjcollins pjcollins changed the title [Java.Interop.Tools.JavaSource] Fix up common HTML tags [Java.Interop.Tools.JavaSource] Improve <code/> parsing Jul 11, 2023
@pjcollins pjcollins changed the title [Java.Interop.Tools.JavaSource] Improve <code/> parsing [Java.Interop.Tools.JavaSource] Improve <code> parsing Jul 11, 2023
@pjcollins
Copy link
Member Author

pjcollins commented Jul 11, 2023

It would be great to find a way to reduce the amount of encoded html entities in our docs to help with readability, but the attempt to do so as part of this PR was lacking. I've updated this to only include an update for <code> element parsing as suggested.

@pjcollins pjcollins marked this pull request as ready for review July 11, 2023 21:02
Assert.IsFalse (r.HasErrors (), DumpMessages (r, p));
Assert.AreEqual ("<c>android:label=\"@string/resolve_title\"</c>", r.Root.AstNode.ToString ());

r = p.Parse ("<code>Activity.RESULT_OK<code>");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to extend this test to see what happens with the content after the <code>, e.g. from android/telephony/gsm/SmsManager.java:

     *  The result code will be <code>Activity.RESULT_OK<code> for success,
     *  or one of these errors:
     *  <code>RESULT_ERROR_GENERIC_FAILURE</code>

or shorter: <code>should be code<code> but what about this <code>and this?</code>. Yes, <code>should be code<code> results in <c>should be code</c>, but what about the rest? Would we get:

<c>should be code</c><c> but what about this </c><c>and this?</c>

which is entirely defensible and reasonable, but is worth "calling out" in the unit tests.

Copy link
Member Author

@pjcollins pjcollins Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't appear to create an additional code element for the intermediate content, presumably because we capture the incorrectly placed open <code> tag while parsing the previous code element. I've added another test for this.

@jonpryor
Copy link
Contributor

WIP commit message:

Fixes: https://github.com/xamarin/java.interop/issues/1071

The latest API docs update contained a couple dozen parsing issues
due to `<code/>` parsing, including:

  * Closing element doesn't match opening element: `<code>null</null>`
  * Content including `@`: `<code>android:label="@string/resolve_title"</code>`
  * Closing element is actually an opening element:
    `<code>Activity.RESULT_OK<code>`
  * Improper element nesting: `<code><pre><p>content</code></pre></p>`
  * Use of attributes: `<code class=prettyprint>content<code>`

Fix this by replacing `CodeElementDeclaration` to use a new
`CodeElementContentTerm` terminal, which is a "greedy regex" which
grabs `<code` until one of:

  * `</code>`
  * `</null>`
  * `<code>`

@jonpryor jonpryor merged commit 151b03e into main Jul 12, 2023
@jonpryor jonpryor deleted the dev/pjc/api33-doc-fix branch July 12, 2023 18:47
jonpryor pushed a commit to dotnet/android that referenced this pull request Jul 14, 2023
Fixes: dotnet/java-interop#1071

Context: dotnet/java-interop@d0231c5

Changes: dotnet/java-interop@e1121ea...151b03e

  * dotnet/java-interop@151b03ee: [Java.Interop.Tools.JavaSource] Improve `<code>` parsing  (dotnet/java-interop#1125)
  * dotnet/java-interop@6a9f5cbb: [Java.Interop.Tools.JavaSource] Improve `<a>` parsing (dotnet/java-interop#1126)
  * dotnet/java-interop@d0231c5c: [generator] Override methods should match base deprecated info (dotnet/java-interop#1130)

Commit dotnet/java-interop@d0231c5c updated `override` methods'
"obsolete" data to match its base method.  This results in several
`[ObsoleteOSPlatform]` attributes being switched to `[Obsolete]`
attributes.  Updated `acceptable-breakages` to allow this.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Pobst <[email protected]>
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix Javadoc parsing issues introduced with API 33

3 participants