Skip to content

Conversation

@pjcollins
Copy link
Member

@pjcollins pjcollins commented Jan 2, 2025

A recent attempt to import API docs for API 35 produced the following:

The following issues were found, review the build log for more details:
>   ## Unable to translate remarks for android/app/admin/DevicePolicyManager:
>   JavadocImport-Error (31:39): Syntax error, expected: </p>, </P>, #PCDATA, <tt>, <TT>, <i>, <I>, <a attr=, <code>, {@code, {@docRoot}, {@inheritDoc}, {@link, {@linkplain, {@literal, {@see, {@value}, {@value, IgnorableDeclaration, {@param, UnknownHtmlElementStart, <p>, <P>, <pre , @author, @apiSince, @deprecated, @deprecatedSince, @exception, @inheritDoc, @hide, @param, @return, @see, @serialData, @serialField, @since, @throws, @[unknown], @version
    <li>A <i id="deviceowner">Device Owner</i>, which only ever exists on the
                                          ^

Parsing logic fails here because the <i> tag has an id attribute and is present in an open <p> tag:

ParserTrace:
  input=`<p> (Key symbol)`; error? False; message=Shift to S3
  input=``; error? False; message=Reduce on '<p> -> <p> '
  input=`<p>`; error? False; message=Popped state from stack, pushing <p>
  input=`For  (#PCDATA)`; error? False; message=Shift to S10
  input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
  input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
  input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
  input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
  input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration '
  input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
  input=`< (UnknownHtmlElementStart)`; error? False; message=Shift to S45
  input=``; error? False; message=Reduce on '<html inline decl> -> UnknownHtmlElementStart '
  input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
  input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
  input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
  input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
  input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
  input=`li>A  (#PCDATA)`; error? False; message=Shift to S10
  input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
  input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
  input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
  input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
  input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
  input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
  input=`< (UnknownHtmlElementStart)`; error? False; message=Shift to S45
  input=``; error? False; message=Reduce on '<html inline decl> -> UnknownHtmlElementStart '
  input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
  input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
  input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
  input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
  input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
  input=`i id="deviceowner">Device Owner (#PCDATA)`; error? False; message=Shift to S10
  input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
  input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
  input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
  input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
  input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
  input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
  input=`</i> (Key symbol)`; error? False; message=Reduce on 'InlineDeclarations -> InlineDeclaration+ '
  input=`InlineDeclarations`; error? False; message=Popped state from stack, pushing InlineDeclarations
  input=`</i> (Key symbol)`; error? True; message=Syntax error, expected: </p>, </P>
  input=`</i> (Key symbol)`; error? False; message=RECOVERING: popping stack, looking for state with error shift
  input=`</i> (Key symbol)`; error? False; message=FAILED TO RECOVER

Updates the grammar to ignore attributes on html tags such as <x y=z> to fix this.

The regex in CreateStartElement has also been improved to include word boundaries around the tag name to make sure that it does not match unexpected elements.

A recent attempt to import API docs for API 35 produced the following:

    The following issues were found, review the build log for more details:
    >   ## Unable to translate remarks for android/app/admin/DevicePolicyManager:
    >   JavadocImport-Error (31:39): Syntax error, expected: </p>, </P>, #PCDATA, <tt>, <TT>, <i>, <I>, <a attr=, <code>, {@code, {@docroot}, {@inheritdoc}, {@link, {@linkplain, {@literal, {@see, {@value}, {@value, IgnorableDeclaration, {@param, UnknownHtmlElementStart, <p>, <P>, <pre , @author, @apiSince, @deprecated, @deprecatedSince, @exception, @inheritdoc, @hide, @param, @return, @see, @Serialdata, @serialField, @SInCE, @throws, @[unknown], @Version
        <li>A <i id="deviceowner">Device Owner</i>, which only ever exists on the
                                              ^

Parsing logic fails here because the `<i>` tag has an `id` attribute
_and_ is present in an open `<p>` tag:

    ParserTrace:
      input=`<p> (Key symbol)`; error? False; message=Shift to S3
      input=``; error? False; message=Reduce on '<p> -> <p> '
      input=`<p>`; error? False; message=Popped state from stack, pushing <p>
      input=`For  (#PCDATA)`; error? False; message=Shift to S10
      input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
      input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
      input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
      input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
      input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration '
      input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
      input=`< (UnknownHtmlElementStart)`; error? False; message=Shift to S45
      input=``; error? False; message=Reduce on '<html inline decl> -> UnknownHtmlElementStart '
      input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
      input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
      input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
      input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
      input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
      input=`li>A  (#PCDATA)`; error? False; message=Shift to S10
      input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
      input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
      input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
      input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
      input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
      input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
      input=`< (UnknownHtmlElementStart)`; error? False; message=Shift to S45
      input=``; error? False; message=Reduce on '<html inline decl> -> UnknownHtmlElementStart '
      input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
      input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
      input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
      input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
      input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
      input=`i id="deviceowner">Device Owner (#PCDATA)`; error? False; message=Shift to S10
      input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
      input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
      input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
      input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
      input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
      input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
      input=`</i> (Key symbol)`; error? False; message=Reduce on 'InlineDeclarations -> InlineDeclaration+ '
      input=`InlineDeclarations`; error? False; message=Popped state from stack, pushing InlineDeclarations
      input=`</i> (Key symbol)`; error? True; message=Syntax error, expected: </p>, </P>
      input=`</i> (Key symbol)`; error? False; message=RECOVERING: popping stack, looking for state with error shift
      input=`</i> (Key symbol)`; error? False; message=FAILED TO RECOVER

Updates the grammar to allow for `<i x=y>` html tags that include
attributes to fix this.

The regex in `CreateStartElementIgnoreAttribute` has also been improved
to include word boundaries around the tag name to make sure that it does
not match unexpected elements.
@pjcollins
Copy link
Member Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@pjcollins pjcollins marked this pull request as ready for review January 6, 2025 16:15
@pjcollins pjcollins requested a review from Copilot January 6, 2025 16:15
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

@pjcollins pjcollins requested a review from jonpryor January 6, 2025 16:44
var start = CreateStartElement (htmlElement, grammar);
BnfTerm start = CreateStartElement (htmlElement, grammar);
if (ignoreAttributes) {
start = CreateStartElementIgnoreAttribute (htmlElement);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to just always ignore attributes, unless we otherwise know we need them? I believe HTML allows nearly everything to have; see e.g. §3.2.3 Global attributes:

The following attributes are common to and may be specified on all HTML elements (even those not defined in this specification):

  • id

As such and in retrospect, CreateStartElement() as-is is a Bad Idea™. We should consider removing CreateStartElementIgnoreAttribute(), and update CreateStartElement() so that it always has an attribute section:

static RegexBasedTerminal CreateStartElement (string startElement, string attributesRegex = "")
{
		return new RegexBasedTerminal ($"<{startElement} {attributesRegex}", $@"(?i)<{startElement}\s*{attribute}[^>]*>") {
			AstConfig = new AstNodeConfig {
				NodeCreator = (context, parseNode) => parseNode.AstNode = "",
			},
		};
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this plan, and after some local testing it looks like this is improving a handful of p -> para translations, for example:

-      <para>&lt;p class="note"&gt;
-            Since the introduction of JobScheduler, if an app did not return from
+      <para>Since the introduction of JobScheduler, if an app did not return from
-            application.
-            &lt;/p&gt;
-            &lt;p class="note"&gt;
-            Note: if the requested package uses the <c>android:sharedUserId</c>
+            application.</para>
+          <para>Note: if the requested package uses the <c>android:sharedUserId</c>
             manifest feature, this call will be forced into a slower manual
             calculation path. If possible, consider always using
-            <c>#queryStatsForUid(UUID, int)</c>, which is typically faster.
-            &lt;/p&gt;</para>
+            <c>#queryStatsForUid(UUID, int)</c>, which is typically faster.</para>

@pjcollins pjcollins changed the title [Tools.JavaSource] Support <i> tags with attributes [Tools.JavaSource] Support html tags with attributes Jan 6, 2025
@jonpryor
Copy link
Contributor

jonpryor commented Jan 6, 2025

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@pjcollins pjcollins requested a review from jonpryor January 6, 2025 22:24
@pjcollins
Copy link
Member Author

The latest update here produced the following docs diff:
dotnet/android-api-docs@48be17b

@jonpryor jonpryor merged commit fe00cef into main Jan 7, 2025
4 checks passed
@jonpryor jonpryor deleted the dev/pjc/javadoc-i-attrib branch January 7, 2025 20:56
jonpryor pushed a commit to dotnet/android that referenced this pull request Jan 8, 2025
Changes: dotnet/java-interop@2c06b3c...fe00cef

  * dotnet/java-interop@fe00cef5: [Java.Interop.Tools.JavaSource] Support html tags with attributes (dotnet/java-interop#1286)

Bump to dotnet/java-interop@fe00cef5 to get Javadoc import fixes.

Updates xaprepare and Mono.Android to generate API Docs against
API-35 sources.

The `azure-pipelines-apidocs.yaml` pipeline has been updated to use
the 1ES pipeline template to improve compatibility with existing
yaml templates.

The semi-automated API docs build and update workflow uses the
.NET Framework version of Mdoc and the `RunMdoc` target has been
updated to continue to mirror that workflow.
pjcollins added a commit that referenced this pull request Jan 8, 2025
)

Context: dotnet/android#9647

dotnet/android#9647 attempted to import API docs for API 35, and
produced the following warning:

	The following issues were found, review the build log for more details:
	>   ## Unable to translate remarks for android/app/admin/DevicePolicyManager:
	>   JavadocImport-Error (31:39): Syntax error, expected: </p>, </P>, #PCDATA, <tt>, <TT>, <i>, <I>, <a attr=, <code>, {@code, {@docroot}, {@inheritdoc}, {@link, {@linkplain, {@literal, {@see, {@value}, {@value, IgnorableDeclaration, {@param, UnknownHtmlElementStart, <p>, <P>, <pre , @author, @apiSince, @deprecated, @deprecatedSince, @exception, @inheritdoc, @hide, @param, @return, @see, @Serialdata, @serialField, @SInCE, @throws, @[unknown], @Version
	    <li>A <i id="deviceowner">Device Owner</i>, which only ever exists on the
	                                          ^

Parsing logic fails here because the `<i>` tag has an `id` attribute
_and_ is present in an open `<p>` tag.

Turns Out™ that HTML allows attributes on nearly *everything*; e.g.
from [§3.2.3 Global attributes][0]: 

> The following attributes are common to and may be specified on all
> [HTML elements](https://dev.w3.org/html5/spec-LC/infrastructure.html#html-elements)
> (even those not defined in this specification):
>   * …
>   * `id`

Given this, it doesn't make sense for `CreateStartElement()` to not
allow any attributes.  Update `CreateStartElement()` so that *all*
elements *ignore* any specified attributes (by default), which
allows `<i id="deviceowner">Device Owner</i>` to work.

The regex used has also been improved to include word boundaries
around the tag name to make sure that it does not match unexpected
elements.

[0]: https://dev.w3.org/html5/spec-LC/elements.html#global-attributes
@github-actions github-actions bot locked and limited conversation to collaborators Feb 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants