[Tools.JavaSource] Support html tags with attributes #1286
A recent attempt to import API docs for API 35 produced the following:
The following issues were found, review the build log for more details:
> ## Unable to translate remarks for android/app/admin/DevicePolicyManager:
> JavadocImport-Error (31:39): Syntax error, expected: </p>, </P>, #PCDATA, <tt>, <TT>, <i>, <I>, <a attr=, <code>, {@code, {@docroot}, {@inheritdoc}, {@link, {@linkplain, {@literal, {@see, {@value}, {@value, IgnorableDeclaration, {@param, UnknownHtmlElementStart, <p>, <P>, <pre , @author, @apiSince, @deprecated, @deprecatedSince, @exception, @inheritdoc, @hide, @param, @return, @see, @Serialdata, @serialField, @SInCE, @throws, @[unknown], @Version
<li>A <i id="deviceowner">Device Owner</i>, which only ever exists on the
^
Parsing logic fails here because the `<i>` tag has an `id` attribute
_and_ is present in an open `<p>` tag:
ParserTrace:
input=`<p> (Key symbol)`; error? False; message=Shift to S3
input=``; error? False; message=Reduce on '<p> -> <p> '
input=`<p>`; error? False; message=Popped state from stack, pushing <p>
input=`For (#PCDATA)`; error? False; message=Shift to S10
input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration '
input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
input=`< (UnknownHtmlElementStart)`; error? False; message=Shift to S45
input=``; error? False; message=Reduce on '<html inline decl> -> UnknownHtmlElementStart '
input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
input=`li>A (#PCDATA)`; error? False; message=Shift to S10
input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
input=`< (UnknownHtmlElementStart)`; error? False; message=Shift to S45
input=``; error? False; message=Reduce on '<html inline decl> -> UnknownHtmlElementStart '
input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
input=`i id="deviceowner">Device Owner (#PCDATA)`; error? False; message=Shift to S10
input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration+ InlineDeclaration '
input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
input=`</i> (Key symbol)`; error? False; message=Reduce on 'InlineDeclarations -> InlineDeclaration+ '
input=`InlineDeclarations`; error? False; message=Popped state from stack, pushing InlineDeclarations
input=`</i> (Key symbol)`; error? True; message=Syntax error, expected: </p>, </P>
input=`</i> (Key symbol)`; error? False; message=RECOVERING: popping stack, looking for state with error shift
input=`</i> (Key symbol)`; error? False; message=FAILED TO RECOVER
To fix this, update the grammar to allow HTML tags that include
attributes, such as `<i x=y>`.
The regex in `CreateStartElementIgnoreAttribute` has also been improved
to include word boundaries around the tag name to make sure that it does
not match unexpected elements.
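The effect of the word boundaries can be checked in isolation with plain `System.Text.RegularExpressions`. The following is only a sketch: the class name, the helper names, and the exact pattern text are assumptions made for illustration, not the regex that was committed. It compares a start-tag pattern with and without `\b` around the tag name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers for illustration; not part of Java.Interop.Tools.JavaSource.
static class StartTagBoundarySketch
{
    // Start-tag pattern with no word boundary: the tag name may be a prefix
    // of a longer element name.
    public static Regex WithoutBoundary (string tag) =>
        new Regex ($@"(?i)<{Regex.Escape (tag)}[^>]*>");

    // Same pattern with \b on both sides of the tag name, so the name must
    // end exactly where the tag name ends.
    public static Regex WithBoundary (string tag) =>
        new Regex ($@"(?i)<\b{Regex.Escape (tag)}\b[^>]*>");
}
```

Under these assumptions, `WithBoundary ("i")` matches both `<i>` and `<i id="deviceowner">` but not `<img src="a.png">`, whereas `WithoutBoundary ("i")` matches all three.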
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
var start = CreateStartElement (htmlElement, grammar);
BnfTerm start = CreateStartElement (htmlElement, grammar);
if (ignoreAttributes) {
    start = CreateStartElementIgnoreAttribute (htmlElement);
Would it make more sense to just always ignore attributes, unless we otherwise know we need them? I believe HTML allows nearly everything to have attributes; see e.g. §3.2.3 Global attributes:
> The following attributes are common to and may be specified on all HTML elements (even those not defined in this specification):
> - …
> - `id`
As such and in retrospect, `CreateStartElement()` as-is is a Bad Idea™. We should consider removing `CreateStartElementIgnoreAttribute()`, and update `CreateStartElement()` so that it always has an attribute section:
static RegexBasedTerminal CreateStartElement (string startElement, string attributesRegex = "")
{
    // `attributesRegex` optionally requires specific attribute text; anything else up to `>` is skipped by `[^>]*`.
    return new RegexBasedTerminal ($"<{startElement} {attributesRegex}", $@"(?i)<{startElement}\s*{attributesRegex}[^>]*>") {
        AstConfig = new AstNodeConfig {
            NodeCreator = (context, parseNode) => parseNode.AstNode = "",
        },
    };
}
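To see what the pattern in the suggestion above accepts, it can be exercised without Irony. This is a minimal sketch under stated assumptions: `StartElementPatternSketch`, `BuildPattern`, and `Matches` are hypothetical names introduced here, and only the regex string is taken from the suggestion (with the interpolated name written as `attributesRegex`).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers for illustration; only the pattern string comes from
// the review suggestion. Irony's RegexBasedTerminal is not used here.
static class StartElementPatternSketch
{
    // Builds the start-tag pattern; an empty attributesRegex means
    // "accept the tag with or without any attributes".
    public static string BuildPattern (string startElement, string attributesRegex = "") =>
        $@"(?i)<{startElement}\s*{attributesRegex}[^>]*>";

    // True when the input contains a start tag accepted by the pattern.
    public static bool Matches (string input, string startElement, string attributesRegex = "") =>
        Regex.IsMatch (input, BuildPattern (startElement, attributesRegex));
}
```

With the default empty `attributesRegex`, `Matches ("<p class=\"note\">", "p")` and `Matches ("<P>", "p")` are both true, so a single terminal covers bare and attributed tags. Passing an explicit fragment such as `"href="` (an illustrative value) makes that text required: `<a href="x">` matches, while a bare `<a>` does not.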
I like this plan, and after some local testing it looks like this is improving a handful of p -> para translations, for example:
- <para><p class="note">
- Since the introduction of JobScheduler, if an app did not return from
+ <para>Since the introduction of JobScheduler, if an app did not return from
- application.
- </p>
- <p class="note">
- Note: if the requested package uses the <c>android:sharedUserId</c>
+ application.</para>
+ <para>Note: if the requested package uses the <c>android:sharedUserId</c>
manifest feature, this call will be forced into a slower manual
calculation path. If possible, consider always using
- <c>#queryStatsForUid(UUID, int)</c>, which is typically faster.
- </p></para>
+ <c>#queryStatsForUid(UUID, int)</c>, which is typically faster.</para>
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
The latest update here produced the following docs diff:
Changes: dotnet/java-interop@2c06b3c...fe00cef
* dotnet/java-interop@fe00cef5: [Java.Interop.Tools.JavaSource] Support html tags with attributes (dotnet/java-interop#1286)
Bump to dotnet/java-interop@fe00cef5 to get Javadoc import fixes.
Updates xaprepare and Mono.Android to generate API Docs against API-35 sources.
The `azure-pipelines-apidocs.yaml` pipeline has been updated to use the 1ES pipeline template to improve compatibility with existing yaml templates.
The semi-automated API docs build and update workflow uses the .NET Framework version of Mdoc and the `RunMdoc` target has been updated to continue to mirror that workflow.
Context: dotnet/android#9647
dotnet/android#9647 attempted to import API docs for API 35, and produced the following warning:
The following issues were found, review the build log for more details:
> ## Unable to translate remarks for android/app/admin/DevicePolicyManager:
> JavadocImport-Error (31:39): Syntax error, expected: </p>, </P>, #PCDATA, <tt>, <TT>, <i>, <I>, <a attr=, <code>, {@code, {@docroot}, {@inheritdoc}, {@link, {@linkplain, {@literal, {@see, {@value}, {@value, IgnorableDeclaration, {@param, UnknownHtmlElementStart, <p>, <P>, <pre , @author, @apiSince, @deprecated, @deprecatedSince, @exception, @inheritdoc, @hide, @param, @return, @see, @Serialdata, @serialField, @SInCE, @throws, @[unknown], @Version
<li>A <i id="deviceowner">Device Owner</i>, which only ever exists on the
^
Parsing logic fails here because the `<i>` tag has an `id` attribute _and_ is present in an open `<p>` tag.
Turns Out™ that HTML allows attributes on nearly *everything*; e.g. from [§3.2.3 Global attributes][0]:
> The following attributes are common to and may be specified on all
> [HTML elements](https://dev.w3.org/html5/spec-LC/infrastructure.html#html-elements)
> (even those not defined in this specification):
> * …
> * `id`
Given this, it doesn't make sense for `CreateStartElement()` to not allow any attributes.
Update `CreateStartElement()` so that *all* elements *ignore* any specified attributes (by default), which allows `<i id="deviceowner">Device Owner</i>` to work.
The regex used has also been improved to include word boundaries around the tag name to make sure that it does not match unexpected elements.
[0]: https://dev.w3.org/html5/spec-LC/elements.html#global-attributes
A recent attempt to import API docs for API 35 produced the error shown above. Parsing logic fails there because the `<i>` tag has an `id` attribute _and_ is present in an open `<p>` tag.
Updates the grammar to ignore attributes on HTML tags such as `<x y=z>` to fix this.
The regex in `CreateStartElement` has also been improved to include word boundaries around the tag name to make sure that it does not match unexpected elements.