-
Notifications
You must be signed in to change notification settings - Fork 1.8k
[GR-49402] Correct mangling of primitive types in BFD name mangler #7423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FWIW, this is documented in https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-compression |
zakkak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I added a couple of comments/questions that might help improve the PR.
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| private boolean substituteName(LookupName name) { | ||
| int index = bindings.indexOf(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linear search ... are we sure that this does not (ever) explode (computationally)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lookup cost here will be o(N^2) where N is the number of substitutable elements in the name and signature. N accumulates at most
1 for the package (namespace) prefix
2 for an object return type
2 for an object or (any) array type argument
1 if we have a boolean argument
1 if we have a char argument
1 if we have a byte argument
The count for an object argument/return type can be lowered if
the object type is a foreign type (not passed as a pointer) - subtract 1
the object type repeats the namespace prefix (e.g. java.lang.String) - subtract 1
the object type repeats a previous occurrence - subtract 2
I added a temporary tweak to the patch to count the bindings length for each symbol encoded and tested it by building the image used for the debuginfo test. The worst case I found was this one:
java.util.Collection*
sun.security.provider.certpath.DistributionPointFetcher::getCRLs(java.security.cert.X509CRLSelector*,
sun.security.x509.X509CertImpl*, sun.security.x509.DistributionPoint*, boolean[]*,
boolean, java.security.PublicKey*, java.security.cert.X509Certificate*, java.lang.String*,
java.util.List*, java.util.Set*, java.util.Date*, java.lang.String*, java.security.cert.TrustAnchor*)
Encoding it generates 26 bindings. It has a lot of args and the only repeat is the second occurrence of java.lang.String* which appears in the encoding as SG_ i.e. binding 18.
That case is very much an outlier. Normally the number of bindings is quite small. Here is a table summarizing the distribution for the 65667 different encodings generated during generation of one of the test images:
| Bindings Count | Frequency | %ge |
|---|---|---|
| 1 | 12721 | 19.4 |
| 2 | 7482 | 11.4 |
| 3 | 24745 | 37.7 |
| 4 | 6049 | 9.2 |
| 5 | 8164 | 12.4 |
| 6 | 2509 | 3.8 |
| 7 | 2578 | 3.9 |
| 8 | 358 | 5.4 |
| 9 | 665 | 1.0 |
| 10 | 108 | 0.1 |
| 11-15 | 260 | 0.39 |
| 16-20 | 24 | ~0.0 |
| 20-25 | 3 | ~0.0 |
| 26 | 1 | ~0.0 |
I don't think this means we must switch to a constant time lookup model such as an EconomicMap but the trade off is not clear. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed breakdown.
Of course one can think of pathological methods (potentially from generated source code or test suites) that could be much worse than what you found. The question is if it's practically relevant. I would suggest you build a large test image (huge quarkus or springboot native app ... @zakkak might be able to help with that) and check if you can see any real world build time regressions that are distinguishable from noise when building with -g.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could even use the "temporary tweak to the patch to count the bindings length" to report some more stats, other than the build times. @adinn would you mind sharing that patch, to make sure we are reporting the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zakkak Not much to share really. The temporary tweak was simply a call to System.out.format() at the end of BFDMangle::mangle(String, ResolvedJavaType, String, Signature, boolean), printing the resulting mangled name and the length of bindings. The rest of the work was done using shell commands on the resulting output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adinn this is a kind reminder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it looks like I was looking at a cached version of this when I pinged you :/
Thanks for the reply @adinn , I will try to get some number (hopefully this week)
I looked at @adinn's examples and the documentation and couldn't make sense why
So the fact that is a parameter makes an additional symbol FWIU. |
6c11c56 to
7d93199
Compare
| * resulting bindings would be S_ ==> Hello, S0_ ==> Hello::compareTo, S1_ ==> Hello and S2_ | ||
| * ==> Hello* i.e. both S_ and S1_ would demangle to Hello. This situation should never | ||
| * arise. A name can always be correctly encoded without repeats. In the above example that | ||
| * woudl be _ZN5Hello9compareToEJiPS_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for re-reviewing, Paul. I fixed this and also squashed the style fixes into the main patch and review rework patches.
8e80df1 to
5f6c589
Compare
olpaw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM now. Only cosmetic suggestions.
Ready to merge if @zakkak reports that this does not cause performance regression.
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
...vm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageBFDNameProvider.java
Outdated
Show resolved
Hide resolved
|
@olpaw Remaining recommendations have been followed. Hopefully this will pass the gate and be ready for inclusion. |
|
Running in internal CI ... ⌛ |
This PR fixes issue #7421 (q.v.)
The issue is susceptible to a simple fix and a more sophisticated fix as outlined below. The current patch implements the more sophisticated patch in order to minimize symbol footprint
Basic fix
The simplest solution to the problem is to encode primitive type name using embedded symbol names. So, given the two examples cited in the issue
the encodings would be
which successfully decode as desired
Note that the names
4byte,7booleanand4charrefer to typedefs for primitive types defined in the debug info section so analysis tools which process these names will still be able to resolve them to appropriate machine layouts.Better fix
The problem with the basic fix is that it significantly enlarges symbol names when signatures include repeat occurrences of the same primitive type:
A better solution profits from the use of short names to refer to elements that have already been encoded.
In the above case the encoding of the class name(space) as part of the method name introduces the short name
S_forjdk.internal.misc.Unsafe. Likewise, encoding of the first parameter type introduces short nameS0_for type namebyte. The latter binding meant that repeated occurrences of4bytecan be replaced withS0_.Implementing short name substitution only provides a small gain in footprint when encoding primitive symbol names (2 bytes per
charorbyteparameter, 5 perbooleanparameter). However, it can shorten symbols much more significantly when object type names are repeated. For example without short name substitution the following encoding arises for methodArrays::deepEquals:The repeated
Object[]*argument can be encoded using a short name as follows:providing a much greater saving in footprint (18 bytes).