Skip to content

Conversation

@wenshao
Copy link
Contributor

@wenshao wenshao commented Oct 2, 2023

I submitted PR #15555 before, and there were too many changes. I split it into multiple PRs with small changes. This one is one of them.

this PR removed the duplicate code for getChars in BigDecimal#StringBuilderHelper, i also make performance faster.
Please review and don't hesitate to critique my approach and patch.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 2 Reviewers)

Issue

  • JDK-8315585: Optimization for decimal to string (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16006/head:pull/16006
$ git checkout pull/16006

Update a local copy of the PR:
$ git checkout pull/16006
$ git pull https://git.openjdk.org/jdk.git pull/16006/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16006

View PR using the GUI difftool:
$ git pr show -t 16006

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16006.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 2, 2023

👋 Welcome back wenshao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@wenshao wenshao changed the title Optimization for BigDecimal toString Optimization for decimal to string Oct 2, 2023
@openjdk
Copy link

openjdk bot commented Oct 2, 2023

@wenshao The following labels will be automatically applied to this pull request:

  • core-libs
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@wenshao
Copy link
Contributor Author

wenshao commented Oct 2, 2023

/label remove security

@openjdk
Copy link

openjdk bot commented Oct 2, 2023

@wenshao
The security label was successfully removed.

@wenshao
Copy link
Contributor Author

wenshao commented Oct 2, 2023

@jddarcy, could you please review my code? Thank you!

@wenshao wenshao changed the title Optimization for decimal to string 8315585: Optimization for decimal to string Oct 4, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 4, 2023
@wenshao
Copy link
Contributor Author

wenshao commented Oct 4, 2023

@cl4es could you please review my code? Thank you!

@mlbridge
Copy link

mlbridge bot commented Oct 4, 2023

@wenshao
Copy link
Contributor Author

wenshao commented Oct 9, 2023

Based on @liach 's review suggestions, I reduced the changes to JavaLangAccess and further removed the duplicate code in BigDecimal.

Copy link
Member

@cl4es cl4es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really qualified to review the floating point code. Simplifying away offset and getting rid of the StringBuilderHelper all seem like good improvements, though I think it'd be good if we could either avoid or split out the JLA changes.

return (Integer.toString(highInt) + '.' +
StringBuilderHelper.DIGIT_TENS[lowInt] +
StringBuilderHelper.DIGIT_ONES[lowInt]) ;
int highIntSize = JLA.stringSize(highInt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which micros cover performance of this branch?

What performance would you get if you simplified this to, say, return highInt + (lowInt < 10 ? ".0" : ".") + lowInt;?

lowInt is currently unused since you use (int) intCompact - highInt * 100 instead below. (While a clever optimization in theory, I believe the JIT should handle a pair of integer modulo and division operations (a = intValue % 100; b = intValue / 100) so that it only has to do one division, so please measure that whatever you do here has a significant benefit).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance numbers run on MacBook M1 Max:

-Benchmark                                 Mode  Cnt     Score    Error  Units
-BigDecimals.testHugeToEngineeringString   avgt   15   217.619 ?  4.617  ns/op
-BigDecimals.testLargeToEngineeringString  avgt   15    65.183 ?  4.368  ns/op
-BigDecimals.testSmallToEngineeringString  avgt   15    17.084 ?  0.056  ns/op
-BigDecimals.testToEngineeringString       avgt   15  1856.849 ? 63.571  ns/op

+Benchmark                                 Mode  Cnt     Score    Error  Units (eec631c)
+BigDecimals.testHugeToEngineeringString   avgt   15   164.910 ?  1.514  ns/op (+31.97)
+BigDecimals.testLargeToEngineeringString  avgt   15    20.826 ?  0.066  ns/op (+212.99)
+BigDecimals.testSmallToEngineeringString  avgt   15    11.987 ?  0.080  ns/op (+42.53)
+BigDecimals.testToEngineeringString       avgt   15  1854.372 ? 63.810  ns/op (+0.14)

The microben covered by fast-path is BigDecimals.testSmallToEngineeringString, highInt + (lowInt < 10 ? ".0" : ".") + lowInt is actually implemented based on StringBuilder and will be much slower than the current baseline.

Benchmark                                 Mode  Cnt   Score   Error  Units (use StringConcat)
BigDecimals.testSmallToEngineeringString  avgt   15  41.876 ? 0.182  ns/op

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it again and found that the remainder is not slow. This was my wrong impression. I have changed it according to your suggestion.

@wenshao
Copy link
Contributor Author

wenshao commented Oct 11, 2023

I'm not really qualified to review the floating point code. Simplifying away offset and getting rid of the StringBuilderHelper all seem like good improvements, though I think it'd be good if we could either avoid or split out the JLA changes.

Keep the duplicate code of StringConcatHelper, or use JLA, or move the code of getCharsLatin1 & stringSize to DecimalDigits (PR #15699). Of the three options, using JLA is the smallest change.

@cl4es
Copy link
Member

cl4es commented Oct 12, 2023

Keep the duplicate code of StringConcatHelper, or use JLA, or move the code of getCharsLatin1 & stringSize to DecimalDigits (PR #15699). Of the three options, using JLA is the smallest change.

Adding methods to JLA also adds a maintenance burden by exposing methods and internal details that we prefer not to expose.

Explicit String concat performs poorly mainly since ISC is disabled for the java.base module. We could, however, explicitly invoke the StringConcatFactory and pull out the resulting MethodHandle:

diff --git a/src/java.base/share/classes/java/math/BigDecimal.java b/src/java.base/share/classes/java/math/BigDecimal.java
index 3163b2ffadb..b4edbb131d9 100644
--- a/src/java.base/share/classes/java/math/BigDecimal.java
+++ b/src/java.base/share/classes/java/math/BigDecimal.java
@@ -30,6 +30,12 @@
 package java.math;
 
 import static java.math.BigInteger.LONG_MASK;
+
+import java.lang.invoke.CallSite;
+import java.lang.invoke.MethodHandle;
+import java.lang.invoke.MethodHandles;
+import java.lang.invoke.MethodType;
+import java.lang.invoke.StringConcatFactory;
 import java.nio.charset.CharacterCodingException;
 import java.nio.charset.StandardCharsets;
 import java.io.IOException;
@@ -4146,6 +4152,23 @@ public BigDecimal ulp() {
         return BigDecimal.valueOf(1, this.scale(), 1);
     }
 
+    private static class ConcatHelper {
+        private static final MethodHandle INT_DOT_CHAR_CHAR;
+
+        static {
+            try {
+                CallSite site = StringConcatFactory.makeConcatWithConstants(
+                        MethodHandles.lookup(),
+                        "scale2",
+                        MethodType.methodType(String.class, int.class, char.class, char.class),
+                        "\u0001.\u0001\u0001");
+                INT_DOT_CHAR_CHAR = site.dynamicInvoker();
+            } catch (Exception e) {
+                throw new Error("Bootstrap error", e);
+            }
+        }
+    }
+
     /**
      * Lay out this {@code BigDecimal} into a {@code char[]} array.
      * The Java 1.2 equivalent to this was called {@code getValueString}.
@@ -4164,18 +4187,12 @@ private String layoutChars(boolean sci) {
             intCompact >= 0 && intCompact < Integer.MAX_VALUE) {
             // currency fast path
             int highInt = (int)intCompact / 100;
-            int highIntSize = JLA.stringSize(highInt);
-            byte[] buf = new byte[highIntSize + 3];
-            JLA.getCharsLatin1(highInt, highIntSize, buf);
-            buf[highIntSize] = '.';
-            DecimalDigits.writeDigitPairLatin1(
-                    buf,
-                    highIntSize + 1,
-                    (int) intCompact % 100);
+            int lowInt = (int)intCompact - highInt * 100;
+            char pair = (char)DecimalDigits.digitPair(lowInt);
             try {
-                return JLA.newStringNoRepl(buf, StandardCharsets.ISO_8859_1);
-            } catch (CharacterCodingException e) {
-                throw new AssertionError(e);
+                return (String)ConcatHelper.INT_DOT_CHAR_CHAR.invokeExact(highInt, (char)(pair & 0xff), (char)(pair >> 8));
+            } catch (Throwable e) {
+                throw new RuntimeException(e);
             }
         }

This is equivalent to return "" + highInt + '.' + (char)(pair & 0xff) + (char)(pair >> 8); (if BigDecimal.java would have been compiled without -XDstringConcat=inline) and produces code that is as fast or even slightly faster in my tests:

Name                                     Cnt     Base   Error      Test    Error   Unit  Change
BigDecimals.testSmallToEngineeringString  15   11,703 ± 0,017    10,667 ±  0,058  ns/op   1,10x (p = 0,000*)
  :gc.alloc.rate                             4530,180 ± 6,688  4970,132 ± 26,973 MB/sec   1,10x (p = 0,000*)
  :gc.alloc.rate.norm                          55,600 ± 0,000    55,600 ±  0,000   B/op   1,00x (p = 0,000 )
  :gc.count                                   228,000           226,000          counts
  :gc.time                                    131,000           130,000              ms
  * = significant

@cl4es
Copy link
Member

cl4es commented Oct 12, 2023

Yes, explicitly calling into StringConcatFactory and invoking MHs is cumbersome. Just putting it out there as an alternative that works based on current public API.

It might also be possible to evolve the decision to compile all of java.base with -XDstringConcat=inline to instead only do that for a subset of the most core classes. We could then just do return "" + highInt + '.' + (char)(pair & 0xff) + (char)(pair >> 8); and get the same performance.

@cl4es
Copy link
Member

cl4es commented Oct 13, 2023

/reviewers 2 reviewer

@openjdk
Copy link

openjdk bot commented Oct 13, 2023

@cl4es
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).

Copy link
Member

@cl4es cl4es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for simplifying. It seems that even regular StringBuilder-based concat gives numbers that are competitive with the baseline this way. I'll file an RFE to examine if we can enable ISC for this.

I've requested another reviewer for this, since I don't have the sufficient floating point domain expertise to review these changes alone.

@cl4es
Copy link
Member

cl4es commented Oct 19, 2023

In addition to #16244, will you submit a PR for this?

Once both #16244 and this has been integrated I want to revisit this. The effect of changing getBytes(byte[], int, byte) might have disappeared with #16244 since it better guarantees the JIT will constant fold the prefixes thoroughly. We've observed related issues with System.arraycopy, however, see https://bugs.openjdk.org/browse/JDK-8295496 - so I want to evaluate a few different options here, time allowing.

@wenshao
Copy link
Contributor Author

wenshao commented Oct 21, 2023

@cl4es I found that StringConcatFactory.makeConcatWithConstants will become slower when using recipe("\1.\1", long.class, long.class). If we treat the prefix of length 1 as char, it will become faster.

In this branch https://github.com/wenshao/jdk/tree/optim_decimal_to_string_x2 , if we remove the code that treats the prefix of length 1 as char in the preparer(String prefix, Class<?> cl) method, it will become slower.

package java.lang.invoke;

class StringConcatFactory {
    static MethodHandle prepender(String prefix, Class<?> cl) {
        if (prefix == null || prefix.isEmpty()) {
            return noPrefixPrepender(cl);
        } else {
            // remove this will slower
            if (prefix.length() == 1
                    && (cl == char.class || cl == int.class || cl == long.class)) {
                char ch = prefix.charAt(0);
                MethodHandle prepend = JLA.stringConcatHelper(
                        "prepend",
                        methodType(long.class, long.class, byte[].class, cl, char.class)).rebind();
                return MethodHandles.insertArguments(prepend, 3, ch);
            }

            return MethodHandles.insertArguments(
                    prepender(cl), 3, prefix);
        }
    }
}

Here are the performance numbers running on the MaxBook M1 Max:

-Benchmark                             Mode  Cnt   Score   Error  Units (String prefix)
-BigDecimals.smallScale3ToPlainString  avgt   15  28.898 ? 0.527  ns/op

+Benchmark                             Mode  Cnt   Score   Error  Units (char prefix)
+BigDecimals.smallScale3ToPlainString  avgt   15  23.182 ? 0.477  ns/op

Maybe you should submit another PR to improve StringConcatFactory.makeConcatWithConstants

@wenshao
Copy link
Contributor Author

wenshao commented Oct 22, 2023

@rgiulietti It is currently a stable version, please help me start the review.

Here are the performance numbers running on the MaxBook M1 Pro:

-Benchmark                                   Mode  Cnt    Score   Error  Units (baseline)
-BigDecimals.hugeEngineeringToString         avgt   15  199.683 ? 7.457  ns/op
-BigDecimals.hugeLayoutCharsToString         avgt   15  205.366 ? 2.673  ns/op
-BigDecimals.hugePlainToString               avgt   15  219.066 ? 3.334  ns/op
-BigDecimals.largeScale2EngineeringToString  avgt   15   55.303 ? 0.606  ns/op
-BigDecimals.largeScale2LayoutCharsToString  avgt   15   55.284 ? 0.512  ns/op
-BigDecimals.largeScale2PlainToString        avgt   15   22.836 ? 0.128  ns/op
-BigDecimals.largeScale3EngineeringToString  avgt   15   57.332 ? 0.851  ns/op
-BigDecimals.largeScale3LayoutCharsToString  avgt   15   57.535 ? 1.221  ns/op
-BigDecimals.largeScale3PlainToString        avgt   15   23.967 ? 0.055  ns/op
-BigDecimals.smallScale2EngineeringToString  avgt   15   17.063 ? 0.444  ns/op
-BigDecimals.smallScale2LayoutCharsToString  avgt   15   16.999 ? 0.072  ns/op
-BigDecimals.smallScale2PlainToString        avgt   15   24.069 ? 0.511  ns/op
-BigDecimals.smallScale3EngineeringToString  avgt   15   53.697 ? 0.872  ns/op
-BigDecimals.smallScale3LayoutCharsToString  avgt   15   53.754 ? 1.990  ns/op
-BigDecimals.smallScale3PlainToString        avgt   15   24.596 ? 0.100  ns/op

+Benchmark                                   Mode  Cnt    Score   Error  Units (51b1041)
+BigDecimals.hugeEngineeringToString         avgt   15  194.288 ? 7.294  ns/op (+2.78)
+BigDecimals.hugeLayoutCharsToString         avgt   15  192.121 ? 2.079  ns/op (+6.90)
+BigDecimals.hugePlainToString               avgt   15  201.692 ? 4.527  ns/op (+8.62)
+BigDecimals.largeScale2EngineeringToString  avgt   15   20.128 ? 0.266  ns/op (+174.76)
+BigDecimals.largeScale2LayoutCharsToString  avgt   15   20.681 ? 0.508  ns/op (+167.32)
+BigDecimals.largeScale2PlainToString        avgt   15   19.950 ? 0.273  ns/op (+14.47)
+BigDecimals.largeScale3EngineeringToString  avgt   15   23.564 ? 0.542  ns/op (+143.31)
+BigDecimals.largeScale3LayoutCharsToString  avgt   15   23.424 ? 0.057  ns/op (+145.63)
+BigDecimals.largeScale3PlainToString        avgt   15   22.991 ? 0.048  ns/op (+4.25)
+BigDecimals.smallScale2EngineeringToString  avgt   15   11.641 ? 0.181  ns/op (+46.58)
+BigDecimals.smallScale2LayoutCharsToString  avgt   15   11.589 ? 0.031  ns/op (+46.69)
+BigDecimals.smallScale2PlainToString        avgt   15   11.594 ? 0.032  ns/op (+107.60)
+BigDecimals.smallScale3EngineeringToString  avgt   15   24.684 ? 0.313  ns/op (+117.54)
+BigDecimals.smallScale3LayoutCharsToString  avgt   15   25.058 ? 0.151  ns/op (+114.52)
+BigDecimals.smallScale3PlainToString        avgt   15   24.194 ? 0.217  ns/op (+1.67)

@wenshao
Copy link
Contributor Author

wenshao commented Oct 23, 2023

@cl4es @rgiulietti

Can I add scale2 method to StringLatin1? Like this:

package java.lang;

class StringLatin1 {
    static String scale2(long value) {
        long valueAbs = Math.abs(value);
        long highInt = valueAbs / 100;
        byte[] buf = new byte[Long.stringSize(highInt) + (value < 0 ? 4 : 3)];
        int p = buf.length - 2;
        writeDigitPair(buf, p, (int) (valueAbs % 100));
        buf[--p] = '.';
        getChars(highInt, p, buf);
        if (value < 0) {
            buf[0] = '-';
        }
        if (String.COMPACT_STRINGS) {
            return new String(buf, LATIN1);
        }
        return new String(inflate(buf, 0, buf.length), UTF16);
    }
}

class System {
	private static void setJavaLangAccess() {
		SharedSecrets.setJavaLangAccess(new JavaLangAccess() {
		    public String scale2(long i) {
                        return StringLatin1.scale2(i);
                    }
	        }
	}
}

I think this is more readable than using MethodHandle.

@cl4es
Copy link
Member

cl4es commented Oct 23, 2023

Can I add scale2 method to StringLatin1?

I'd strongly advice against this.

I think this is more readable than using MethodHandle.

View the MethodHandles you add to use StringConcatFactory here as a temporary workaround before the code can be simplified to just regular string concatenation.

…string_x1

# Conflicts:
#	src/java.base/share/classes/java/math/BigDecimal.java
@bridgekeeper
Copy link

bridgekeeper bot commented Dec 8, 2023

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 18, 2024

@wenshao This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Jan 18, 2024
@wenshao
Copy link
Contributor Author

wenshao commented Jan 21, 2024

/open

@openjdk openjdk bot reopened this Jan 21, 2024
@openjdk
Copy link

openjdk bot commented Jan 21, 2024

@wenshao This pull request is now open

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 18, 2024

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@openjdk
Copy link

openjdk bot commented Mar 13, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 11, 2024

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented May 9, 2024

@wenshao This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs [email protected] rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

4 participants