-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-37037][SQL] Improve byte array sort by unify compareTo function of UTF8String and ByteArray #34310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-37037][SQL] Improve byte array sort by unify compareTo function of UTF8String and ByteArray #34310
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| ================================================================================================ | ||
| byte array comparisons | ||
| ================================================================================================ | ||
|
|
||
| OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.8.0-1042-azure | ||
| Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz | ||
| Byte Array compareTo: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| 2-7 byte 501 514 14 130.9 7.6 1.0X | ||
| 8-16 byte 976 993 10 67.1 14.9 0.5X | ||
| 16-32 byte 985 995 6 66.5 15.0 0.5X | ||
| 512-1024 byte 1260 1282 13 52.0 19.2 0.4X | ||
| 512 byte slow 3114 3193 46 21.0 47.5 0.2X | ||
| 2-7 byte 572 578 7 114.5 8.7 0.9X | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| ================================================================================================ | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have 'before' numbers for these? you don't need to include them just want to verify that it also seemed to show an improvement like your local laptop one did
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here is the old code path benchmark result: JDK8 JDK11
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It shows we still have the benefits with GA env.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah, I just notice the env of GA is still different. The two benchmark result based on:
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm inclined to believe it is a win based on your first benchmark. Is there any easy way to run before/after on these Xeons, or is that hard?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I compared the two code path within one patch, and here is the result. JDK8: |
||
| byte array comparisons | ||
| ================================================================================================ | ||
|
|
||
| OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.8.0-1042-azure | ||
| Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz | ||
| Byte Array compareTo: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| 2-7 byte 407 418 9 161.1 6.2 1.0X | ||
| 8-16 byte 867 919 30 75.6 13.2 0.5X | ||
| 16-32 byte 882 916 23 74.3 13.5 0.5X | ||
| 512-1024 byte 1123 1167 31 58.4 17.1 0.4X | ||
| 512 byte slow 4054 4611 506 16.2 61.9 0.1X | ||
| 2-7 byte 430 450 16 152.4 6.6 0.9X | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,86 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. | ||
| * The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| * (the "License"); you may not use this file except in compliance with | ||
| * the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.spark.sql.execution.benchmark | ||
|
|
||
| import scala.util.Random | ||
|
|
||
| import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} | ||
| import org.apache.spark.unsafe.types.ByteArray | ||
|
|
||
| /** | ||
| * Benchmark to measure performance for byte array comparisons. | ||
| * {{{ | ||
| * To run this benchmark: | ||
| * 1. without sbt: | ||
| * bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar> | ||
| * 2. build/sbt "sql/test:runMain <this class>" | ||
| * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>" | ||
| * Results will be written to "benchmarks/<this class>-results.txt". | ||
| * }}} | ||
| */ | ||
| object ByteArrayBenchmark extends BenchmarkBase { | ||
|
|
||
| def byteArrayComparisons(iters: Long): Unit = { | ||
| val chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" | ||
| val random = new Random(0) | ||
| def randomBytes(min: Int, max: Int): Array[Byte] = { | ||
| val len = random.nextInt(max - min) + min | ||
| val bytes = new Array[Byte](len) | ||
| var i = 0 | ||
| while (i < len) { | ||
| bytes(i) = chars.charAt(random.nextInt(chars.length())).toByte | ||
| i += 1 | ||
| } | ||
| bytes | ||
| } | ||
|
|
||
| val count = 16 * 1000 | ||
| val dataTiny = Seq.fill(count)(randomBytes(2, 7)).toArray | ||
| val dataSmall = Seq.fill(count)(randomBytes(8, 16)).toArray | ||
| val dataMedium = Seq.fill(count)(randomBytes(16, 32)).toArray | ||
| val dataLarge = Seq.fill(count)(randomBytes(512, 1024)).toArray | ||
| val dataLargeSlow = Seq.fill(count)( | ||
| Array.tabulate(512) {i => if (i < 511) 0.toByte else 1.toByte}).toArray | ||
|
|
||
| def compareBinary(data: Array[Array[Byte]]) = { _: Int => | ||
| var sum = 0L | ||
| for (_ <- 0L until iters) { | ||
| var i = 0 | ||
| while (i < count) { | ||
| sum += ByteArray.compareBinary(data(i), data((i + 1) % count)) | ||
| i += 1 | ||
| } | ||
| } | ||
| } | ||
|
|
||
| val benchmark = new Benchmark("Byte Array compareTo", count * iters, 25, output = output) | ||
| benchmark.addCase("2-7 byte")(compareBinary(dataTiny)) | ||
| benchmark.addCase("8-16 byte")(compareBinary(dataSmall)) | ||
| benchmark.addCase("16-32 byte")(compareBinary(dataMedium)) | ||
| benchmark.addCase("512-1024 byte")(compareBinary(dataLarge)) | ||
| benchmark.addCase("512 byte slow")(compareBinary(dataLargeSlow)) | ||
| benchmark.addCase("2-7 byte")(compareBinary(dataTiny)) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like this this case is listed twice. Maybe drop this line?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The first benchmark case may run slower than the latter due to the JIT optimization and this case has small size which can be done in a short time that would be more likely affected. So I also keep it running twice in case this issue. |
||
| benchmark.run() | ||
| } | ||
|
|
||
| override def runBenchmarkSuite(mainArgs: Array[String]): Unit = { | ||
| runBenchmark("byte array comparisons") { | ||
| byteArrayComparisons(1024 * 4) | ||
| } | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm only wondering if this ends up being slower - you already have byte arrays, and now have to go through platform methods to read them?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
It seems plausible that the new version will be faster, but it's probably a good idea to run a quick benchmark to confirm. There's a
UTF8StringBenchmarklinked from #19180 (comment) : maybe we could adapt that to work on byte arrays and do a quick before-and-after comparison to just to double check?Edit: just to clarify: I noticed that this benchmark is also linked in the PR description. As Sean points out, I think the key difference in this PR is whether we're using
getByte()versus directly accessing the on-heap byte array (in the linked UTF8String benchmark, both the old and new code were usinggetByte()).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @srowen and @JoshRosen for point out the difference. I follow the linked benchmark but add a new 512 byte slow benchmark which the first 511 bytes are same. The benchmark result shows it has no regression after this PR and has big benifits if the byte arrays have many same prefix.
Before this PR:
After this PR: