@pan3793
Member

@pan3793 pan3793 commented Feb 24, 2025

What changes were proposed in this pull request?

Bump zstd-jni from 1.5.6-10 to 1.5.7-3.

Why are the changes needed?

https://github.com/facebook/zstd/releases/tag/v1.5.7
luben/zstd-jni@v1.5.6-10...v1.5.7-3

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Passes GitHub Actions (GHA); the benchmark results are also updated.

Was this patch authored or co-authored using generative AI tooling?

No.

@pan3793 pan3793 marked this pull request as ready for review February 24, 2025 07:46
@pan3793
Member Author

pan3793 commented Feb 24, 2025

cc @dongjoon-hyun @LuciferYang

Contributor

cc @yaooqinn Will the results of this test also fluctuate?

Member

Results seem reasonable

Contributor

Got it ~

LuciferYang
LuciferYang previously approved these changes Feb 24, 2025
Contributor

The degradation looks obvious. cc @yaooqinn

Member

Can you run these benchmarks one or two more times? @pan3793

Contributor

It looks worse on level 9.

Member Author

I managed to run the benchmark on a local Linux machine, and it does show a performance regression for level 9 parallel compression.

master

OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.9.3-76060903-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 9:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                 162            164           1          0.0     1263562.4       1.0X
Parallel Compression with 1 workers                 203            205           1          0.0     1587511.9       0.8X
Parallel Compression with 2 workers                 121            124           2          0.0      944964.2       1.3X
Parallel Compression with 4 workers                 110            118           6          0.0      861307.6       1.5X
Parallel Compression with 8 workers                 130            135           4          0.0     1018503.5       1.2X
Parallel Compression with 16 workers                162            164           2          0.0     1263810.5       1.0X

this patch

OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.9.3-76060903-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 9:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                 224            226           2          0.0     1751380.5       1.0X
Parallel Compression with 1 workers                 255            257           2          0.0     1990519.8       0.9X
Parallel Compression with 2 workers                 146            147           1          0.0     1143136.1       1.5X
Parallel Compression with 4 workers                 116            124           8          0.0      905375.1       1.9X
Parallel Compression with 8 workers                 133            139           4          0.0     1042289.8       1.7X
Parallel Compression with 16 workers                161            164           2          0.0     1258236.8       1.4X
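
When reading the two tables side by side, note that the Relative column appears to be normalized against the first row of its own table (for the patched run, 224 / 116 ≈ 1.9), so it cannot be compared across versions. The following is a minimal Java sketch that compares the Best Time values quoted above directly. The class and method names are illustrative only and are not part of Spark or zstd-jni.

```java
import java.util.Locale;

/** Compares "Best Time(ms)" per worker count between the two runs quoted above. */
class Level9Regression {

    // Worker counts and best times (ms) copied from the "master" and "this patch" tables.
    static final int[] WORKERS = {0, 1, 2, 4, 8, 16};
    static final int[] MASTER_BEST_MS = {162, 203, 121, 110, 130, 162};
    static final int[] PATCH_BEST_MS = {224, 255, 146, 116, 133, 161};

    /** Percent change from beforeMs to afterMs; a positive value means slower. */
    static double percentChange(double beforeMs, double afterMs) {
        return (afterMs - beforeMs) / beforeMs * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < WORKERS.length; i++) {
            System.out.printf(Locale.ROOT, "%2d workers: %4d ms -> %4d ms (%+.1f%%)%n",
                    WORKERS[i], MASTER_BEST_MS[i], PATCH_BEST_MS[i],
                    percentChange(MASTER_BEST_MS[i], PATCH_BEST_MS[i]));
        }
    }
}
```

On these numbers the slowdown is concentrated at 0, 1, and 2 workers, while the 4, 8, and 16 worker rows differ by only a few milliseconds.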

Contributor

Shall we give feedback on luben/zstd-jni/releases/tag/v1.5.7-1?

Member Author

I have opened an issue luben/zstd-jni#350

Do you know what compression ratio each version achieves? Is it similar?

Member Author

I'm a little busy these days; I will investigate this issue later this week.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for investigating and reporting back to the upstream, @pan3793 and all.

FMX pushed a commit to apache/celeborn that referenced this pull request Feb 25, 2025
### What changes were proposed in this pull request?

Bump zstd-jni version from 1.5.2-1 to 1.5.7-1.

### Why are the changes needed?

Bump zstd-jni to the latest version.

Backport apache/spark#50057.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #3114 from SteNicholas/CELEBORN-1877.

Authored-by: Nicholas Jiang <[email protected]>
Signed-off-by: mingji <[email protected]>
@LuciferYang
Contributor

@pan3793 v1.5.7-2 released

@luben

luben commented Mar 24, 2025

> @pan3793 v1.5.7-2 released

This adds only one feature on top of v1.5.7-1: support for decompression from a byte array to a ByteBuffer and vice versa.

Member

@dongjoon-hyun dongjoon-hyun left a comment

FYI, v1.5.7-3 is ready and looks much better to me than v1.5.7-1.

@dongjoon-hyun
Member

Would you like to retry with 1.5.7-3, @pan3793?

@dongjoon-hyun
Member

Thank you, @pan3793. Please revise the PR title and description too.

@pan3793
Member Author

pan3793 commented May 9, 2025

Got a similar benchmark result with zstd-jni:1.5.7-3; investigating.

@luben

luben commented May 9, 2025

> Got a similar benchmark result with zstd-jni:1.5.7-3; investigating.

I don't expect performance differences between 1.5.7-1 and 1.5.7-3 as they are based on the same zstd upstream code.

@pan3793
Member Author

pan3793 commented May 9, 2025

Hi @luben, I mirrored the Spark benchmark in a single Java file; it would be great if you could take a look.

Main.java

import com.github.luben.zstd.BufferPool;
import com.github.luben.zstd.NoPool;
import com.github.luben.zstd.RecyclingBufferPool;
import com.github.luben.zstd.ZstdOutputStreamNoFinalizer;

import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

public class Main {

    static boolean bufferPoolEnabled = false;

    static BufferPool bufferPool = bufferPoolEnabled ? RecyclingBufferPool.INSTANCE : NoPool.INSTANCE;

    static byte[] data = new byte[256 * 1024 * 1024];

    static {
        for (int i = 0; i < 256 * 1024 * 1024; i++) {
            data[i] = (byte) i;
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        for (int j = 0; j < 16; j++) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ZstdOutputStreamNoFinalizer os = new ZstdOutputStreamNoFinalizer(baos, bufferPool)
                    .setLevel(9)
                    .setWorkers(0)
                    .setCloseFrameOnFlush(true);
            OutputStream zcos = new BufferedOutputStream(os, 32 * 1024);
            ObjectOutputStream oos = new ObjectOutputStream(zcos);
            for (int i = 0; i < 65535; i++) {
                oos.writeObject(data);
            }
            oos.close();
        }
        System.out.println("cost " + (System.nanoTime() - start) / 1000000L + "ms");
    }
}
➜  ~ java -version
openjdk version "17.0.15" 2025-04-15 LTS
OpenJDK Runtime Environment Zulu17.58+21-CA (build 17.0.15+6-LTS)
OpenJDK 64-Bit Server VM Zulu17.58+21-CA (build 17.0.15+6-LTS, mixed mode, sharing)
➜  ~ java -cp ~/.m2/repository/com/github/luben/zstd-jni/1.5.6-10/zstd-jni-1.5.6-10.jar Main.java
cost 2654ms
➜  ~ java -cp ~/.m2/repository/com/github/luben/zstd-jni/1.5.6-10/zstd-jni-1.5.6-10.jar Main.java
cost 2643ms
➜  ~ java -cp ~/.m2/repository/com/github/luben/zstd-jni/1.5.6-10/zstd-jni-1.5.6-10.jar Main.java
cost 2672ms
➜  ~ java -cp ~/.m2/repository/com/github/luben/zstd-jni/1.5.6-10/zstd-jni-1.5.6-10.jar Main.java
cost 2668ms
➜  ~ java -cp ~/.m2/repository/com/github/luben/zstd-jni/1.5.6-10/zstd-jni-1.5.6-10.jar Main.java
cost 2589ms
➜  ~ java -cp ~/.m2/repository/com/github/luben/zstd-jni/1.5.7-3/zstd-jni-1.5.7-3.jar Main.java
cost 3630ms
➜  ~ java -cp ~/.m2/repository/com/github/luben/zstd-jni/1.5.7-3/zstd-jni-1.5.7-3.jar Main.java
cost 3689ms
➜  ~ java -cp ~/.m2/repository/com/github/luben/zstd-jni/1.5.7-3/zstd-jni-1.5.7-3.jar Main.java
cost 3682ms
➜  ~ java -cp ~/.m2/repository/com/github/luben/zstd-jni/1.5.7-3/zstd-jni-1.5.7-3.jar Main.java
cost 3706ms
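
To judge whether the difference between the two jars is more than run-to-run noise, it helps to compare the spread within each group of runs against the gap between the groups. The sketch below does this in Java with the timings listed above; the class and method names are illustrative assumptions, not part of any library in this thread.

```java
import java.util.Arrays;

/** Summarizes the repeated runs above: mean, spread within a version, gap between versions. */
class RunSpread {

    // Wall-clock results (ms) copied from the terminal output above.
    static final long[] OLD_MS = {2654, 2643, 2672, 2668, 2589}; // zstd-jni 1.5.6-10
    static final long[] NEW_MS = {3630, 3689, 3682, 3706};       // zstd-jni 1.5.7-3

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    /** Difference between the slowest and fastest run of one version. */
    static long spread(long[] samples) {
        long max = Arrays.stream(samples).max().getAsLong();
        long min = Arrays.stream(samples).min().getAsLong();
        return max - min;
    }

    /** Distance between the fastest run of the slower version and the slowest run of the faster one. */
    static long gap(long[] slower, long[] faster) {
        return Arrays.stream(slower).min().getAsLong() - Arrays.stream(faster).max().getAsLong();
    }

    public static void main(String[] args) {
        System.out.println("mean 1.5.6-10 (ms): " + mean(OLD_MS));
        System.out.println("mean 1.5.7-3  (ms): " + mean(NEW_MS));
        System.out.println("spread 1.5.6-10 (ms): " + spread(OLD_MS));
        System.out.println("spread 1.5.7-3  (ms): " + spread(NEW_MS));
        System.out.println("gap between versions (ms): " + gap(NEW_MS, OLD_MS));
    }
}
```

The two groups of runs do not overlap: every 1.5.7-3 run is slower than every 1.5.6-10 run, and the gap is roughly ten times the spread within either group.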

@luben

luben commented May 9, 2025

Thanks, I can reproduce it locally. I will look into the details.

@luben

luben commented May 9, 2025

I think it's some change in the underlying algorithm. I can also reproduce it with the zstd CLI on the data sample you are using:

$ time zstd-1.5.6 --single-thread -9 data_file_16 -c > /dev/null
zstd-1.5.6 --single-thread -9 data_file_16 -c > /dev/null  3,23s user 0,71s system 123% cpu 3,196 total

$ time zstd-1.5.7 --single-thread -9 data_file_16 -c > /dev/null
zstd-1.5.7 --single-thread -9 data_file_16 -c > /dev/null  3,79s user 0,64s system 116% cpu 3,794 total

data_file_16 is a 4 GiB file containing the same data as the Java benchmark. BTW, you don't need the for (int i = 0; i < 65535; i++) loop there.
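
The thread does not say why the loop is unnecessary. One plausible reading, offered here as an assumption rather than as the commenter's reasoning, is that ObjectOutputStream remembers objects it has already written and emits only a short back-reference when the same instance is written again, so the repeated writeObject(data) calls add almost no input for the compressor. A small Java sketch, with hypothetical class and method names, that measures this:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Measures how much a stream grows when the same array instance is serialized repeatedly. */
class RepeatedWriteObject {

    /** Serializes the same payload instance `times` times and returns the total stream size in bytes. */
    static int serializedSize(byte[] payload, int times) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(sink)) {
            for (int i = 0; i < times; i++) {
                // After the first call, the stream refers back to the already written object.
                oos.writeObject(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1024 * 1024];
        int once = serializedSize(payload, 1);
        int many = serializedSize(payload, 1000);
        System.out.println("1 write:     " + once + " bytes");
        System.out.println("1000 writes: " + many + " bytes");
        System.out.println("extra bytes per repeated write: " + (many - once) / 999.0);
    }
}
```

If that reading is right, each of the 16 outer iterations in the benchmark compresses roughly one copy of the 256 MiB array, which is consistent with a 4 GiB data file being described as the same data.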

@luben

luben commented May 9, 2025

To be fair, using the CLI I cannot reproduce that difference using real-world data (used some log files for testing).

@pan3793
Member Author

pan3793 commented May 9, 2025

> I think it's some change in the underlying algorithm. I can also reproduce it with the zstd CLI on the data sample you are using

@luben thank you for confirming and clarifying.

> To be fair, using the CLI I cannot reproduce that difference using real-world data (used some log files for testing).

@yaooqinn @dongjoon-hyun since you wrote the ZStandardBenchmark, I'd like to hear your opinion.

@luben

luben commented May 9, 2025

I will raise the issue with upstream. BTW, here is a minimized benchmark that still exhibits the same difference:

  import com.github.luben.zstd.ZstdOutputStreamNoFinalizer;
  import java.io.ByteArrayOutputStream;

  public class Main {

      static byte[] data = new byte[256 * 1024 * 1024];

      static {
          for (int i = 0; i < 256 * 1024 * 1024; i++) {
              data[i] = (byte) i;
          }
      }

      public static void main(String[] args) throws Exception {
          long start = System.nanoTime();
          for (int j = 0; j < 16; j++) {
              ByteArrayOutputStream baos = new ByteArrayOutputStream();
              ZstdOutputStreamNoFinalizer os = new ZstdOutputStreamNoFinalizer(baos)
                      .setLevel(9)
                      .setWorkers(0);
              os.write(data);
              os.close();
          }
          System.out.println("cost " + (System.nanoTime() - start) / 1000000L + "ms");
      }
  }

@luben

luben commented May 10, 2025

There is an explanation in facebook/zstd#4385, and it's what I suspected: the highly compressible data used in the benchmark is not representative of real-world data.
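
To get a feel for how extreme the benchmark input is, the sketch below fills a buffer with the same pattern used in the benchmark (data[i] = (byte) i) and compares its compression ratio with that of seeded pseudo-random bytes. It deliberately uses java.util.zip.Deflater from the JDK as a stand-in so that it runs without zstd-jni on the classpath; the ratios are therefore not zstd's, and the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Random;
import java.util.zip.Deflater;

/** Contrasts the compressibility of the benchmark's periodic pattern with pseudo-random bytes. */
class CompressibilityCheck {

    /** Returns the compressed size of input using Deflater (a stand-in for zstd, not zstd itself). */
    static long deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] chunk = new byte[64 * 1024];
            long total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(chunk);
            }
            return total;
        } finally {
            deflater.end(); // release native memory held by the deflater
        }
    }

    /** Same fill pattern as the benchmark in this thread: a 256-byte cycle repeated. */
    static byte[] periodic(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) i;
        }
        return data;
    }

    /** Seeded pseudo-random bytes: the opposite extreme, essentially incompressible. */
    static byte[] random(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        int size = 4 * 1024 * 1024;
        double periodicRatio = (double) size / deflatedSize(periodic(size));
        double randomRatio = (double) size / deflatedSize(random(size, 42L));
        System.out.printf(Locale.ROOT, "periodic pattern ratio: %.1f%n", periodicRatio);
        System.out.printf(Locale.ROOT, "random bytes ratio:     %.2f%n", randomRatio);
    }
}
```

Neither input resembles production data; real workloads usually sit somewhere between these two extremes, which is why a benchmark built on either one can exaggerate or hide a change in the compressor.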

@LuciferYang
Contributor

So should we first rewrite parts of this microbenchmark to make it more aligned with real-world scenarios before proceeding with the upgrades?

@pan3793
Member Author

pan3793 commented May 12, 2025

@LuciferYang I'm rewriting the ZStandardBenchmark to use TPC-DS data and have already got a reasonable result in a local test; I will create a new PR soon.

LuciferYang pushed a commit that referenced this pull request May 15, 2025
### What changes were proposed in this pull request?

We found some unreasonable benchmark results while upgrading zstd-jni from 1.5.6-10 to 1.5.7-x in #50057, and the author suggests using real-world data for the zstd compression benchmark.

### Why are the changes needed?

Add a new benchmark for zstd with more reasonable data.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested on a local machine, Ubuntu 24.04, Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz

zstd-jni:1.5.6-10
```
================================================================================================
Benchmark ZStandardCompressionCodec
================================================================================================

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Benchmark ZStandardCompressionCodec:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------------
Compression 4 times at level 1 without buffer pool           2737           2742           6          0.0   684299199.3       1.0X
Compression 4 times at level 2 without buffer pool           4217           4218           2          0.0  1054165072.5       0.6X
Compression 4 times at level 3 without buffer pool           5660           5661           2          0.0  1414928809.8       0.5X
Compression 4 times at level 1 with buffer pool              2739           2743           6          0.0   684719746.2       1.0X
Compression 4 times at level 2 with buffer pool              4186           4191           8          0.0  1046477235.5       0.7X
Compression 4 times at level 3 with buffer pool              5663           5667           5          0.0  1415762083.2       0.5X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------
Decompression 4 times from level 1 without buffer pool            943            950          10          0.0   235749387.0       1.0X
Decompression 4 times from level 2 without buffer pool           1239           1244           6          0.0   309753079.0       0.8X
Decompression 4 times from level 3 without buffer pool           1468           1484          23          0.0   366946390.8       0.6X
Decompression 4 times from level 1 with buffer pool               933            942           9          0.0   233286880.8       1.0X
Decompression 4 times from level 2 with buffer pool              1142           1171          40          0.0   285605190.0       0.8X
Decompression 4 times from level 3 with buffer pool              1394           1404          13          0.0   348546518.3       0.7X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 3:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                1889           1899          14          0.0   472156817.0       1.0X
Parallel Compression with 1 workers                1715           1717           2          0.0   428826617.0       1.1X
Parallel Compression with 2 workers                 904            906           2          0.0   225890052.0       2.1X
Parallel Compression with 4 workers                 539            548           8          0.0   134735732.5       3.5X
Parallel Compression with 8 workers                 540            548           9          0.0   134889447.5       3.5X
Parallel Compression with 16 workers                577            589          23          0.0   144182540.7       3.3X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 9:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                9555           9567          18          0.0  2388642623.3       1.0X
Parallel Compression with 1 workers                7973           8006          47          0.0  1993145509.0       1.2X
Parallel Compression with 2 workers                5070           5071           1          0.0  1267405763.3       1.9X
Parallel Compression with 4 workers                4420           4421           1          0.0  1104977620.3       2.2X
Parallel Compression with 8 workers                4790           4800          15          0.0  1197417939.0       2.0X
Parallel Compression with 16 workers               5000           5003           5          0.0  1249965510.5       1.9X
```

zstd-jni:1.5.7-3
```
================================================================================================
Benchmark ZStandardCompressionCodec
================================================================================================

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Benchmark ZStandardCompressionCodec:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------------
Compression 4 times at level 1 without buffer pool           2700           2709          13          0.0   674967564.0       1.0X
Compression 4 times at level 2 without buffer pool           4148           4149           0          0.0  1037124857.0       0.7X
Compression 4 times at level 3 without buffer pool           5660           5682          31          0.0  1414968620.0       0.5X
Compression 4 times at level 1 with buffer pool              2718           2728          14          0.0   679514554.3       1.0X
Compression 4 times at level 2 with buffer pool              4130           4131           2          0.0  1032476406.2       0.7X
Compression 4 times at level 3 with buffer pool              5571           5576           6          0.0  1392871057.5       0.5X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------
Decompression 4 times from level 1 without buffer pool            942            951           9          0.0   235523684.5       1.0X
Decompression 4 times from level 2 without buffer pool           1248           1270          31          0.0   311906360.5       0.8X
Decompression 4 times from level 3 without buffer pool           1472           1475           4          0.0   368071680.5       0.6X
Decompression 4 times from level 1 with buffer pool               939            956          18          0.0   234631810.0       1.0X
Decompression 4 times from level 2 with buffer pool              1249           1261          16          0.0   312318610.5       0.8X
Decompression 4 times from level 3 with buffer pool              1475           1475           0          0.0   368765939.3       0.6X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 3:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                1865           1873          11          0.0   466278397.5       1.0X
Parallel Compression with 1 workers                1785           1793          10          0.0   446359936.8       1.0X
Parallel Compression with 2 workers                 945            953          10          0.0   236142005.8       2.0X
Parallel Compression with 4 workers                 559            577          29          0.0   139754505.5       3.3X
Parallel Compression with 8 workers                 537            555          13          0.0   134328778.3       3.5X
Parallel Compression with 16 workers                587            614          23          0.0   146784965.5       3.2X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 9:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                9365           9375          14          0.0  2341247379.0       1.0X
Parallel Compression with 1 workers                8022           8022           0          0.0  2005448255.8       1.2X
Parallel Compression with 2 workers                5054           5069          22          0.0  1263445148.8       1.9X
Parallel Compression with 4 workers                4372           4394          31          0.0  1092926980.8       2.1X
Parallel Compression with 8 workers                4785           4805          28          0.0  1196282275.0       2.0X
Parallel Compression with 16 workers               5012           5028          23          0.0  1252925049.5       1.9X
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50857 from pan3793/SPARK-52078.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
@pan3793 pan3793 changed the title from "[SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1" to "[SPARK-51301][BUILD] Bump zstd-jni 1.5.7-3" on May 15, 2025
@github-actions github-actions bot added the CORE label May 15, 2025
@pan3793 pan3793 marked this pull request as ready for review May 15, 2025 07:32
@pan3793 pan3793 requested review from LuciferYang and beliefer May 15, 2025 07:35
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025
### What changes were proposed in this pull request?

Bump zstd-jni to the latest version.

### Why are the changes needed?

https://github.com/facebook/zstd/releases/tag/v1.5.7
luben/zstd-jni@v1.5.6-10...v1.5.7-3

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GHA, also update the benchmark results.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50057 from pan3793/SPARK-51301.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>