
Throwing an Exception is 2.5 times slower on native-image #6451

@SergejIsbrecht


Describe the issue
While benchmarking, I noticed unexpectedly high numbers for a slow path in our code. The code calls Enum#valueOf and, if an exception is thrown, falls back to different handling. I was intrigued and took a closer look at this issue.
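
For illustration, the pattern being measured looks roughly like the sketch below. This is not the code from the benchmark repository; the enum `Color`, the method `parse`, and the fallback constant `UNKNOWN` are hypothetical names chosen for this example.

```java
public class FromStringSketch {
    // Hypothetical enum; the enum used in the real benchmark is not shown in this issue.
    enum Color { RED, GREEN, BLUE, UNKNOWN }

    // Fast path: valueOf finds the constant and returns it.
    // Slow path: valueOf throws IllegalArgumentException for an unknown name,
    // and the caller falls back to a default value.
    static Color parse(String name) {
        try {
            return Color.valueOf(name);
        } catch (IllegalArgumentException e) {
            return Color.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("RED"));    // fast path
        System.out.println(parse("PURPLE")); // slow path via the exception
    }
}
```

The first call corresponds to the `valueOf` benchmark, the second to `valueOfException`.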

OpenJDK 17 amd64

Benchmark                                             Mode  Cnt     Score    Error   Units
FromStringBench.valueOf                               avgt    5     4.729 ±  0.234   ns/op
FromStringBench.valueOf:·gc.alloc.rate                avgt    5    ≈ 10⁻⁴           MB/sec
FromStringBench.valueOf:·gc.alloc.rate.norm           avgt    5    ≈ 10⁻⁷             B/op
FromStringBench.valueOf:·gc.count                     avgt    5       ≈ 0           counts
FromStringBench.valueOfException                      avgt    5  1195.178 ± 65.872   ns/op
FromStringBench.valueOfException:·gc.alloc.rate       avgt    5   766.129 ± 41.666  MB/sec
FromStringBench.valueOfException:·gc.alloc.rate.norm  avgt    5   960.000 ±  0.001    B/op
FromStringBench.valueOfException:·gc.count            avgt    5   129.000           counts
FromStringBench.valueOfException:·gc.time             avgt    5    84.000               ms

native-image JDK17 amd64

Benchmark                                        Mode  Cnt     Score    Error   Units
FromStringBench.valueOf                          avgt    5     7.475 ±  0.187   ns/op
FromStringBench.valueOf:·gc.alloc.rate           avgt    5       ≈ 0           MB/sec
FromStringBench.valueOf:·gc.count                avgt    5       ≈ 0           counts
FromStringBench.valueOfException                 avgt    5  2988.646 ± 23.550   ns/op
FromStringBench.valueOfException:·gc.alloc.rate  avgt    5       ≈ 0           MB/sec
FromStringBench.valueOfException:·gc.count       avgt    5   437.000           counts
FromStringBench.valueOfException:·gc.time        avgt    5   898.000               ms

(For valueOfException, OpenJDK 17 on HotSpot is 2.5 times faster than the native-image build: 1195 ns/op vs. 2989 ns/op.)

Calling Enum#valueOf with a valid name is roughly equally fast on both, but calling it with a value that is not in the enumeration throws an exception, and that path is much slower. To me, it looks like unwinding the stack is quite costly. See the flamegraph:

flamegraph

According to the flamegraph (-F 499; call graph: dwarf), 50% of the time is spent in CodeInfoTable::lookupCodeInfoQueryResult. Decompression in FrameInfoDecoder::decodeFrameInfo accounts for 37% of the total time.

To me it looks like something similar to CompressedOops, but for code information. Am I right?
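
One way to check whether the cost comes from capturing the stack trace (as opposed to unwinding to the handler) might be to compare against an exception that does not record a stack trace. The sketch below is only a diagnostic idea, not part of the benchmark; `StacklessException` and `throwAndCatch` are hypothetical names. It relies on the protected `RuntimeException(String, Throwable, boolean, boolean)` constructor with `writableStackTrace` set to `false`.

```java
public class StacklessSketch {
    // Hypothetical exception type that skips stack-trace capture on construction.
    static final class StacklessException extends RuntimeException {
        StacklessException(String message) {
            // cause = null, enableSuppression = false, writableStackTrace = false
            super(message, null, false, false);
        }
    }

    // Throws the given exception, catches it, and reports how many
    // stack frames were recorded in it.
    static int throwAndCatch(RuntimeException e) {
        try {
            throw e;
        } catch (RuntimeException caught) {
            return caught.getStackTrace().length;
        }
    }

    public static void main(String[] args) {
        // No frames are recorded for the stackless variant.
        System.out.println(throwAndCatch(new StacklessException("no trace")));
        // A regular exception normally records the frames of the creating thread.
        System.out.println(throwAndCatch(new IllegalArgumentException("with trace")));
    }
}
```

If a benchmark throwing the stackless variant were much closer to the HotSpot numbers, that would suggest the frame-info decoding happens mostly while filling in the stack trace; I have not measured this.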

Steps to reproduce the issue
Please include both build steps as well as run steps

  1. git clone https://github.com/SergejIsbrecht/Bench2.git
  2. see README.md to build and run on OpenJDK and native-image

Describe GraalVM and your environment:

  • GraalVM version: GraalVM CE 23.1.0-dev-20230414_0206
  • JDK major version: JDK17
  • OS: Ubuntu 20.04 - 5.x
  • Architecture: AMD64

More details

See build.gradle.kts -> graalvmNative for the native-image command line options used:

            buildArgs.add("--verbose")
            buildArgs.add("--no-fallback")
            // buildArgs.add("-g")
            // buildArgs.add("-H:-DeleteLocalSymbols")
            buildArgs.add("-H:IncludeResources=.*/BenchmarkList")
            buildArgs.add("-H:Log=registerResource:verbose")
            buildArgs.add("--initialize-at-build-time=org.openjdk.jmh.infra,org.openjdk.jmh.util.Utils,org.openjdk.jmh.runner.InfraControl,org.openjdk.jmh.runner.InfraControlL0,org.openjdk.jmh.runner.InfraControlL1,org.openjdk.jmh.runner.InfraControlL2,org.openjdk.jmh.runner.InfraControlL3,org.openjdk.jmh.runner.InfraControlL4")
            buildArgs.add("-H:-SpawnIsolates")
            buildArgs.add("-H:+UseSerialGC")
            buildArgs.add("-H:InitialCollectionPolicy=BySpaceAndTime")
            buildArgs.add("-H:AlignedHeapChunkSize=524288")
            buildArgs.add("-H:+ReportExceptionStackTraces")
            buildArgs.add("--enable-monitoring=all")
            buildArgs.add("-J-Xmx20g")
            buildArgs.add("-march=native")

Resolution

Either improve the frame-info decompression or provide an option to build a native image without compression. Am I right to assume that not using some kind of pointer compression results in much bigger binaries?
