
Conversation

@ylpoonlg (Contributor)

Performance Results

Run on Neoverse-V2

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|--------|-----:|-----:|------:|-------:|-------:|----:|----:|----------:|
| Scalar | 15 | 6.604 ns | 0.1605 ns | 0.1577 ns | 6.707 ns | 6.397 ns | 6.748 ns | - |
| Vector128OddEvenSort | 15 | 13.196 ns | 0.0052 ns | 0.0048 ns | 13.195 ns | 13.190 ns | 13.205 ns | - |
| SveOddEvenSort | 15 | 19.118 ns | 0.0149 ns | 0.0139 ns | 19.120 ns | 19.091 ns | 19.138 ns | - |
| SveTail | 15 | 17.710 ns | 0.0153 ns | 0.0136 ns | 17.708 ns | 17.693 ns | 17.736 ns | - |
| Scalar | 127 | 42.785 ns | 0.0486 ns | 0.0379 ns | 42.771 ns | 42.738 ns | 42.844 ns | - |
| Vector128OddEvenSort | 127 | 35.148 ns | 0.0383 ns | 0.0358 ns | 35.155 ns | 35.056 ns | 35.183 ns | - |
| SveOddEvenSort | 127 | 90.251 ns | 0.1244 ns | 0.1164 ns | 90.283 ns | 90.023 ns | 90.436 ns | - |
| SveTail | 127 | 32.287 ns | 0.0128 ns | 0.0113 ns | 32.285 ns | 32.275 ns | 32.315 ns | - |
| Scalar | 527 | 180.766 ns | 0.2361 ns | 0.2093 ns | 180.670 ns | 180.580 ns | 181.265 ns | - |
| Vector128OddEvenSort | 527 | 149.811 ns | 0.0485 ns | 0.0405 ns | 149.810 ns | 149.771 ns | 149.920 ns | - |
| SveOddEvenSort | 527 | 376.324 ns | 0.5892 ns | 0.4920 ns | 376.425 ns | 375.506 ns | 377.375 ns | - |
| SveTail | 527 | 130.668 ns | 0.1057 ns | 0.0937 ns | 130.643 ns | 130.566 ns | 130.867 ns | - |
| Scalar | 10015 | 3,774.353 ns | 17.7558 ns | 13.8626 ns | 3,770.468 ns | 3,758.883 ns | 3,809.131 ns | - |
| Vector128OddEvenSort | 10015 | 2,813.042 ns | 0.9627 ns | 0.8534 ns | 2,813.069 ns | 2,812.061 ns | 2,815.025 ns | - |
| SveOddEvenSort | 10015 | 7,162.929 ns | 5.7581 ns | 5.3861 ns | 7,161.363 ns | 7,156.213 ns | 7,173.464 ns | - |
| SveTail | 10015 | 2,468.773 ns | 1.2800 ns | 1.1346 ns | 2,468.508 ns | 2,467.575 ns | 2,470.949 ns | - |
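
For readers without the PR checked out, here is a minimal sketch of the general shape of such a microbenchmark: one odd-even transposition pass over an `int` array, scalar vs. `Vector128`. All names here are hypothetical (this is not the code under review), and it assumes BenchmarkDotNet on .NET 8+; the SVE variants are omitted since the `Sve` intrinsics API is still experimental.

```csharp
using System;
using System.Runtime.Intrinsics;
using BenchmarkDotNet.Attributes;

public class OddEvenPassBenchmark
{
    [Params(15, 127, 527, 10015)]
    public int Size;

    private int[] _data = Array.Empty<int>();

    [GlobalSetup]
    public void Setup()
    {
        // Note: both benchmarks mutate _data, so later invocations see
        // partially sorted input; a real benchmark would re-randomize
        // per iteration (e.g. via [IterationSetup]).
        var rng = new Random(42);
        _data = new int[Size];
        for (int i = 0; i < Size; i++)
            _data[i] = rng.Next();
    }

    [Benchmark(Baseline = true)]
    public void Scalar()
    {
        int[] a = _data;
        // Compare-exchange each (even, odd) pair: one pass of the network.
        for (int i = 0; i + 1 < a.Length; i += 2)
        {
            if (a[i] > a[i + 1])
                (a[i], a[i + 1]) = (a[i + 1], a[i]);
        }
    }

    [Benchmark]
    public void Vector128Pass()
    {
        int[] a = _data;
        int i = 0;
        // Gather four (even, odd) pairs into two vectors, min/max them
        // in one shot, then scatter the sorted pairs back.
        for (; i + 8 <= a.Length; i += 8)
        {
            Vector128<int> even = Vector128.Create(a[i], a[i + 2], a[i + 4], a[i + 6]);
            Vector128<int> odd  = Vector128.Create(a[i + 1], a[i + 3], a[i + 5], a[i + 7]);
            Vector128<int> lo = Vector128.Min(even, odd);
            Vector128<int> hi = Vector128.Max(even, odd);
            a[i]     = lo.GetElement(0); a[i + 1] = hi.GetElement(0);
            a[i + 2] = lo.GetElement(1); a[i + 3] = hi.GetElement(1);
            a[i + 4] = lo.GetElement(2); a[i + 5] = hi.GetElement(2);
            a[i + 6] = lo.GetElement(3); a[i + 7] = hi.GetElement(3);
        }
        // Scalar tail for any remaining pair.
        for (; i + 1 < a.Length; i += 2)
        {
            if (a[i] > a[i + 1])
                (a[i], a[i + 1]) = (a[i + 1], a[i]);
        }
    }
}
```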

cc @dotnet/arm64-contrib @SwapnilGaikwad @LoopedBard3

@tannergooding (Member)

I have a general concern about all the SVE-specific microbenchmarks being added.

Benchmarks are fairly expensive in terms of runtime, and even a small number of them adds significant cost to CI and our tracking. Correspondingly, we don't really have any platform-specific intrinsic benchmarks today (i.e. you don't see explicit benchmarks covering AdvSimd or Avx512).

Instead, we typically rely on our normal benchmarks for Span, LINQ, Tensors, and other areas that internally accelerate using SIMD, and we measure those across a range of hardware. Some of our machines support newer ISAs, so we can correlate runs and see the more real-world improvements from there.

I would expect that here, too, we shouldn't be testing SVE directly, but rather testing with SVE enabled and comparing that against a run with it disabled. This requires a bit more work in the JIT to enable first, but it significantly reduces cost and gives better metrics on the benefit customers will actually see.
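
One way that comparison could be set up with BenchmarkDotNet is two jobs that differ only in an environment variable, sketched below. The `DOTNET_EnableSve` knob is an assumption on my part (the exact runtime config name should be verified against the JIT's config values); the rest uses BenchmarkDotNet's standard job and environment-variable support.

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Sketch: run the same (normal, non-SVE-specific) benchmarks twice,
// once with SVE enabled and once with it disabled, then diff the results.
public class SveOnOffConfig : ManualConfig
{
    public SveOnOffConfig()
    {
        AddJob(Job.Default.WithId("SveOn"));
        AddJob(Job.Default
            .WithEnvironmentVariable("DOTNET_EnableSve", "0") // assumed config knob
            .WithId("SveOff"));
    }
}
```

Applying `[Config(typeof(SveOnOffConfig))]` to a benchmark class would then produce paired "SveOn"/"SveOff" rows for every benchmark, without any SVE-specific benchmark code.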

CC @DrewScoggins
