@fmeum fmeum commented Feb 5, 2025

This change introduces a new LongFunction<A> sizedSupplier() method on Collector that is passed the exact size of the stream if known and -1 otherwise. This can greatly reduce the number of allocations and the amount of copying for streams that reshape data structures rather than perform extensive computations.
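
As a rough illustration of the idea (not the code in this PR), the sketch below drives a size-aware supplier from outside the stream pipeline. The class `SizedSupplierSketch` and its methods are hypothetical names; the only JDK call doing the real work is `Spliterator.getExactSizeIfKnown()`, which returns the exact element count for SIZED sources and -1 otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.function.BiConsumer;
import java.util.function.LongFunction;
import java.util.stream.Stream;

// Hypothetical driver, not the JDK's stream pipeline: it only shows where an
// exact size can come from and how a size-aware supplier would consume it.
final class SizedSupplierSketch {

    // Stand-in for the proposed sizedSupplier(): receives the exact size, or -1 if unknown.
    static <T> LongFunction<List<T>> listSupplier() {
        return size -> {
            if (size >= 0 && size <= Integer.MAX_VALUE) {
                // Size known: allocate the backing array once, no regrowth later.
                return new ArrayList<T>((int) size);
            }
            // Size unknown (or too large for an int): fall back to the default capacity.
            return new ArrayList<T>();
        };
    }

    // Sequential-only collection loop that passes the exact size to the supplier.
    static <T, A> A collectSequential(Stream<T> stream,
                                      LongFunction<A> sizedSupplier,
                                      BiConsumer<A, T> accumulator) {
        Spliterator<T> spliterator = stream.spliterator();
        long exactSize = spliterator.getExactSizeIfKnown();
        A container = sizedSupplier.apply(exactSize);
        spliterator.forEachRemaining(element -> accumulator.accept(container, element));
        return container;
    }
}
```

Operations such as `filter` clear the SIZED characteristic, so in this sketch the supplier would see -1 for a filtered stream and fall back to the unsized path.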

For convenience, Collector.ofSized factory methods are added that accept an additional IntFunction<A> that is called instead of the provided Supplier<A> if the stream size is known. Data structures that can have sizes that don't fit in an int can still use the new feature by providing their own implementation of Collector.
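
One way to picture how the two suppliers passed to Collector.ofSized could be combined into a single LongFunction<A> is sketched below. `OfSizedSketch.toSizedSupplier` is a hypothetical helper, and the fallback to the plain Supplier for sizes above Integer.MAX_VALUE is an assumption rather than something stated in this description.

```java
import java.util.function.IntFunction;
import java.util.function.LongFunction;
import java.util.function.Supplier;

// Hypothetical adapter; the name and the overflow handling are assumptions.
final class OfSizedSketch {

    static <A> LongFunction<A> toSizedSupplier(Supplier<A> supplier,
                                               IntFunction<A> sizedSupplier) {
        return size -> {
            // A non-negative size that fits in an int goes to the sized variant.
            if (size >= 0 && size <= Integer.MAX_VALUE) {
                return sizedSupplier.apply((int) size);
            }
            // -1 (unknown) and, by assumption, oversized values use the plain supplier.
            return supplier.get();
        };
    }
}
```

For example, `OfSizedSketch.toSizedSupplier(StringBuilder::new, StringBuilder::new)` would resolve the first argument to the no-argument constructor and the second to the capacity constructor.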

The default implementation of sizedSupplier returns a function that ignores its argument and always returns supplier().get(). Existing Collector implementations outside the standard library can adopt sizedSupplier while remaining backwards compatible with older JDKs by overriding the method without an @Override annotation.
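
The backwards-compatibility trick can be sketched as follows, assuming a hypothetical third-party `ListCollector`. On a JDK without sizedSupplier the method below is just an extra public method that nothing calls; on a JDK with the proposed API it would override the default. The signature is taken from the description above.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.LongFunction;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical collector that compiles with or without the proposed method on Collector.
final class ListCollector<T> implements Collector<T, List<T>, List<T>> {

    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;
    }

    // Deliberately no @Override: the annotation would fail to compile on JDKs
    // where Collector does not declare sizedSupplier().
    public LongFunction<List<T>> sizedSupplier() {
        return size -> {
            if (size >= 0 && size <= Integer.MAX_VALUE) {
                return new ArrayList<T>((int) size);
            }
            return supplier().get();
        };
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (left, right) -> {
            left.addAll(right);
            return left;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return EnumSet.of(Characteristics.IDENTITY_FINISH);
    }
}
```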

Existing collectors in Collectors are updated to use the new functionality if the final size of the collection can be determined from the stream size (e.g. toMap without a merge function does, but toSet doesn't). For this purpose, a new constructor accepting an initial capacity is added to StringJoiner.
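
The toMap/toSet distinction can be checked with the existing Collectors API alone. In this small sketch (`ResultSizeSketch` is a made-up name), toSet silently drops duplicates, so its result can be smaller than the stream, while toMap without a merge function either produces exactly one entry per element or throws.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustration only: uses existing JDK collectors, not the sized variants from this PR.
final class ResultSizeSketch {

    // Duplicates collapse, so the result size is at most the stream size.
    static int setSize(List<String> input) {
        return input.stream().collect(Collectors.toSet()).size();
    }

    // Without a merge function, a duplicate key raises IllegalStateException,
    // so a successful result has exactly as many entries as the stream has elements.
    static int mapSize(List<String> input) {
        return input.stream()
                .collect(Collectors.toMap(Function.identity(), String::length))
                .size();
    }
}
```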

Benchmark                      (N)  (Q)   Mode  Cnt         Score        Error  Units
SizedCollectors.join_unsized    10    1  thrpt   15   4233447.810 ±  98499.397  ops/s
SizedCollectors.join_sized      10    1  thrpt   15   5286434.473 ± 166673.025  ops/s
SizedCollectors.join_unsized    10   10  thrpt   15   3034816.772 ± 174571.668  ops/s
SizedCollectors.join_sized      10   10  thrpt   15   4024021.816 ± 157577.169  ops/s
SizedCollectors.join_unsized    10  100  thrpt   15    707715.604 ±   3184.553  ops/s
SizedCollectors.join_sized      10  100  thrpt   15    760371.153 ±   5157.703  ops/s
SizedCollectors.join_unsized   100    1  thrpt   15    673844.384 ±  47029.601  ops/s
SizedCollectors.join_sized     100    1  thrpt   15    818333.388 ±  10969.052  ops/s
SizedCollectors.join_unsized   100   10  thrpt   15    521942.603 ±   5093.141  ops/s
SizedCollectors.join_sized     100   10  thrpt   15    611704.375 ±    896.388  ops/s
SizedCollectors.join_unsized   100  100  thrpt   15     81918.953 ±     88.126  ops/s
SizedCollectors.join_sized     100  100  thrpt   15     83639.848 ±    838.855  ops/s
SizedCollectors.join_unsized  1000    1  thrpt   15     66562.080 ±   5966.190  ops/s
SizedCollectors.join_sized    1000    1  thrpt   15     80936.274 ±    663.293  ops/s
SizedCollectors.join_unsized  1000   10  thrpt   15     40629.108 ±   3716.319  ops/s
SizedCollectors.join_sized    1000   10  thrpt   15     58545.365 ±   4402.039  ops/s
SizedCollectors.join_unsized  1000  100  thrpt   15      8121.611 ±    103.512  ops/s
SizedCollectors.join_sized    1000  100  thrpt   15      8405.034 ±    124.720  ops/s
SizedCollectors.par_unsized     10    1  thrpt   15     84153.343 ±    220.003  ops/s
SizedCollectors.par_sized       10    1  thrpt   15     83960.770 ±    376.979  ops/s
SizedCollectors.par_unsized     10   10  thrpt   15     83706.680 ±    542.130  ops/s
SizedCollectors.par_sized       10   10  thrpt   15     83977.474 ±    235.026  ops/s
SizedCollectors.par_unsized     10  100  thrpt   15     72888.079 ±   2127.890  ops/s
SizedCollectors.par_sized       10  100  thrpt   15     73898.169 ±   1083.144  ops/s
SizedCollectors.par_unsized    100    1  thrpt   15     29857.614 ±    199.473  ops/s
SizedCollectors.par_sized      100    1  thrpt   15     33388.971 ±    426.888  ops/s
SizedCollectors.par_unsized    100   10  thrpt   15     28910.485 ±    639.516  ops/s
SizedCollectors.par_sized      100   10  thrpt   15     32446.279 ±    702.879  ops/s
SizedCollectors.par_unsized    100  100  thrpt   15     27249.092 ±    215.217  ops/s
SizedCollectors.par_sized      100  100  thrpt   15     30963.397 ±    563.199  ops/s
SizedCollectors.par_unsized   1000    1  thrpt   15     16152.241 ±    401.529  ops/s
SizedCollectors.par_sized     1000    1  thrpt   15     26330.794 ±    309.816  ops/s
SizedCollectors.par_unsized   1000   10  thrpt   15     16125.328 ±    341.179  ops/s
SizedCollectors.par_sized     1000   10  thrpt   15     26312.005 ±    478.718  ops/s
SizedCollectors.par_unsized   1000  100  thrpt   15     13751.145 ±     55.326  ops/s
SizedCollectors.par_sized     1000  100  thrpt   15     16742.332 ±   1265.596  ops/s
SizedCollectors.seq_unsized     10    1  thrpt   15  10823779.122 ±  16641.626  ops/s
SizedCollectors.seq_sized       10    1  thrpt   15  10692431.953 ±  19015.373  ops/s
SizedCollectors.seq_unsized     10   10  thrpt   15   7248428.651 ±  19737.626  ops/s
SizedCollectors.seq_sized       10   10  thrpt   15   7224901.888 ±  14209.393  ops/s
SizedCollectors.seq_unsized     10  100  thrpt   15    849586.014 ±   1973.570  ops/s
SizedCollectors.seq_sized       10  100  thrpt   15    856846.388 ±   4701.923  ops/s
SizedCollectors.seq_unsized    100    1  thrpt   15    471001.842 ±   2558.377  ops/s
SizedCollectors.seq_sized      100    1  thrpt   15   1187773.258 ±   2856.554  ops/s
SizedCollectors.seq_unsized    100   10  thrpt   15    401038.089 ±    778.214  ops/s
SizedCollectors.seq_sized      100   10  thrpt   15    810603.260 ±   3692.264  ops/s
SizedCollectors.seq_unsized    100  100  thrpt   15     78169.815 ±     64.158  ops/s
SizedCollectors.seq_sized      100  100  thrpt   15     84488.831 ±    111.441  ops/s
SizedCollectors.seq_unsized   1000    1  thrpt   15     49838.201 ±    165.634  ops/s
SizedCollectors.seq_sized     1000    1  thrpt   15    122736.258 ±   1741.051  ops/s
SizedCollectors.seq_unsized   1000   10  thrpt   15     42534.003 ±    179.221  ops/s
SizedCollectors.seq_sized     1000   10  thrpt   15     81052.905 ±    743.172  ops/s
SizedCollectors.seq_unsized   1000  100  thrpt   15      7885.440 ±     19.195  ops/s
SizedCollectors.seq_sized     1000  100  thrpt   15      8796.001 ±     60.360  ops/s

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8072840: Add a method to Collector that returns a sized supplying mutable result container (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23461/head:pull/23461
$ git checkout pull/23461

Update a local copy of the PR:
$ git checkout pull/23461
$ git pull https://git.openjdk.org/jdk.git pull/23461/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23461

View PR using the GUI difftool:
$ git pr show -t 23461

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23461.diff

bridgekeeper bot commented Feb 5, 2025

👋 Welcome back fmeum! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Feb 5, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk bot commented Feb 5, 2025

@fmeum The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@fmeum fmeum changed the title from "8072840: Presized collectors" to "8072840: Add a method to Collector that returns a sized supplying mutable result container" Feb 5, 2025
@fmeum fmeum force-pushed the 8072840-sized-supplier branch 9 times, most recently from 376ea9b to dc06e2d Compare February 8, 2025 15:31
@fmeum fmeum force-pushed the 8072840-sized-supplier branch from 01828ad to f9d9312 Compare February 10, 2025 23:13
copybara-service bot pushed a commit to google/guava that referenced this pull request Feb 12, 2025
Demo of Guava changes for openjdk/jdk#23461.

This change pre-sizes collectors for which the size of the output collection must match the size of the input stream. It omits cases like `ImmutableSet` (which deduplicates), but it includes cases like `ImmutableList` (obviously) and `ImmutableMap`/`ImmutableBiMap` (which reject duplicate keys).

RELNOTES=`collect`: Changed `toImmutableList`, `toImmutableMap`, and `toImmutableBiMap` to internally pre-size their collections when possible.
PiperOrigin-RevId: 725756865

bridgekeeper bot commented Apr 9, 2025

@fmeum This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper bot commented Jun 4, 2025

@fmeum This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Jun 4, 2025