@osa1 osa1 commented Feb 10, 2025

  • Inline _readPacked by hand and mark _withLimit with an inlining pragma to
    eliminate closure allocations and calls in packed decoding loops.

  • Introduce PbList._addUnchecked, which adds to the list without checking the
    value for validity or the list for mutability.

  • When decoding a packed field, check the list mutability once, instead of for
    every element.

  • When decoding a packed scalar field, don't check for value validity.

    For scalar fields we only need to make sure the field value is not null,
    which the call sites already guarantee: e.g. input.readDouble doesn't
    return a nullable value.

  • Add prefer-inline pragmas to make sure the VM inlines one-liners.
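
A minimal sketch of the "check mutability once, then add unchecked" idea from the list above. `SketchList`, `checkModifiable`, `addUnchecked` and `readPackedSmallInts` are hypothetical names made up for illustration; the real `PbList` and reader code in this PR look different, and the decoding here only handles single-byte varints to keep the example short.

```dart
/// Hypothetical stand-in for PbList; not the real class from package:protobuf.
class SketchList<E> {
  final List<E> _wrapped = <E>[];
  bool _isReadOnly = false;

  int get length => _wrapped.length;
  E operator [](int index) => _wrapped[index];

  void freeze() => _isReadOnly = true;

  void checkModifiable(String methodName) {
    if (_isReadOnly) {
      throw UnsupportedError('$methodName on a read-only list');
    }
  }

  /// Checked add: one mutability check per call.
  void add(E value) {
    checkModifiable('add');
    _wrapped.add(value);
  }

  /// Unchecked add: the caller must already have called [checkModifiable].
  @pragma('vm:prefer-inline')
  void addUnchecked(E value) => _wrapped.add(value);
}

/// Appends a packed run of single-byte varints (values 0..127 only) to
/// [list], checking mutability once before the loop instead of per element.
void readPackedSmallInts(List<int> bytes, SketchList<int> list) {
  list.checkModifiable('add');
  for (var i = 0; i < bytes.length; i++) {
    list.addUnchecked(bytes[i] & 0x7f);
  }
}

void main() {
  final list = SketchList<int>();
  readPackedSmallInts([1, 2, 3], list);
  print(list.length);
}
```

The loop body is then a plain list append, with no closure and no per-element check.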

VM benchmarks before:

protobuf_PackedInt32Decoding(RunTimeRaw): 25598.8125 us.
protobuf_PackedInt64Decoding(RunTimeRaw): 67932.43333333333 us.
protobuf_PackedUint32Decoding(RunTimeRaw): 24668.844444444443 us.
protobuf_PackedUint64Decoding(RunTimeRaw): 64615.066666666666 us.
protobuf_PackedSint32Decoding(RunTimeRaw): 26037.275 us.
protobuf_PackedSint64Decoding(RunTimeRaw): 100819.65 us.
protobuf_PackedBoolDecoding(RunTimeRaw): 34733.4 us.
protobuf_PackedEnumDecoding(RunTimeRaw): 48379.659999999996 us.

VM benchmarks after:

protobuf_PackedInt32Decoding(RunTimeRaw): 19653.9 us.
protobuf_PackedInt64Decoding(RunTimeRaw): 48627.9 us.
protobuf_PackedUint32Decoding(RunTimeRaw): 19279.29090909091 us.
protobuf_PackedUint64Decoding(RunTimeRaw): 50681.8 us.
protobuf_PackedSint32Decoding(RunTimeRaw): 20271.854545454546 us.
protobuf_PackedSint64Decoding(RunTimeRaw): 83777.8 us.
protobuf_PackedBoolDecoding(RunTimeRaw): 24850.555555555555 us.
protobuf_PackedEnumDecoding(RunTimeRaw): 45205.659999999996 us.

Wasm benchmarks before (-O2):

protobuf_PackedInt32Decoding(RunTimeRaw): 64220.0 us.
protobuf_PackedInt64Decoding(RunTimeRaw): 81033.33333333334 us.
protobuf_PackedUint32Decoding(RunTimeRaw): 60800.0 us.
protobuf_PackedUint64Decoding(RunTimeRaw): 82700.0 us.
protobuf_PackedSint32Decoding(RunTimeRaw): 72433.33333333334 us.
protobuf_PackedSint64Decoding(RunTimeRaw): 142150.0 us.
protobuf_PackedBoolDecoding(RunTimeRaw): 27775.0 us.
protobuf_PackedEnumDecoding(RunTimeRaw): 43980.0 us.

Wasm benchmarks after:

protobuf_PackedInt32Decoding(RunTimeRaw): 56050.0 us.
protobuf_PackedInt64Decoding(RunTimeRaw): 74633.33333333334 us.
protobuf_PackedUint32Decoding(RunTimeRaw): 56525.0 us.
protobuf_PackedUint64Decoding(RunTimeRaw): 69400.0 us.
protobuf_PackedSint32Decoding(RunTimeRaw): 51925.0 us.
protobuf_PackedSint64Decoding(RunTimeRaw): 116250.0 us.
protobuf_PackedBoolDecoding(RunTimeRaw): 18427.272727272728 us.
protobuf_PackedEnumDecoding(RunTimeRaw): 41600.0 us.
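
To read the numbers above as relative changes, a small sketch like the following can help. The helper `improvementPercent` is a hypothetical name; the inputs are a few before/after pairs copied from the lists above, rounded to one decimal place.

```dart
/// Relative reduction in run time, in percent. Positive means faster.
double improvementPercent(double before, double after) =>
    (before - after) / before * 100;

void main() {
  // [before, after] in microseconds, taken from the benchmark output above.
  const samples = <String, List<double>>{
    'VM PackedInt32': [25598.8, 19653.9],
    'VM PackedBool': [34733.4, 24850.6],
    'Wasm PackedSint32': [72433.3, 51925.0],
    'Wasm PackedBool': [27775.0, 18427.3],
  };

  samples.forEach((name, pair) {
    final pct = improvementPercent(pair[0], pair[1]);
    print('$name: ${pct.toStringAsFixed(1)}% less time');
  });
}
```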

cl/755309114

@osa1 osa1 marked this pull request as draft February 11, 2025 10:26
@mkustermann
Collaborator

It seems in dart2wasm we have

// before
protobuf_PackedInt64Decoding(RunTimeRaw): 77833.33333333334 us.

// after
protobuf_PackedInt64Decoding(RunTimeRaw): 90650.0 us.

i.e. a regression, is this expected?

Collaborator

@mkustermann mkustermann left a comment

lgtm with comments

case PbFieldType._REPEATED_BYTES:
final list = fs._ensureRepeatedField(meta, fi);
list.add(input.readBytes());
list._checkModifiable('add');
Collaborator

This looks like hand-inlined code that omits the _check call. It's not very nice that we have to copy and paste it so many times.

Why not pass in a bool saying whether to check, and force-inline the function? E.g.

_addInternal(input.readBytes(), omitCheck: true);

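// One prefer-inline pragma per compiler (VM, dart2js, dart2wasm).
@pragma('vm:prefer-inline')
@pragma('dart2js:prefer-inline')
@pragma('wasm:prefer-inline')
void _addInternal(E value, {bool omitCheck = false}) {
  _checkModifiable('add'); // mutability check, same call as in the diff above
  if (!omitCheck) _check(value); // element validity check
  _wrappedList.add(value);
}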

Member Author

This function wouldn't work for the packed cases below, where we check that the list is modifiable once before adding the elements and then never check again while adding them.

We could have this version:

void _addInternal(E value,
    {bool omitElementCheck = false, bool omitModifiableCheck = false}) {
  // Each check can be switched off independently by the call site.
  if (!omitModifiableCheck) _checkModifiable('add');
  if (!omitElementCheck) _check(value);
  _wrappedList.add(value);
}

But we would still need a separate _checkModifiable call for packed fields, as we want to run it only once.

Overall, I'm not sure this version is better than the current one.
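
For comparison, here is a rough sketch of what the call sites could look like with the two-flag version. `FlagList`, `checkModifiable` and `addInternal` are public, hypothetical stand-ins for the private members discussed above, and the element check is an arbitrary example, not the one PbList uses.

```dart
/// Hypothetical list wrapper used only for this sketch.
class FlagList<E> {
  FlagList(this._check);

  final void Function(E value) _check; // element validity check
  final List<E> _wrapped = <E>[];
  bool isReadOnly = false;

  int get length => _wrapped.length;

  void checkModifiable(String methodName) {
    if (isReadOnly) throw UnsupportedError('$methodName on a read-only list');
  }

  @pragma('vm:prefer-inline')
  void addInternal(E value,
      {bool omitElementCheck = false, bool omitModifiableCheck = false}) {
    if (!omitModifiableCheck) checkModifiable('add');
    if (!omitElementCheck) _check(value);
    _wrapped.add(value);
  }
}

void main() {
  final list = FlagList<int>((v) {
    if (v < 0) throw ArgumentError.value(v, 'value', 'must be non-negative');
  });

  // Non-packed call site: both checks run on every add.
  list.addInternal(1);

  // Packed call site: one explicit mutability check, then flagged adds.
  list.checkModifiable('add');
  for (final v in [2, 3, 4]) {
    list.addInternal(v, omitElementCheck: true, omitModifiableCheck: true);
  }
  print(list.length);
}
```

The packed call site still needs its own checkModifiable call before the loop, which is the concern raised above.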

@osa1
Member Author

osa1 commented May 6, 2025

> It seems in dart2wasm we have
>
> // before
> protobuf_PackedInt64Decoding(RunTimeRaw): 77833.33333333334 us.
>
> // after
> protobuf_PackedInt64Decoding(RunTimeRaw): 90650.0 us.
>
> i.e. a regression, is this expected?

Maybe my CPU was busy with other things while I was running it. When I run the benchmarks again, I consistently get better numbers with this PR. I updated the PR description with more recent benchmark numbers from a run just now.

@osa1
Member Author

osa1 commented May 6, 2025

I tested this internally in cl/755309114. Merging ...

@osa1 osa1 merged commit 9daf5ca into google:master May 6, 2025
17 checks passed
@osa1 osa1 deleted the vm_inlines branch May 6, 2025 16:07
copybara-service bot pushed a commit to dart-lang/sdk that referenced this pull request May 12, 2025
Revisions updated by `dart tools/rev_sdk_deps.dart`.

dartdoc (https://github.com/dart-lang/dartdoc/compare/e4f9451..95f4208):
  95f4208e  2025-05-12  Sam Rawlins  Simplify Inheritable.computeCanonicalEnclosingContainer. (dart-lang/dartdoc#4047)

http (https://github.com/dart-lang/http/compare/78d6114..e70a41b):
  e70a41b  2025-05-09  Brian Quinlan  Clarify that some headers may not be sent/received (dart-lang/http#1768)
  d99cc3c  2025-05-07  Brian Quinlan  Return a customized `StreamedResponse` from `CronetClient.send` (dart-lang/http#1769)
  6b92d99  2025-05-07  JohnettJobben1  [web_socket_channel] Shorten library description for pub score improvement (dart-lang/http#1737)
  31da355  2025-05-05  Brian Quinlan  Prepare cronet_http/cupertino_http/http/web_socket for release (dart-lang/http#1767)
  dfbe73d  2025-05-05  Brian Quinlan  Ignore received data after the response stream has been closed (dart-lang/http#1766)

protobuf (https://github.com/dart-lang/protobuf/compare/1aaa332..7d2e615):
  7d2e615  2025-05-12  Agam Agarwal  Add fromDart() and toDart() methods to convert between $core.Duration and proto Duration (google/protobuf.dart#986)
  e4fca16  2025-05-12  Ömer Sinan Ağacan  Add sparse enum decoding benchmarks (google/protobuf.dart#984)
  006d3aa  2025-05-09  Ömer Sinan Ağacan  Update protoc_plugin pre-generated protos (google/protobuf.dart#982)
  4abee01  2025-05-09  Ömer Sinan Ağacan  Sort input proto files before processing (google/protobuf.dart#983)
  60e23f1  2025-05-07  Ömer Sinan Ağacan  Fix factory argument types for map fields (google/protobuf.dart#976)
  de6bcc2  2025-05-07  Ömer Sinan Ağacan  Fix packed enum decoding benchmark (google/protobuf.dart#979)
  9daf5ca  2025-05-06  Ömer Sinan Ağacan  Improve packed field decoding (google/protobuf.dart#959)

tools (https://github.com/dart-lang/tools/compare/92f10a9..36f5c9f):
  36f5c9f9  2025-05-05  Jacob MacDonald  broaden the publish tag regex to allow digits (dart-lang/tools#2085)
  c6a10613  2025-05-05  Goddchen  fix(clock): keep micros in monthsAgo, monthsFromNow and yearsAgo (dart-lang/tools#1202)
  f1f8ac18  2025-05-02  Liam Appelbe  [coverage] Fix another flaky lifecycle management error (dart-lang/tools#2082)

webdev (https://github.com/dart-lang/webdev/compare/5bf833d..1ea8462):
  1ea84624  2025-05-08  Nicholas Shahan  [gardening] Temporarily skip failing test case (dart-lang/webdev#2618)

Change-Id: I193c34b97e7acf1cf52c91240765344e47424b73
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/428100
Commit-Queue: Konstantin Shcheglov <[email protected]>
Auto-Submit: Devon Carew <[email protected]>
Reviewed-by: Konstantin Shcheglov <[email protected]>