@osa1 osa1 commented May 7, 2025

While investigating why decoding packed enums is so much slower in AOT than decoding packed int32 (both are varints on the wire), I noticed that the enum decoding benchmark should actually be even slower than it currently measures. At the moment TFA is able to specialize the mapping from the enum's int value to the Dart enum value into a direct call, in this function:

```
ProtobufEnum? _decodeEnum(
    int tagNumber, ExtensionRegistry? registry, int rawValue) {
  final f = valueOfFunc(tagNumber);
  if (f != null) {
    return f(rawValue);  // <------------------- HERE
  }
  ...
}
```

Wasm code for this function, before this PR:

```
(func $BuilderInfo._decodeEnum (;641;) (param $var0 (ref $BuilderInfo_214)) (param $var1 i64) (param $var2 i64) (result (ref null $Enum))
  (local $var3 (ref null $FieldInfo_223))
  local.get $var0
  struct.get $BuilderInfo_214 $field4
  i32.const 71
  local.get $var1
  struct.new $BoxedInt
  call $_DefaultMap&_HashFieldBase&MapMixin&_HashBase&_OperatorEqualsAndHashCode&_LinkedHashMapMixin.[]
  ref.cast null $FieldInfo_223
  local.tee $var3
  ref.is_null
  if (result (ref null $#Closure-0-1_815))
    ref.null none
  else
    local.get $var3
    struct.get $FieldInfo_223 $field9
  end
  ref.is_null
  i32.eqz
  if
    local.get $var2
    call $Enum.valueOf
    return
  end
  ref.null none
)
```

Note that this calls `$Enum.valueOf` directly, even though the function is meant to be generic over the enum type.

With this PR we add another enum to the proto file and decode it in the benchmark setup, so that TFA can no longer specialize `_decodeEnum` to one specific enum type.

Wasm code for this function, after this PR:

```
(func $BuilderInfo._decodeEnum (;643;) (param $var0 (ref $BuilderInfo_214)) (param $var1 i64) (param $var2 i64) (result (ref null $ProtobufEnum))
  (local $var3 (ref null $FieldInfo_229))
  (local $var4 (ref null $#Closure-0-1))
  (local $var5 (ref $#Closure-0-1))
  local.get $var0
  struct.get $BuilderInfo_214 $field4
  i32.const 71
  local.get $var1
  struct.new $BoxedInt
  call $_DefaultMap&_HashFieldBase&MapMixin&_HashBase&_OperatorEqualsAndHashCode&_LinkedHashMapMixin.[]
  ref.cast null $FieldInfo_229
  local.tee $var3
  ref.is_null
  if (result (ref null $#Closure-0-1))
    ref.null none
  else
    local.get $var3
    struct.get $FieldInfo_229 $field9
  end
  local.tee $var4
  ref.is_null
  i32.eqz
  if
    local.get $var4
    ref.as_non_null
    local.tee $var5
    struct.get $#Closure-0-1 $field2
    i32.const 71
    local.get $var2
    struct.new $BoxedInt
    local.get $var5
    struct.get $#Closure-0-1 $field3
    struct.get $#Vtable-0-1 $field1
    call_ref $type39
    ref.cast null $ProtobufEnum
    return
  end
  ref.null none
)
```
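The difference between the two listings can be illustrated in miniature with plain Dart. The following is only a sketch under stated assumptions: `Color`, `Shape`, `colorValueOf`, `shapeValueOf`, `valueOfFuncs` and `decodeEnum` are hypothetical names invented for illustration, not the benchmark's or the library's actual code. The idea is that once more than one lookup function can reach the call site, a whole-program analysis can no longer treat `f(rawValue)` as a call to one known target.

```dart
// Hypothetical stand-ins for two generated proto enums.
enum Color { red, green }

enum Shape { circle, square }

// Hypothetical per-enum lookup functions (int wire value -> enum value).
Color? colorValueOf(int raw) =>
    raw >= 0 && raw < Color.values.length ? Color.values[raw] : null;

Shape? shapeValueOf(int raw) =>
    raw >= 0 && raw < Shape.values.length ? Shape.values[raw] : null;

typedef ValueOfFunc = Enum? Function(int);

// Hypothetical registry: tag number -> lookup function.
final Map<int, ValueOfFunc> valueOfFuncs = {};

Enum? decodeEnum(int tagNumber, int rawValue) {
  final f = valueOfFuncs[tagNumber];
  if (f != null) {
    // With a single registered function, a whole-program compiler may turn
    // this into a direct call. With two, it has to stay an indirect call.
    return f(rawValue);
  }
  return null;
}

void main() {
  valueOfFuncs[1] = colorValueOf;
  valueOfFuncs[2] = shapeValueOf; // second enum type, as added in setup

  print(decodeEnum(1, 1)); // Color.green
  print(decodeEnum(2, 0)); // Shape.circle
  print(decodeEnum(3, 0)); // null (no function registered for tag 3)
}
```

Whether a given compiler actually specializes the single-function case depends on its analysis; the sketch only shows the shape of the call site, not a guaranteed optimization.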

Wasm benchmark results:

```
// Before
protobuf_PackedEnumDecoding(RunTimeRaw): 41120.0 us.

// After
protobuf_PackedEnumDecoding(RunTimeRaw): 52750.0 us.
```

VM benchmark results:

```
// Before
protobuf_PackedEnumDecoding(RunTimeRaw): 45051.520000000004 us.

// After
protobuf_PackedEnumDecoding(RunTimeRaw): 54661.125 us.
```
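To put the numbers above in relative terms, here is a small sketch that computes the slowdown in percent. The helper name `slowdownPercent` is made up for this example; the inputs are copied from the results above.

```dart
/// Relative slowdown of `after` compared to `before`, in percent.
double slowdownPercent(double before, double after) =>
    (after - before) / before * 100;

void main() {
  // Run times in microseconds, taken from the benchmark results above.
  final wasm = slowdownPercent(41120.0, 52750.0);
  final vm = slowdownPercent(45051.520000000004, 54661.125);

  print('Wasm: ${wasm.toStringAsFixed(1)}% slower'); // Wasm: 28.3% slower
  print('VM: ${vm.toStringAsFixed(1)}% slower'); // VM: 21.3% slower
}
```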

osa1 added 3 commits May 7, 2025 10:28
@osa1 osa1 requested a review from mkustermann May 7, 2025 09:40
@osa1 osa1 merged commit de6bcc2 into google:master May 7, 2025
17 checks passed
@osa1 osa1 deleted the fix_packed_enum_decoding_benchmark branch May 7, 2025 11:09
copybara-service bot pushed a commit to dart-lang/sdk that referenced this pull request May 12, 2025
Revisions updated by `dart tools/rev_sdk_deps.dart`.

dartdoc (https://github.com/dart-lang/dartdoc/compare/e4f9451..95f4208):
  95f4208e  2025-05-12  Sam Rawlins  Simplify Inheritable.computeCanonicalEnclosingContainer. (dart-lang/dartdoc#4047)

http (https://github.com/dart-lang/http/compare/78d6114..e70a41b):
  e70a41b  2025-05-09  Brian Quinlan  Clarify that some headers may not be sent/received (dart-lang/http#1768)
  d99cc3c  2025-05-07  Brian Quinlan  Return a customized `StreamedResponse` from `CronetClient.send` (dart-lang/http#1769)
  6b92d99  2025-05-07  JohnettJobben1  [web_socket_channel] Shorten library description for pub score improvement (dart-lang/http#1737)
  31da355  2025-05-05  Brian Quinlan  Prepare cronet_http/cupertino_http/http/web_socket for release (dart-lang/http#1767)
  dfbe73d  2025-05-05  Brian Quinlan  Ignore received data after the response stream has been closed (dart-lang/http#1766)

protobuf (https://github.com/dart-lang/protobuf/compare/1aaa332..7d2e615):
  7d2e615  2025-05-12  Agam Agarwal  Add fromDart() and toDart() methods to convert between $core.Duration and proto Duration (google/protobuf.dart#986)
  e4fca16  2025-05-12  Ömer Sinan Ağacan  Add sparse enum decoding benchmarks (google/protobuf.dart#984)
  006d3aa  2025-05-09  Ömer Sinan Ağacan  Update protoc_plugin pre-generated protos (google/protobuf.dart#982)
  4abee01  2025-05-09  Ömer Sinan Ağacan  Sort input proto files before processing (google/protobuf.dart#983)
  60e23f1  2025-05-07  Ömer Sinan Ağacan  Fix factory argument types for map fields (google/protobuf.dart#976)
  de6bcc2  2025-05-07  Ömer Sinan Ağacan  Fix packed enum decoding benchmark (google/protobuf.dart#979)
  9daf5ca  2025-05-06  Ömer Sinan Ağacan  Improve packed field decoding (google/protobuf.dart#959)

tools (https://github.com/dart-lang/tools/compare/92f10a9..36f5c9f):
  36f5c9f9  2025-05-05  Jacob MacDonald  broaden the publish tag regex to allow digits (dart-lang/tools#2085)
  c6a10613  2025-05-05  Goddchen  fix(clock): keep micros in monthsAgo, monthsFromNow and yearsAgo (dart-lang/tools#1202)
  f1f8ac18  2025-05-02  Liam Appelbe  [coverage] Fix another flaky lifecycle management error (dart-lang/tools#2082)

webdev (https://github.com/dart-lang/webdev/compare/5bf833d..1ea8462):
  1ea84624  2025-05-08  Nicholas Shahan  [gardening] Temporarily skip failing test case (dart-lang/webdev#2618)

Change-Id: I193c34b97e7acf1cf52c91240765344e47424b73
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/428100
Commit-Queue: Konstantin Shcheglov
Auto-Submit: Devon Carew
Reviewed-by: Konstantin Shcheglov