
Conversation

@osa1
Member

@osa1 osa1 commented May 7, 2025

Currently, when generating the ProtobufEnum value for an enum value on the
wire, we call the valueOf function in the enum field's FieldInfo, which
directly indexes the enum class's _byValue map.

With this PR we avoid the closure call by making _byValue public and indexing
it directly.
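The shape of the change can be sketched as follows. This is a minimal illustration, not the actual package:protobuf source; the `initByValue` helper here only mirrors the role of the real one.

```dart
// Sketch: the decoder used to go through a `valueOf` closure stored in
// FieldInfo; after this PR it indexes the number -> value map directly.

class ProtobufEnum {
  final int value;
  final String name;
  const ProtobufEnum(this.value, this.name);

  /// Illustrative stand-in: builds the number -> enum map at runtime
  /// from the list of known values.
  static Map<int, T> initByValue<T extends ProtobufEnum>(List<T> values) =>
      {for (final v in values) v.value: v};
}

void main() {
  const values = [ProtobufEnum(0, 'UNKNOWN'), ProtobufEnum(1, 'STARTED')];
  final byValue = ProtobufEnum.initByValue(values);

  // Before: the decoder called a closure stored in the field info.
  final ProtobufEnum? Function(int) valueOf = (n) => byValue[n];
  print(valueOf(1)?.name); // closure call

  // After: index the (now public) map directly, no closure call.
  print(byValue[1]?.name);
}
```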

Wasm benchmarks:

  • Before: protobuf_PackedEnumDecoding(RunTimeRaw): 48200.0 us
  • After: protobuf_PackedEnumDecoding(RunTimeRaw): 42120.0 us
  • Diff: -12.6%

AOT benchmarks:

  • Before: protobuf_PackedEnumDecoding(RunTimeRaw): 49180.0 us
  • After: protobuf_PackedEnumDecoding(RunTimeRaw): 45726.82 us
  • Diff: -7%

See also alternative PR: #981.

@osa1 osa1 marked this pull request as draft May 8, 2025 08:10
osa1 added a commit to osa1/protobuf.dart that referenced this pull request May 8, 2025
This is an alternative to google#980 that doesn't make a big difference in
terms of performance of AOT compiled benchmarks, but makes a big
difference when compiling to Wasm, compared to google#980.

When decoding an enum value, we call a callback in the enum field's
`FieldInfo`. The callback then indexes a map mapping enum numbers to
Dart values.

When these conditions hold:

- The known enum numbers are all positive. (so that we can use a list
  and index it with the number)

- The known enum numbers amount to more than 70% of the largest known enum
  number. (so that the list won't have a lot of `null` entries, wasting
  heap space)

We now generate a list instead of a map to map enum numbers to Dart
values.

Note: similar to the map, the list is allocated at runtime. No new code is
generated per message or enum type.
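The representation choice above can be sketched like this. The 70% density threshold is the one from this commit message; `buildLookup` and the use of plain ints as stand-ins for generated enum values are illustrative, not the real API.

```dart
/// Returns either a List<int?> indexed directly by enum number (dense
/// case) or a Map<int, int> (sparse case, or when a number can't serve
/// as a list index).
Object buildLookup(List<int> knownNumbers) {
  final usableAsIndex = knownNumbers.every((n) => n >= 0);
  final maxNumber = knownNumbers.reduce((a, b) => a > b ? a : b);
  final denseEnough = knownNumbers.length > 0.7 * maxNumber;
  if (usableAsIndex && denseEnough) {
    final list = List<int?>.filled(maxNumber + 1, null);
    for (final n in knownNumbers) {
      list[n] = n; // a real implementation would store the ProtobufEnum here
    }
    return list;
  }
  return {for (final n in knownNumbers) n: n};
}

void main() {
  print(buildLookup([0, 1, 2, 3]) is List); // dense: list
  print(buildLookup([0, 1, 1000]) is Map); // sparse: map
}
```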

Wasm benchmarks:

- Before: `protobuf_PackedEnumDecoding(RunTimeRaw): 48200.0 us`
- PR google#980: `protobuf_PackedEnumDecoding(RunTimeRaw): 42120.0 us`
    - Diff: -12.6%
- After: `protobuf_PackedEnumDecoding(RunTimeRaw): 35733.3 us`
    - Diff against PR google#980: -15%
    - Diff against master: -25%

AOT benchmarks:

- Before: `protobuf_PackedEnumDecoding(RunTimeRaw): 49180.0 us`
- PR google#980: `protobuf_PackedEnumDecoding(RunTimeRaw): 45726.82 us`
    - Diff: -7%
- This PR: `protobuf_PackedEnumDecoding(RunTimeRaw): 42929.7 us`
    - Diff against PR google#980: -6%
    - Diff against master: -12%
@osa1 osa1 requested a review from mkustermann May 8, 2025 09:57
@osa1 osa1 marked this pull request as ready for review May 8, 2025 10:08
// Enum.
void e<T>(int tagNumber, String name, int fieldType,
    {dynamic defaultOrMaker,
    ValueOfFunc? valueOf,
Collaborator

Aren't these changes removing those parameters from public APIs? They seem to be breaking changes, so old generated code will no longer work with the newest package:protobuf. If package:protobuf has to move to a new major version, that causes trouble for the ecosystem:

If an app depends transitively on A and B, and A uses the old protobuf version while B uses the new one, then we get a constraint conflict.

Member Author

@osa1 osa1 May 8, 2025

Yes, it's a breaking change. You need to update protoc_plugin and protobuf together and re-generate your proto classes.

We've done it in the past, for example with the most recent protobuf and protoc_plugin releases.

If an app depends transitively on A and B, and A uses the old protobuf version while B uses the new one, then we get a constraint conflict.

Yes, but then you can't do any major version bumps or breaking changes at all. We deal with it the same way we've dealt with it in the past.

Collaborator

So from your experience it doesn't cause too much churn on the ecosystem? Then it's OK, but you'll have to bump the major version number.

Member Author

So from your experience it doesn't cause too much churn on the ecosystem?

I don't have much experience updating downstream code after major version bumps, but I expect this particular change not to be a big problem.

Maybe someone who works on updating dependencies can comment how much churn this kind of thing causes.

Internally, no one will notice this change, other than maybe their benchmarks getting faster.

Contributor

Releasing a major version (w/ planning, batching breaking changes, planning deprecations, ...) should be just a normal part of the process. The diamond constraint issue becomes a problem when one of the packages in the cycle has become unmaintained (rev'ing their deps doesn't happen or happens w/ lots of latency). You're more likely to run into that when more things depend on the package being released - when it's further down the overall ecosystem stack.

For something like protobuf I'd just do regular planning for a new major version but would still rev when necessary. Even with breaking changes, you want to minimize the work for customers - they will eventually need to rev from 3 => 4, or 4 => 5.

There are a few packages that are very core to the ecosystem, and you have to be cautious about rev'ing, or choose to not rev at all (like, the direct deps of flutter - https://github.com/flutter/flutter/blob/master/packages/flutter/pubspec.yaml#L8).


{dynamic defaultOrMaker,
this.subBuilder,
this.valueOf,
this.enumValueMap,
Collaborator

Instead of getting this new parameter, why can't we take enumValues we already get and calculate this based on that? And if we can do that, we can decide whether we calculate a list or a map depending on density of the enum values, etc.

Member Author

Instead of getting this new parameter, why can't we take enumValues we already get and calculate this based on that? And if we can do that, we can decide whether we calculate a list or a map depending on density of the enum values, etc.

We have one function to decode enum int values:

ProtobufEnum? _decodeEnum(
    int tagNumber, ExtensionRegistry? registry, int rawValue) {
  final f = valueOfFunc(tagNumber);
  if (f != null) {
    return f(rawValue);
  }
  return registry
      ?.getExtension(qualifiedMessageName, tagNumber)
      ?.valueOf
      ?.call(rawValue);
}

For it to index a list sometimes and a map other times, we would need a type test or a closure, as in the master branch and the alternative PR.

Re: generating the list or map from the enumValues argument: the map and list should be per-type but allocated at runtime, so they need to live in a static final field in the generated enum classes. Example:

static final $core.Map<$core.int, CodeGeneratorResponse_Feature> _byValue =
    $pb.ProtobufEnum.initByValue(values);

And then this static field needs to be passed to FieldInfo so that BuilderInfo can find it from the FieldInfo.

Does this make sense?

Collaborator

Makes sense.

But instead of passing the representation (e.g. Map<int, XX> or List<XX> here) or a valueOf closure, one could pass an instance of a ProtobufEnumDescriptor that has a .decode() method.

This could be a direct call. The descriptor could hold a List<T?> of values and a bool (whether it's indexed by value or binary searched). It would then look up and return the value. So there would be no indirect calls (neither closure calls nor dispatch table calls). We would also pass the integer argument unboxed (in a closure call we pass it boxed).

Something like

// package:protobuf
class ProtobufEnum {}

class ProtobufEnumDescriptor {
  final bool _binarySearch;
  final List<ProtobufEnum?> _byValue;
  ProtobufEnumDescriptor(this._byValue, {required bool useBinarySearch})
      : _binarySearch = useBinarySearch;

  ProtobufEnum? lookup(int value) { ... }
}

// foo.pbenum.dart
class FooEnum extends ProtobufEnum {
  ...
  static final info_ =
      ProtobufEnumDescriptor(values, useBinarySearch: <if sparse>);
}

// foo.pb.dart
class Foo extends GeneratedMessage {
  final info_ = BuilderInfo()
    ..enumField(FooEnum.info_);
}

Then the decoder would see

ProtobufEnum? _decodeEnum(
    int tagNumber, ExtensionRegistry? registry, int rawValue) {
  final f = enumDescriptorOf(tagNumber);
  if (f != null) {
    // Direct call, passing `rawValue` unboxed; does array lookup or binary search.
    return f.lookup(rawValue);
  }
  ...;
}

Would that work?
If so, wouldn't that be the most efficient way to do it?
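One possible body for the `lookup` method in the sketch above. This is my illustration, not code from this PR: the `PbEnum` class, the free-standing `lookup` function, and the assumption that the sparse list is sorted by enum number are all hypothetical.

```dart
// Stand-in for a generated ProtobufEnum subclass.
class PbEnum {
  final int value;
  const PbEnum(this.value);
}

PbEnum? lookup(List<PbEnum?> byValue, int wireValue,
    {required bool binarySearch}) {
  if (!binarySearch) {
    // Dense representation: the list is indexed directly by enum number.
    if (wireValue < 0 || wireValue >= byValue.length) return null;
    return byValue[wireValue];
  }
  // Sparse representation: `byValue` holds the known values sorted by
  // enum number; binary search for `wireValue`.
  var lo = 0, hi = byValue.length - 1;
  while (lo <= hi) {
    final mid = (lo + hi) ~/ 2;
    final v = byValue[mid]!.value;
    if (v == wireValue) return byValue[mid];
    if (v < wireValue) {
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return null;
}

void main() {
  final sparse = [PbEnum(1), PbEnum(5), PbEnum(1000)];
  print(lookup(sparse, 5, binarySearch: true)?.value); // 5
  print(lookup(sparse, 7, binarySearch: true)); // null
}
```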

Member Author

I think this would work and I agree that it should improve things.

Compared to the current PR, it's similar to passing a list (instead of a map) plus a bool for whether the list should be binary searched or indexed directly.

A BuilderInfo.enumField alone wouldn't work: the enum infos need to be attached to field infos for repeated and map fields as well, so we would also need to pass the info to BuilderInfo.m (which adds map field info) and BuilderInfo.addRepeated.

A ProtobufEnumDescriptor class would also mean a level of indirection when accessing the list and the boolean flag for whether to binary search. So perhaps passing the bool flag as an extra argument to the current methods would perform best.

I'll first update the other PR with binary search because that's easy to do. Then update benchmarks to have some sparse enums/enums with large encodings. We can merge the benchmarks separately. Then revisit this idea.

@osa1
Member Author

osa1 commented May 12, 2025

Closing this one as alternatives #981 and #985 are faster.

@osa1 osa1 closed this May 12, 2025
@osa1 osa1 deleted the improve_enum_decoding branch May 12, 2025 10:55
osa1 added a commit that referenced this pull request May 15, 2025
When decoding an enum value, we call a callback in the enum field's `FieldInfo`. The callback then indexes a map mapping enum numbers to Dart values.

When these conditions hold:

- The known enum numbers are all positive. (so that we can use a list and index it with the number)

- The known enum numbers amount to more than 70% of the largest known enum number. (so that the list won't have a lot of `null` entries, wasting heap space)

We now generate a list instead of a map to map enum numbers to Dart values.

Similar to the map, the list is allocated at runtime. No new code is generated per message or enum type.

AOT benchmarks:

- Before: `protobuf_PackedEnumDecoding(RunTimeRaw): 47585.14 us.`
- After: `protobuf_PackedEnumDecoding(RunTimeRaw): 38974.566666666666 us.`
- Diff: -18%

Wasm benchmarks:

- Before: `protobuf_PackedEnumDecoding(RunTimeRaw): 52225.0 us.`
- After: `protobuf_PackedEnumDecoding(RunTimeRaw): 34283.33333333333 us.`
- Diff: -34%

**Alternatives considered:**

- #980 uses a map always, but eliminates the `valueOf` closure.
- #985 uses a list always, and does binary search in the list when the list is "shallow".
- #987 is the same as #985, but instead of calling the `valueOf` closure it stores an extra field in `FieldInfo`s for whether to binary search or directly index.

These are all slower than the current PR.