Create benchmarks for DisplayListBuilder #34910

ColdPaleLight · 2022-07-26T12:47:40Z

benchmarks for display list builder/display list.
fixes flutter/flutter#108265

In the process of writing the benchmark, I found that generating the rtree itself is not expensive. However, the call matrix().mapRect(&bounds) is very expensive.

Also I think we should use the second set of data (main + PR #34835) as the base data, because its function is correct.

cc @flar

1. main

Run on (10 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x10)
  L1 Instruction 128 KiB (x10)
  L2 Unified 4096 KiB (x5)
Load Average: 3.48, 3.72, 4.67
***WARNING*** Library was built as DEBUG. Timings may be affected.
----------------------------------------------------------------------------------------------------------
Benchmark                                                                Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------
BM_DisplayListBuiderDefault/kDefault                                 0.023 ms        0.023 ms        29784
BM_DisplayListBuiderDefault/kBounds                                  0.102 ms        0.101 ms         6934
BM_DisplayListBuiderDefault/kRtree                                   0.183 ms        0.182 ms         3822
BM_DisplayListBuiderDefault/kBoundsAndRtree                          0.253 ms        0.252 ms         2685
BM_DisplayListBuiderWithScaleAndTranslate/kDefault                   0.024 ms        0.024 ms        29750
BM_DisplayListBuiderWithScaleAndTranslate/kBounds                    0.109 ms        0.109 ms         6391
BM_DisplayListBuiderWithScaleAndTranslate/kRtree                     0.188 ms        0.188 ms         3727
BM_DisplayListBuiderWithScaleAndTranslate/kBoundsAndRtree            0.271 ms        0.271 ms         2568
BM_DisplayListBuiderWithPerspective/kDefault                         0.025 ms        0.025 ms        28018
BM_DisplayListBuiderWithPerspective/kBounds                          0.737 ms        0.736 ms          909
BM_DisplayListBuiderWithPerspective/kRtree                           0.814 ms        0.814 ms          856
BM_DisplayListBuiderWithPerspective/kBoundsAndRtree                   1.52 ms         1.52 ms          457
BM_DisplayListBuiderWithClipRect/kDefault                            0.023 ms        0.023 ms        30626
BM_DisplayListBuiderWithClipRect/kBounds                             0.099 ms        0.099 ms         7006
BM_DisplayListBuiderWithClipRect/kRtree                              0.178 ms        0.178 ms         3783
BM_DisplayListBuiderWithClipRect/kBoundsAndRtree                     0.253 ms        0.253 ms         2777
BM_DisplayListBuiderWithSaveLayer/kDefault                           0.049 ms        0.049 ms        13875
BM_DisplayListBuiderWithSaveLayer/kBounds                            0.201 ms        0.201 ms         3491
BM_DisplayListBuiderWithSaveLayer/kRtree                             0.277 ms        0.276 ms         2561
BM_DisplayListBuiderWithSaveLayer/kBoundsAndRtree                    0.422 ms        0.422 ms         1656
BM_DisplayListBuiderWithSaveLayerAndImageFilter/kDefault             0.073 ms        0.072 ms         9593
BM_DisplayListBuiderWithSaveLayerAndImageFilter/kBounds              0.359 ms        0.358 ms         1950
BM_DisplayListBuiderWithSaveLayerAndImageFilter/kRtree               0.439 ms        0.438 ms         1617
BM_DisplayListBuiderWithSaveLayerAndImageFilter/kBoundsAndRtree      0.718 ms        0.716 ms          968

2. main + PR #34365

Run on (10 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x10)
  L1 Instruction 128 KiB (x10)
  L2 Unified 4096 KiB (x5)
Load Average: 4.76, 6.12, 7.81
***WARNING*** Library was built as DEBUG. Timings may be affected.
----------------------------------------------------------------------------------------------------------
Benchmark                                                                Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------
BM_DisplayListBuiderDefault/kDefault                                 0.114 ms        0.114 ms         6239
BM_DisplayListBuiderDefault/kBounds                                  0.115 ms        0.114 ms         6253
BM_DisplayListBuiderDefault/kRtree                                   0.188 ms        0.187 ms         3786
BM_DisplayListBuiderDefault/kBoundsAndRtree                          0.187 ms        0.187 ms         3721
BM_DisplayListBuiderWithScaleAndTranslate/kDefault                   0.129 ms        0.129 ms         5418
BM_DisplayListBuiderWithScaleAndTranslate/kBounds                    0.128 ms        0.128 ms         5332
BM_DisplayListBuiderWithScaleAndTranslate/kRtree                     0.200 ms        0.200 ms         3427
BM_DisplayListBuiderWithScaleAndTranslate/kBoundsAndRtree            0.199 ms        0.199 ms         3457
BM_DisplayListBuiderWithPerspective/kDefault                         0.781 ms        0.780 ms          910
BM_DisplayListBuiderWithPerspective/kBounds                          0.782 ms        0.780 ms          909
BM_DisplayListBuiderWithPerspective/kRtree                           0.847 ms        0.846 ms          828
BM_DisplayListBuiderWithPerspective/kBoundsAndRtree                  0.844 ms        0.843 ms          803
BM_DisplayListBuiderWithClipRect/kDefault                            0.115 ms        0.115 ms         5987
BM_DisplayListBuiderWithClipRect/kBounds                             0.115 ms        0.115 ms         6128
BM_DisplayListBuiderWithClipRect/kRtree                              0.188 ms        0.187 ms         3788
BM_DisplayListBuiderWithClipRect/kBoundsAndRtree                     0.188 ms        0.187 ms         3772
BM_DisplayListBuiderWithSaveLayer/kDefault                           0.191 ms        0.191 ms         3735
BM_DisplayListBuiderWithSaveLayer/kBounds                            0.191 ms        0.190 ms         3714
BM_DisplayListBuiderWithSaveLayer/kRtree                             0.263 ms        0.263 ms         2662
BM_DisplayListBuiderWithSaveLayer/kBoundsAndRtree                    0.263 ms        0.262 ms         2687
BM_DisplayListBuiderWithSaveLayerAndImageFilter/kDefault             0.467 ms        0.467 ms         1529
BM_DisplayListBuiderWithSaveLayerAndImageFilter/kBounds              0.465 ms        0.464 ms         1512
BM_DisplayListBuiderWithSaveLayerAndImageFilter/kRtree               0.542 ms        0.541 ms         1303
BM_DisplayListBuiderWithSaveLayerAndImageFilter/kBoundsAndRtree      0.538 ms        0.537 ms         1307

Pre-launch Checklist

I read the [Contributor Guide] and followed the process outlined there for submitting PRs.
I read the [Tree Hygiene] wiki page, which explains my responsibilities.
I read and followed the [Flutter Style Guide] and the [C++, Objective-C, Java style guides].
I listed at least one issue that this PR fixes in the description above.
I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See [testing the engine] for instructions on
writing and running engine tests.
I updated/added relevant documentation (doc comments with ///).
I signed the [CLA].
All existing and new tests are passing.

flar · 2022-07-26T20:22:28Z

I think we need to tailor how we apply all of the bits and pieces. Running all variants of drawRect/drawAtlas/etc. is probably fine, but on each build out we run all of the transform, all of the clip, and all of the saveLayer calls together. Plus, we have pretty much all of the attributes set every which way. I think a good reason that the mapRect call was having such an impact is that we probably left the DL with a nasty perspective matrix for all the rest of the ops to be built on, but perspective is such a corner case in practice.

One technique might be to break out the big array into smaller groups and then we can mix and match then in more controlled ways:

One vector for all of the rendering ops
One vector for transform variations
One vector for clip variations
Perhaps different versions of the transform/clip/attribute/save vectors for benchmarking just a few options vs the full testing done for the correctness testing.
"allGroups" (maybe with a better name) would then be a concatenation of the "every variation" vectors in each category.
benchmarking would mainly work with "allRenderingOps" vector and then have a few variations of the other state to combine that vector with

We then could have (the following is a full list, but we might want to start with a subset, see below):

time for unmodified render ops construction
time for render ops under (?translate?), scale, (?rotate?), perspective matrices
time for render ops under clipRect/(?RR/Path?)
with and without saveLayer/restore
perhaps with non-Layer-save/restore sprinkled about ridiculously (we may want to optimize those out eventually)
time for render ops with a couple of stroke variations (PathEffect very rarely used, not worth benchmarking?)
time for render ops with an ImageFilter (most typically a Blur, others are rarely used)
possibly combine some of those latter categories with a simple scale, but not a whole suite of transform types
(any other attributes that will complicate construction?)

For starters I'd like to see a simple small number of benchmarks with the most common variations:

identity (most ui.Picture objects don't have a DPI scale built in)
scale/translate (most LayerTrees have a DPI scale at the root level and then a translate here and there for layout)
(maybe perspective for shock value)
clipRect (I think clipRRect is also used in a few cases, but probably doesn't add much time)
(bunch of ops) then saveLayer then (bunch of ops) then restore
saveLayer suite as above, but with blur image filter on saveLayer

I'm not sure what to do about attributes for now as I do want to know how long they take, but I don't want to have lots of random settings through most of the benchmarks. The reason they should be benchmarked is that they can do deep copies of complex attributes and it would be good to know how much that costs us in the long run, but I don't think that is urgent at this time, something we can add over time.

ColdPaleLight · 2022-07-27T09:47:26Z

@flar
I tweak the code as you suggested and updated the benchmarks data on the top.

flar · 2022-07-27T21:52:36Z

(Comment updated with new numbers on 7/29/22)

While the results graphed and discussed here represent considerations for #34365, they actually point out the value of the benchmarks being added in this PR.

It looks like the new code can construct DL+bounds (the most common case by far) between 10-20% slower than the previous code, possibly because of having to maintain all of the extra rectangles. DL + RTree is similarly a little slower but the gap is 5-10%.

Also, the timing with the ImageFilter saveLayer is much worse - 20-30%.

Where the new code shines, though, is in computing both bounds and RTree on a new DL which is quite a bit faster across the board - a 25-45% savings.

Graph of 2 pass vs 1 pass, construct + bounds

Graph of 2 pass vs 1 pass, construct + rtree

Graph of 2 pass vs 1 pass, construct + bounds + rtree

flar · 2022-07-27T22:04:37Z

The data originally graphed in this comment has been updated and the new graphs are now included in the previous comment.

Another thing to consider is that this didn't help with the case of constructing a DL and then computing an RTree without also computing Bounds. You didn't collect benchmark data for that case, but I inferred it using "result2 - result1 + result0" and came up with the following graph.

Graph of obsolete data, do not look ;)

The only target we currently have for computing an RTree will be when DL gets used to collect pieces of scenes that are in front of, behind, and between external embedded platform views. In those cases, I don't think we would compute the bounds, just the RTree.

~~But, future work might benefit from the fact that the trifecta of "construct + bounds + rtree" is faster now if we add fast-rtree-culling to the dispatch of a DisplayList.~~

flar · 2022-07-27T22:08:23Z

Those last 2 comments are really about the benefit of the work done in #34365

This PR, which simply adds the benchmark will be important regardless of that other work, so I'll review it with intent to land it as work progresses on the subsequent developments of DL/bounds/RTree...

display_list/display_list_builder_benchmarks.cc

display_list/display_list_test_utils.cc

ColdPaleLight · 2022-07-29T08:54:15Z

I updated the benchmarks data on the top again.

flar

Everything LGTM, autosubmit-ing now to start generating data.

flar · 2022-08-02T21:20:41Z

I was having trouble finding the results in Skia perf until I realized that the test name was misspelled:

https://flutter-engine-perf.skia.org/e/?queries=executable%3D._display_list_builder_benchmarks

Fix: #35104

Create benchmarks for DisplayListBuilder

8831949

ColdPaleLight requested a review from flar July 26, 2022 12:48

ColdPaleLight added 2 commits July 26, 2022 21:34

Update run_test.py

c25ee8b

format

fde075e

ColdPaleLight and others added 3 commits July 27, 2022 16:47

Tweak benchmarks

a1b7e6e

clean code

efb5302

Merge branch 'flutter:main' into display_list_benchmarks

1948610

flar suggested changes Jul 28, 2022

View reviewed changes

Tweak code

08c0aa0

ColdPaleLight requested a review from flar July 29, 2022 07:45

flar approved these changes Jul 29, 2022

View reviewed changes

flar added the autosubmit Merge PR when tree becomes green via auto submit App label Jul 29, 2022

auto-submit bot merged commit 56c6c18 into flutter:main Jul 29, 2022

flar mentioned this pull request Jul 29, 2022

Create benchmark for DisplayList operations flutter/flutter#108265

Closed

engine-flutter-autoroll mentioned this pull request Jul 30, 2022

Roll Flutter Engine from 8f1c93dec869 to 05ff95d4496f (10 revisions) flutter/flutter#108656

Merged

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jul 30, 2022

56c6c18 Create benchmarks for DisplayListBuilder (flutter/engine#34910)

7229148

emilyabest pushed a commit to emilyabest/engine that referenced this pull request Aug 12, 2022

Create benchmarks for DisplayListBuilder (flutter#34910)

82f6846

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Create benchmarks for DisplayListBuilder #34910

Create benchmarks for DisplayListBuilder #34910

Uh oh!

ColdPaleLight commented Jul 26, 2022 •

edited by flar

Loading

Uh oh!

flar commented Jul 26, 2022

Uh oh!

ColdPaleLight commented Jul 27, 2022

Uh oh!

flar commented Jul 27, 2022 •

edited

Loading

Uh oh!

flar commented Jul 27, 2022 •

edited

Loading

Uh oh!

flar commented Jul 27, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ColdPaleLight commented Jul 29, 2022

Uh oh!

flar left a comment

Uh oh!

flar commented Aug 2, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Create benchmarks for DisplayListBuilder #34910

Create benchmarks for DisplayListBuilder #34910

Uh oh!

Conversation

ColdPaleLight commented Jul 26, 2022 • edited by flar Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. main

2. main + PR #34365

Pre-launch Checklist

Uh oh!

flar commented Jul 26, 2022

Uh oh!

ColdPaleLight commented Jul 27, 2022

Uh oh!

flar commented Jul 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

flar commented Jul 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

flar commented Jul 27, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ColdPaleLight commented Jul 29, 2022

Uh oh!

flar left a comment

Choose a reason for hiding this comment

Uh oh!

flar commented Aug 2, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ColdPaleLight commented Jul 26, 2022 •

edited by flar

Loading

flar commented Jul 27, 2022 •

edited

Loading

flar commented Jul 27, 2022 •

edited

Loading