-
Notifications
You must be signed in to change notification settings - Fork 6k
Collapse bounds calculations into DisplayListBuilder #34365
Collapse bounds calculations into DisplayListBuilder #34365
Conversation
|
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!). If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix? Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing. |
|
I should clarify in the original issue. The concept here was not to just eliminate an extra Dispatch process, but to integrate the 2 mechanisms so that they could share information when needed. I've updated the original bug description to explain more of the inefficiencies that we want to avoid with this effort. |
flar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking this as "Request Changes" as I don't think it is useful to just aggressively call the bounds accumulation methods without at least some integration that can lead to solving the problems identified in the original issue.
|
Unfortunately we're going to need the Calculator class for just a little while longer while we solve a pressing need to get more coverage for Impeller. The PlatformViews will need to move over to DisplayList and to do that they need to produce an RTree from a DisplayList. To get that work done quickly I reused the Calculator class but factored out the BoundsAccumulator into Rect and RTree subclasses. By default, a DisplayList will only produce the regular bounds, but a caller can ask it to produce an RTree. I hope to have the PR submitted by tonight or tomorrow, but am running into problems with code sharing between flow and display_list that I'll likely solve by just cut/pasting code. Once that is in we'll have to figure out how we can inline and make a good hybrid implementation. We could potentially just RTree every DisplayList, but I'd like to see the overhead of that. It would still plug in just as the current stuff does in this PR since the RTree is a small variant on the existing BoundsCalculator, but the question will be whether we try to do both in the inlining or not. Another idea would be for the accumulator to save the bounds of each primitive as it goes and then making an RTree would be a simple act of handing the entire list to the RTree generator without having to dispatch. But that will be a slight change to how the existing BoundsAccumulator works anyway. I'll get to work on creating this RTree PR so you can see better what I'm talking about. The goal is to have PlatformViews working on Impeller within the next week or so, so it is fairly high priority work and hopefully doesn't really modify what has happened in this PR too much (the Calculator dispatch methods are largely untouched and both variants of Accumulator still support accumulate(rect)). |
|
I tagged you in #34809 so that we can plan the more forward on how to integrate bounds calculation while also considering the need for an RTree. |
Very excited to see RTree for DisplayList! I thought about it again, I think a better solution is :
|
443cace to
a793e0d
Compare
That's what I was thinking with my comment about saving the bounds in my earlier comment. The pros are:
Cons:
|
|
@flar
|
|
Those are all excellent questions and the more I look at what is being done in both cases I realize that we might not have a good answer for either. Let me compose my thoughts a bit more before I try to come up with a good description of the concerns, but mainly the issues are:
|
|
I haven't read the latest commits in detail, but it looks like you went the route of saving all of the rectangles during construction and then lazily turning them into an RTree when needed (without a second Dispatch cycle). Did you measure the time that costs compared to:
It would be nice if "DisplayList + realized bounds (but not realized RTree object)" was similar in performance to the previous "DisplayList construction + lazy bounds calculation" and even better if it was even faster than that - though I don't imagine it would be as fast as the prior "construction without bounds" - but it would be nice to know what the performance difference was even though we almost always ask for the bounds. One way to test this would be to capture some complicated real-app scenes using the built-in flattening code that is used to turn the entire tree into a single DL (with nested DLs), and then turn that into a benchmark that rebuilds it under the different scenarios above. Or, we could just come up with a synthetic scene, but it would be nice if it reflected a variety of real-life app scenes. Note that having such a benchmark would be good for us moving forward so that we could measure the impact of each of our changes to DL. We've gone long enough based on the mysterious art of "I don't think what I just modified will cost us much time in DL construction" and it is time we started measuring these things more explicitly. |
|
I could see a DL construction benchmark as either something driven from Dart or something driven from the native APIs directly. There are pros and cons for either: Pros for Dart:
Pros for native:
|
|
Created flutter/flutter#108265 to track what might turn into a suite of benchmarks for DL (for this PR it would be nice to at least have construction/bounds/rtree benchmarks). |
Creating a benchmark is indeed a great idea! But I'm a little confused about how to create a scene for testing. As far as I know, display list does not support serialization,flutter/flutter#86715. So before implementing benchmarks, should we support serialization of display lists? Another way I can find is that we can convert the layer tree to a |
Yes, serialization would make it very convenient. The problem with skp is that there are a lot of attributes we can't retrieve from an Sk stream. Shadows are out, most Filters are opaque, etc. |
Actually, the biggest bang for the buck would be to utilize the data currently in display_list_unittests.cc which has a big array of arrays of "every flavor of every DL call" in them - mainly used for testing equality, predictability of byte and op counts, etc. But, these arrays would also be great input for a benchmark - lots of DisplayList calls all set up without any overhead for parsing. Just start a timer, create a DL, fill it with every variant of every op, build it, and check the time. |
|
@flar This PR is ready for review, please take a look when you have a chance :) |
|
2 things... When generating a DL from Dart, the old code used to supply the parameter to SkPicture/Recorder that asked it to generate an RTree, so hopefully when you add that flag, you are setting it in the code that sets up the builder in picture.cc/picture_recorder.cc That code should probably say "if accumulator is RTree AND DL has an RTree, then do the individual rects, otherwise just do the bounds. If the Builders are created via some other mechanism, then perhaps we need more locations that set the flag in the new scenario. Or ... should we have the default be true? |
|
Look at the difference between the initialization of the PictureRecorder vs the DLBuilder/Recorder in this PR that removed the old SkP code. Look at the lines in picture_recorder.cc - the DL side of the if statement just created a basic DLRecorder, but the Sk side of it passed an rtree_factory_ in which induced the SkPicture to create an rtree by default. |
I see, thanks!
I don't think we need to do this. After all, generating rtree has an impact on performance. Developers should set whether to generate rtree explicitly. |
|
Two other places to look for where the flag may be needed are when Impeller creates a DLRecorder to save the tree and when embedders decide whether to create an SkCanvas or DLRecorder for their overlays. See: Line 14 in 9872cc7
vs Line 43 in 9872cc7
and I think this is the line where we create a DLRecorder for Impeller to store the tree into: Line 31 in 9872cc7
|
The flag has been added when embedders decide whether to create an SkCanvas or DLRecorder for their overlays. But I'm not sure if the flag is needed or not when Impeller creates a DLRecorder to save the tree. At present, rtree is not used in this part of the logic, it seems that there is no need to add it for the time being. |
I'm a little foggy on how the main surface canvas is used with embedders. I know the overlays get diced up when rendering over the platform views so they need an RTree for performance, but the main canvas probably does not get sliced up by clips (it will have only part of the tree rendered into it to be sure, but I don't think it will end up being clipped.) One thing, though, is that a layer_tree might contain layers that are outside the clip while scrolling. The tree culling in ContainerLayer::Paint() will probably remove items that are completely outside of the clip, but I'm not sure if it will remove the ones that are partially in view or not or how that will affect performance if we don't have an RTree for them. I guess, since the SkCanvas used there is an actual "talking to a Canvas" SkCanvas, it never benefited from an RTree anyway, so we don't need the flag there. Also, one of the tasks on my plate is to replace DLRecorder/Builder with DlCanvas which can wrap either an SkCanvas or a DlBuilder. Once that is done we can consider a variant that wraps a Dispatcher and directly dispatches to it, removing the need to create a DL to render with Impeller. At that point, whether the DlBuilder for Impeller asks for an RTree will be moot. So, I guess I talked myself into agreeing with your assessment that the DlRecorder set up for Impeller won't need to set the flag. |
flar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some very minor nits. Perhaps no changes are needed, but I wanted to get feedback on the issues I raised...
display_list/display_list.cc
Outdated
| nested_op_count_(nested_op_count), | ||
| bounds_({0, 0, -1, -1}), | ||
| bounds_(bounds), | ||
| bounds_cull_(cull_rect), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that bounds_cull was only ever used as a max bounds for the accumulators so the field is likely obsolete now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| accumulator()->restore(); | ||
| } | ||
|
|
||
| if (is_unbounded) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the things that I can't figure out is that in the original DLBoundsCalculator, there is a note around here that says to restore the accumulator before popping the layer_info, but in this integration, the layer info is popped up top. I wish I could remember why that comment is there, but I don't really see anything that might go wrong here...???
I think it was because the Calculator was using a vector of unique pointers to layer infos and so the pointer would go out of scope, but that doesn't apply to the current code since Builder copies out the info. (Would it be faster to avoid the copy and use a reference and move the pop_back until after the code is done using the reference?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous code, DisplayListBoundsCalculator::restore() will first call SkMatrixDispatchHelper::restore() and ClipBoundsDispatchHelper::restore(), then call the restore of the accumulator, and finally pop layer_info.
The restore of the accumulator depends on the correct clip_bounds, that is, it must be called after ClipBoundsDispatchHelper::restore().
In DPBuilder, clip_bounds is saved in layer_info, so we have to pop layer_info first.
| checkForDeferredSave(); | ||
| Push<TranslateOp>(0, 1, tx, ty); | ||
| current_layer_->matrix.preTranslate(tx, ty); | ||
| current_layer_->matrix().preTranslate(tx, ty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I don't like about accessing these matrices through methods is that this isn't clear that it is operating on the actual object. You have to double check that the method is returning a reference, which makes this code a little more confusing. Otherwise this code looks like it might be operating on a temporary object.
Returning const references is less confusing, but can also cause confusion if the member field is not const and can change out from under the caller.
I have a follow-on Github issue to use MatricClipTracker from Builder that would make most of this moot, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, I'll file a new PR to fix the issue.
flar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…117148) * df430c4fd Revert "[Impeller] Speculatively attempt to fix Metal PSO construction errors on host targets." (flutter/engine#38292) * 41dd4f5e1 Revert "Revert "[Impeller] Speculatively attempt to fix Metal PSO construction errors on host targets." (#38292)" (flutter/engine#38301) * 8ce9a3f71 Roll Skia from 3171deabd88a to b368746d696a (13 revisions) (flutter/engine#38294) * 4881fe25e Revert "Revert "reland "Migrate darwin common "framework_shared" target to ARC #37049" (#37219)" (#37320)" (flutter/engine#37883) * 3b2302c8d [Impeller] Remove validation log when the pipeline library is collected before pipeline is setup. (flutter/engine#38306) * a04997c81 [Impeller] Disable impeller_unittests. (flutter/engine#38307) * fc71faad0 License script improvements (flutter/engine#38148) * 6a2560c35 [Windows] Synthesize modifier keys events on pointer events (flutter/engine#38138) * b1d407563 Roll Skia from b368746d696a to 3f81f95176ce (11 revisions) (flutter/engine#38312) * b25fcf748 Roll Skia from 3f81f95176ce to 46e8f2a18a3d (3 revisions) (flutter/engine#38314) * 948699bba Collapse bounds calculations into DisplayListBuilder (flutter/engine#34365) * 38367de84 Roll Fuchsia Mac SDK from u-tC0QEGUT4xQ4KOo... to VEOIaacOA75U7PYyz... (flutter/engine#38316) * 29196519c Roll Skia from 46e8f2a18a3d to 9f728d78f10d (1 revision) (flutter/engine#38317)
|
The results are in. The kDefault benchmarks took a hit as expected, but the kBounds/Rtree values are all equal or improved... |
* Collapse bounds calculations into DisplayListBuilder * Tweak code * Remove bounds cull * Remove obsolete comment
* Collapse bounds calculations into DisplayListBuilder * Tweak code * Remove bounds cull * Remove obsolete comment
…lutter#117148) * df430c4fd Revert "[Impeller] Speculatively attempt to fix Metal PSO construction errors on host targets." (flutter/engine#38292) * 41dd4f5e1 Revert "Revert "[Impeller] Speculatively attempt to fix Metal PSO construction errors on host targets." (flutter#38292)" (flutter/engine#38301) * 8ce9a3f71 Roll Skia from 3171deabd88a to b368746d696a (13 revisions) (flutter/engine#38294) * 4881fe25e Revert "Revert "reland "Migrate darwin common "framework_shared" target to ARC flutter#37049" (flutter#37219)" (flutter#37320)" (flutter/engine#37883) * 3b2302c8d [Impeller] Remove validation log when the pipeline library is collected before pipeline is setup. (flutter/engine#38306) * a04997c81 [Impeller] Disable impeller_unittests. (flutter/engine#38307) * fc71faad0 License script improvements (flutter/engine#38148) * 6a2560c35 [Windows] Synthesize modifier keys events on pointer events (flutter/engine#38138) * b1d407563 Roll Skia from b368746d696a to 3f81f95176ce (11 revisions) (flutter/engine#38312) * b25fcf748 Roll Skia from 3f81f95176ce to 46e8f2a18a3d (3 revisions) (flutter/engine#38314) * 948699bba Collapse bounds calculations into DisplayListBuilder (flutter/engine#34365) * 38367de84 Roll Fuchsia Mac SDK from u-tC0QEGUT4xQ4KOo... to VEOIaacOA75U7PYyz... (flutter/engine#38316) * 29196519c Roll Skia from 46e8f2a18a3d to 9f728d78f10d (1 revision) (flutter/engine#38317)
fix flutter/flutter#100172
This PR does not change the API and semantics, the tests already exist
engine/display_list/display_list_unittests.cc
Line 1104 in 31afa38
Pre-launch Checklist
writing and running engine tests.
///).