Only rasterize one layer tree per vsync frame #16175
```diff
@@ -144,6 +144,9 @@ class Pipeline : public fml::RefCountedThreadSafe<Pipeline<R>> {
     {
       std::scoped_lock lock(queue_mutex_);
+      if (queue_.size() == 0) {
```
Contributor:
I wonder why

Contributor (Author):
We always had a 1:1 correlation between putting a LayerTree on the queue and taking it off, but I wanted to eliminate the queue, so now the consume task can be scheduled while there is something in the queue and yet find the queue empty by the time it actually runs. When this fix goes beyond the WIP stage, I'll see if I can throttle the task posting so that we have at most one outstanding task, and only when we have a waiting LayerTree to process, but it's a lot easier to just let the posted tasks go away on their own. Minimally, this might need to change to
```diff
+        return PipelineConsumeResult::NoneAvailable;
+      }
       std::tie(resource, trace_id) = std::move(queue_.front());
       queue_.pop_front();
       items_count = queue_.size();
```
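For illustration, the "at most one outstanding task, and only when a LayerTree is waiting" throttling described in the comment above could be sketched roughly as follows. This is a speculative sketch, not code from this PR; `ThrottledConsumer`, `NotifyItemAvailable`, and `consume_task_pending_` are hypothetical names.

```cpp
// Speculative sketch of throttled consume-task posting: post at most one
// outstanding task, and only when an item has actually been committed.
// All names here are hypothetical, not from the engine.
#include <atomic>
#include <functional>
#include <utility>

class ThrottledConsumer {
 public:
  // `post_task` posts a closure to the GPU task runner (abstracted here).
  // `consume` pops and rasterizes the waiting LayerTree, if any.
  ThrottledConsumer(std::function<void(std::function<void()>)> post_task,
                    std::function<void()> consume)
      : post_task_(std::move(post_task)), consume_(std::move(consume)) {}

  // Called by the producer side after committing a LayerTree.
  void NotifyItemAvailable() {
    bool expected = false;
    // Only post a new task if none is currently outstanding.
    if (consume_task_pending_.compare_exchange_strong(expected, true)) {
      post_task_([this]() {
        consume_task_pending_.store(false);
        consume_();
      });
    }
  }

 private:
  std::function<void(std::function<void()>)> post_task_;
  std::function<void()> consume_;
  std::atomic<bool> consume_task_pending_{false};
};
```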
```diff
@@ -175,6 +178,10 @@ class Pipeline : public fml::RefCountedThreadSafe<Pipeline<R>> {
   void ProducerCommit(ResourcePtr resource, size_t trace_id) {
     {
       std::scoped_lock lock(queue_mutex_);
+      if (queue_.size() > 0) {
```
Contributor:
Is this limiting the queue to a size of no more than 1? If it always has size no more than 1, why do we need a queue?

Contributor (Author):
Correct. I don't see the need to have more than one LayerTree waiting for us to process, and stacking them up in a queue just provides more ways to fall behind on the GPU thread. This is a minimal-impact way of pretending we don't have a queue. A larger change to switch to just a pointer will happen if we decide that this approach is correct.

Contributor (Author):
I also need to find out how this impacts other teams that might be using our code to do more than just the buffering we do on iOS. In particular, I heard of a team that thought their scrolling might get smoother if they switched to quadruple buffering. They may actually be happier with this form of throttling the queue, or not. If they really do want to quadruple buffer, then we might need to keep a queue, and the throttling to a size of 1 might become iOS-specific. I also haven't tested the impact of eliminating the queue on Android, which seems less bothered by presenting more than one frame per vsync interval. Does this help or hurt on Android? (A rhetorical question for myself to investigate.)
```diff
+        FML_LOG(INFO) << "queue was already stocked";
+        queue_.clear();
+      }
       queue_.emplace_back(std::move(resource), trace_id);
     }
```
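The "just a pointer" variant mentioned above could be sketched as a single-slot holder in which the producer always overwrites whatever is waiting. Again a speculative sketch under that assumption; `SingleSlotPipeline` and its members are hypothetical names, and coordination with the consume-task posting is omitted.

```cpp
// Speculative sketch of a single-slot replacement for the queue: the producer
// always overwrites whatever is waiting, so the consumer only ever sees the
// most recently committed LayerTree. Names here are hypothetical.
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>

template <typename R>
class SingleSlotPipeline {
 public:
  using ResourcePtr = std::unique_ptr<R>;

  // Producer side: replace any waiting resource with the newer one.
  void ProducerCommit(ResourcePtr resource, size_t trace_id) {
    std::scoped_lock lock(mutex_);
    held_resource_ = std::move(resource);  // Drops the stale resource, if any.
    held_trace_id_ = trace_id;
  }

  // Consumer side: take the waiting resource, if there is one.
  ResourcePtr Consume(size_t* trace_id_out) {
    std::scoped_lock lock(mutex_);
    if (!held_resource_) {
      return nullptr;
    }
    *trace_id_out = held_trace_id_;
    return std::move(held_resource_);
  }

 private:
  std::mutex mutex_;
  ResourcePtr held_resource_;
  size_t held_trace_id_ = 0;
};
```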
```diff
@@ -185,9 +192,13 @@ class Pipeline : public fml::RefCountedThreadSafe<Pipeline<R>> {
   void ProducerCommitFront(ResourcePtr resource, size_t trace_id) {
     {
       std::scoped_lock lock(queue_mutex_);
-      queue_.emplace_front(std::move(resource), trace_id);
-      while (queue_.size() > depth_) {
-        queue_.pop_back();
+      if (queue_.size() > 0) {
+        FML_LOG(INFO) << "ignoring CommitFront in favor of existing queue entry";
```
Contributor:
Is there any consequence or downside of throwing away the resource when the queue is nonempty?

Contributor (Author):
This is in the

These two methods for adding something to the queue could both be replaced by a single version of "ProduceCommit" that just compares the target time of the incoming LayerTree with the target time of the one that is already waiting and keeps only the newest version.

Contributor (Author):
I'll add that the only source of "kResubmit" in the code that I could find came from the iOS platform view code and looked to be related to a change in threading resulting from a change in the structure of a frame (due to a change in the embedding?)...
```diff
+      } else {
+        queue_.emplace_front(std::move(resource), trace_id);
+        while (queue_.size() > depth_) {
+          queue_.pop_back();
+        }
+      }
     }
```
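The unified "ProduceCommit" idea from the comment above, which keeps whichever LayerTree has the later target time, might look roughly like the following if written as a member of the `Pipeline` specialization for `LayerTree`. It reuses the class's `queue_` and `queue_mutex_` members and the `build_target()` accessor added in this PR; everything else is a hypothetical sketch, not code from the PR.

```cpp
// Speculative sketch: one commit path that compares the target time of the
// incoming LayerTree against the one already waiting and keeps only the
// newer of the two. Assumes queue_ holds (ResourcePtr, trace_id) pairs, as
// implied by the Consume() code above.
void ProduceCommit(ResourcePtr resource, size_t trace_id) {
  std::scoped_lock lock(queue_mutex_);
  if (!queue_.empty() &&
      queue_.front().first->build_target() >= resource->build_target()) {
    // The waiting LayerTree already targets the same or a later vsync;
    // drop the incoming (older) one.
    return;
  }
  queue_.clear();
  queue_.emplace_back(std::move(resource), trace_id);
}
```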
```diff
@@ -138,12 +138,12 @@ void Rasterizer::Draw(fml::RefPtr<Pipeline<flutter::LayerTree>> pipeline) {
   // between successive tries.
   switch (consume_result) {
     case PipelineConsumeResult::MoreAvailable: {
-      task_runners_.GetGPUTaskRunner()->PostTask(
+      task_runners_.GetGPUTaskRunner()->PostTaskForTime(
           [weak_this = weak_factory_.GetWeakPtr(), pipeline]() {
             if (weak_this) {
               weak_this->Draw(pipeline);
             }
-          });
+          }, resubmitted_time_target_);
       break;
     }
     default:
```
```diff
@@ -231,6 +231,19 @@ sk_sp<SkImage> Rasterizer::ConvertToRasterImage(sk_sp<SkImage> image) {
   });
 }
 
+fml::TimePoint Rasterizer::DetermineRasterizationTime(LayerTree& layer_tree) {
```
Contributor:
Is this the time to start rasterization, and thus usually the time to finish the build?

Contributor (Author):
This implements the "only one per vsync interval" part of the solution. It computes "when do we want to start this rasterization" based on whether we have already presented a LayerTree in this interval or not. If that time is in the future, we will want to postpone our current task until this calculated time.
```diff
+  fml::TimePoint start = layer_tree.build_start();
+  if (start < last_render_finish_) {
+    fml::TimePoint target = layer_tree.build_target();
+    fml::TimePoint now = fml::TimePoint::Now();
+    if (target > now) {
+      return target;
+    }
+    return now + (target - start);
+  }
+  return start;
```
Contributor:
I don't fully understand why this function needs to work in this way. Returning the build start time as the rasterization time looks strange.

Contributor (Author):
I should add comments, but these computations may change over time under this WIP. If the GPU thread were directly informed of the latest reported vsync times from the vsync_waiter, it could make better guesses as to when to postpone the current task. It doesn't get that information because the vsync is sometimes paused (when we get too far behind), and the info is delivered to another thread through a separate callback mechanism, so getting that information to the rasterizer involves a communication channel that I haven't identified (or created?) yet. For now, it will base its decision on the posted LayerTree that it is planning to handle. This particular computation can be beefed up more in the future to make better estimates. It doesn't fully prevent the condition in some cases, but it keeps us from entering the condition if we are only one frame behind (possibly more), and it also keeps us from perpetuating the condition for more than a single frame when we are multiple frames behind.

Here is the current heuristic. If we start processing a LayerTree and a new one gets posted while we are processing it, then the time we finished rendering the current one will be after the "begin" time of the new one. That seems to work pretty well to detect the condition.

If our last rasterization end time is before the target time of this LayerTree, then that is an ideal case where we are only 1 frame behind: we can schedule for the target time of that frame, which is the best estimate we have of when the next vsync interval starts. Note that because we will start processing the LayerTree exactly on the leading edge of the next vsync interval, we don't give the produce end of the pipeline any time to compute a more recent frame. Further on in this comment I talk about future changes to the production end that can help alleviate this lag, but for now this lag is no worse than when we get blocked in the kernel anyway.

If we are later than even the target time of the LayerTree we are processing, then we need to pick some point in the future that is after the current vsync interval, for which we have no direct start/target times. I could linearly extrapolate the vsync intervals from the historic information in the LayerTree I have at hand, but for now I chose "now + frame_duration" (as computed from the target - start times in the LayerTree). This could actually be better than computing the next real vsync time, because it means that when that vsync interval comes around, we might have a chance to quickly compute a new scene and post it before the GPU thread task wakes up, which will mean we render a more "recent" scene. If we start processing immediately at the next vsync time, then we will be rendering in parallel with a scene build and we'll end up presenting an out-of-date scene just as a fresh scene becomes available, at which point we can't render that scene until the next interval.

Eventually I want to instrument the code that fields a vsync notification so that, if it detects that we are already rasterizing a LayerTree, it can either punt and wait for the next vsync notification to sync up, or just go ahead and compute the next frame. I originally tried that strategy to see if it would help, but it didn't solve the problem. It might serve a role in the current "fix", though.

To be clear, I am thinking of the following in the vsync callback: that code would compute "frame+1" and start that processing now rather than wait for the next vsync notification. This gives us headroom in case the build times are taking longer than a single frame to compute (which likely correlates with the rasterization times going long as well). Eventually we'd want some really sophisticated prediction algorithms that try to figure out "If I start computing a scene now, how long do I think it will take to do that, how long do I think we'd take to render it, and what time do I actually think it will show up on the screen", but even the baby step I just outlined would be one better than our current system of always rendering based on the reported times. Also, such a heuristic calculation might get caught off guard if its predictions suddenly change; after all, who knows how much complexity got added or removed at the app level? Nothing in this code has that information. But at least it can decide not to compute a scene that it can show won't get rendered until after the reported target time. Also, a more sophisticated prediction algorithm would come in 2 parts: the build part that tries to target the sum of "build + raster" timing, and then, when the task is posted to the GPU runner, it will have to compute when to start processing the frame so that it posts approximately when it was computed for. There is a lot of magic to consider, and be worried about, there...
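A rough sketch of that vsync-callback idea, assuming the callback can ask whether the rasterizer is still busy; `OnVsync`, `FrameTimes`, and the callbacks used here are hypothetical, not the engine's actual vsync_waiter API.

```cpp
// Hypothetical sketch of the vsync-callback idea described above: if the GPU
// thread is still rasterizing when a vsync arrives, start building the *next*
// frame now instead of waiting for the next vsync notification. All names and
// signatures here are illustrative only.
#include <functional>

struct FrameTimes {
  long long start_us;   // reported vsync start time
  long long target_us;  // reported vsync target (deadline) time
};

class FramePipeliner {
 public:
  FramePipeliner(std::function<bool()> rasterizer_busy,
                 std::function<void(FrameTimes)> begin_frame)
      : rasterizer_busy_(std::move(rasterizer_busy)),
        begin_frame_(std::move(begin_frame)) {}

  // Called from the vsync callback with the reported times for this interval.
  void OnVsync(FrameTimes reported) {
    if (rasterizer_busy_()) {
      // The rasterizer won't pick this frame up until the next interval
      // anyway, so target "frame + 1" and start building it now to gain
      // headroom for long build times.
      long long frame_duration = reported.target_us - reported.start_us;
      reported.start_us += frame_duration;
      reported.target_us += frame_duration;
    }
    begin_frame_(reported);
  }

 private:
  std::function<bool()> rasterizer_busy_;
  std::function<void(FrameTimes)> begin_frame_;
};
```

As the comment above notes, this strategy on its own did not solve the problem, but it could complement the rescheduling done in DetermineRasterizationTime.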
```diff
+}
+
 RasterStatus Rasterizer::DoDraw(
     std::unique_ptr<flutter::LayerTree> layer_tree) {
   FML_DCHECK(task_runners_.GetGPUTaskRunner()->RunsTasksOnCurrentThread());
```
```diff
@@ -239,6 +252,14 @@ RasterStatus Rasterizer::DoDraw(
     return RasterStatus::kFailed;
   }
 
+  fml::TimePoint when = DetermineRasterizationTime(*layer_tree);
+  if (when > layer_tree->build_start()) {
+    FML_LOG(INFO) << "Rescheduling";
+    resubmitted_layer_tree_ = std::move(layer_tree);
+    resubmitted_time_target_ = when;
+    return RasterStatus::kResubmit;
+  }
+
   FrameTiming timing;
   timing.Set(FrameTiming::kBuildStart, layer_tree->build_start());
   timing.Set(FrameTiming::kBuildFinish, layer_tree->build_finish());
```
```diff
@@ -252,6 +273,7 @@ RasterStatus Rasterizer::DoDraw(
     last_layer_tree_ = std::move(layer_tree);
   } else if (raster_status == RasterStatus::kResubmit) {
     resubmitted_layer_tree_ = std::move(layer_tree);
+    resubmitted_time_target_ = fml::TimePoint::Now();
     return raster_status;
   }
```
```diff
@@ -265,7 +287,8 @@ RasterStatus Rasterizer::DoDraw(
   // TODO(liyuqian): in Fuchsia, the rasterization doesn't finish when
   // Rasterizer::DoDraw finishes. Future work is needed to adapt the timestamp
   // for Fuchsia to capture SceneUpdateContext::ExecutePaintTasks.
-  timing.Set(FrameTiming::kRasterFinish, fml::TimePoint::Now());
+  last_render_finish_ = fml::TimePoint::Now();
+  timing.Set(FrameTiming::kRasterFinish, last_render_finish_);
   delegate_.OnFrameRasterized(timing);
 
   // Pipeline pressure is applied from a couple of places:
```
Comment:
Typo? "begin_start -> build_start, begin_target -> build_target"

Author:
These times are measured based on the upcall to Window._beginFrame, and so they document the timeline of the "begin" part of the frame, thus their names. I just needed to add "target" to the existing mix and went with names that blend in, but I agree that the names are somewhat strained.

In our posted benchmark statistics we have the concept of "build" vs "rasterize" times, so your suggestion is in keeping with that nomenclature.

We also have the concept of "Produce" (meaning the upcall to Dart to build a scene) vs "Consume" (meaning transferring a LayerTree to the GPU thread to be rendered) in the Engine "Pipeline" object.

I do like build_start, build_target, build_completion, and build_duration as names better than what is there, but is there some other terminology in play in any of these systems that we should consider before we make a change?
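For reference, the renamed accessors being discussed might read like this. This is a hypothetical illustration of the proposed "build_*" vocabulary only, not the actual LayerTree interface, and the meaning of `build_duration` shown here is a guess.

```cpp
// Hypothetical illustration of the proposed "build_*" naming; not the actual
// LayerTree accessors.
#include <chrono>

class LayerTreeTimes {
 public:
  using TimePoint = std::chrono::steady_clock::time_point;
  using Duration = std::chrono::steady_clock::duration;

  TimePoint build_start() const { return build_start_; }            // Window._beginFrame dispatch time
  TimePoint build_target() const { return build_target_; }          // vsync deadline for this frame
  TimePoint build_completion() const { return build_completion_; }  // scene build finished
  // One plausible reading of build_duration: time spent building the scene.
  Duration build_duration() const { return build_completion_ - build_start_; }

 private:
  TimePoint build_start_{};
  TimePoint build_target_{};
  TimePoint build_completion_{};
};
```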