3 changes: 2 additions & 1 deletion flow/layers/layer_tree.cc
@@ -21,8 +21,9 @@ LayerTree::LayerTree(const SkISize& frame_size,
checkerboard_raster_cache_images_(false),
checkerboard_offscreen_layers_(false) {}

-void LayerTree::RecordBuildTime(fml::TimePoint start) {
+void LayerTree::RecordBuildTime(fml::TimePoint start, fml::TimePoint target) {
build_start_ = start;
+build_target_ = target;
build_finish_ = fml::TimePoint::Now();
}

4 changes: 3 additions & 1 deletion flow/layers/layer_tree.h
@@ -53,8 +53,9 @@ class LayerTree {
float frame_physical_depth() const { return frame_physical_depth_; }
float frame_device_pixel_ratio() const { return frame_device_pixel_ratio_; }

-void RecordBuildTime(fml::TimePoint begin_start);
+void RecordBuildTime(fml::TimePoint begin_start, fml::TimePoint begin_target);

Contributor: Typo? "begin_start -> build_start, begin_target -> build_target"

Contributor Author: These times are measured relative to the upcall to Window._beginFrame, so they document the timeline of the "begin" part of the frame, hence the names. I just needed to add "target" to the existing mix and chose names that would blend in, but I agree the names are somewhat strained.

In our posted benchmark statistics we have the concept of "build" vs. "rasterize" times, so your suggestion is in keeping with that nomenclature.

We also have the concept of "Produce" (an upcall to Dart to build a scene) vs. "Consume" (transferring a LayerTree to the GPU thread to be rendered) in the Engine "Pipeline" object.

I do like build_start, build_target, build_completion, build_duration as names better than what is there, but is there some other terminology in play in any of these systems we should consider before we make a change?

fml::TimePoint build_start() const { return build_start_; }
+fml::TimePoint build_target() const { return build_target_; }
fml::TimePoint build_finish() const { return build_finish_; }
fml::TimeDelta build_time() const { return build_finish_ - build_start_; }

@@ -82,6 +83,7 @@ class LayerTree {
private:
std::shared_ptr<Layer> root_layer_;
fml::TimePoint build_start_;
+fml::TimePoint build_target_;
fml::TimePoint build_finish_;
SkISize frame_size_ = SkISize::MakeEmpty(); // Physical pixels.
float frame_physical_depth_;
1 change: 0 additions & 1 deletion fml/message_loop_impl.cc
@@ -57,7 +57,6 @@ MessageLoopImpl::~MessageLoopImpl() {
void MessageLoopImpl::PostTask(const fml::closure& task,
fml::TimePoint target_time) {
FML_DCHECK(task != nullptr);
-FML_DCHECK(task != nullptr);
if (terminated_) {
// If the message loop has already been terminated, PostTask should destruct
// |task| synchronously within this function.
4 changes: 3 additions & 1 deletion shell/common/animator.cc
@@ -26,6 +26,7 @@ Animator::Animator(Delegate& delegate,
task_runners_(std::move(task_runners)),
waiter_(std::move(waiter)),
last_begin_frame_time_(),
+last_begin_target_time_(),
dart_frame_deadline_(0),
#if FLUTTER_SHELL_ENABLE_METAL
layer_tree_pipeline_(fml::MakeRefCounted<LayerTreePipeline>(2)),
@@ -133,6 +134,7 @@ void Animator::BeginFrame(fml::TimePoint frame_start_time,
FML_DCHECK(producer_continuation_);

last_begin_frame_time_ = frame_start_time;
+last_begin_target_time_ = frame_target_time;
dart_frame_deadline_ = FxlToDartOrEarlier(frame_target_time);
{
TRACE_EVENT2("flutter", "Framework Workload", "mode", "basic", "frame",
@@ -178,7 +180,7 @@ void Animator::Render(std::unique_ptr<flutter::LayerTree> layer_tree) {

if (layer_tree) {
// Note the frame time for instrumentation.
-layer_tree->RecordBuildTime(last_begin_frame_time_);
+layer_tree->RecordBuildTime(last_begin_frame_time_, last_begin_target_time_);
}

// Commit the pending continuation.
1 change: 1 addition & 0 deletions shell/common/animator.h
@@ -97,6 +97,7 @@ class Animator final {
std::shared_ptr<VsyncWaiter> waiter_;

fml::TimePoint last_begin_frame_time_;
+fml::TimePoint last_begin_target_time_;
int64_t dart_frame_deadline_;
fml::RefPtr<LayerTreePipeline> layer_tree_pipeline_;
fml::Semaphore pending_frame_semaphore_;
17 changes: 14 additions & 3 deletions shell/common/pipeline.h
@@ -144,6 +144,9 @@ class Pipeline : public fml::RefCountedThreadSafe<Pipeline<R>> {

{
std::scoped_lock lock(queue_mutex_);
+if (queue_.size() == 0) {

Contributor: I wonder why queue_.size() == 0 didn't need to be handled previously? Wouldn't queue_.pop_front() crash when the queue is empty?

Contributor Author: We always had a 1:1 correlation between putting a LayerTree on the queue and taking it off, but I want to eliminate the queue, so now the consume task can be scheduled when there is something in the queue, yet by the time it actually runs that entry may already have been discarded.

When this fix goes beyond the WIP stage, I'll see if I can throttle the task posting so we have at most one outstanding task, and only when there is a waiting LayerTree to process (see the sketch at the end of this comment); for now it's a lot easier to just let the posted tasks go away on their own.

Minimally, this might need to change to if (<nextTreePointer> == null) when I delete the queue, whether or not I make the task posting more conditional.
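
To make the throttling idea concrete, here is a minimal sketch; the ConsumeTaskThrottle name and its members are illustrative, not existing engine code, and it simply caps the number of outstanding consume tasks at one:

#include <atomic>
#include <functional>
#include <utility>

#include "flutter/fml/memory/ref_ptr.h"
#include "flutter/fml/task_runner.h"

// Hypothetical helper: allow at most one outstanding consume task on the GPU
// task runner instead of posting one task per produced LayerTree. Assumes the
// throttle outlives any task it posts.
struct ConsumeTaskThrottle {
  std::atomic<bool> pending{false};

  void Post(const fml::RefPtr<fml::TaskRunner>& gpu_runner,
            std::function<void()> consume) {
    bool expected = false;
    if (!pending.compare_exchange_strong(expected, true)) {
      return;  // A consume task is already scheduled; don't stack up another.
    }
    gpu_runner->PostTask([this, consume = std::move(consume)]() {
      pending = false;
      consume();  // e.g. call back into Rasterizer::Draw with the pipeline.
    });
  }
};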

+return PipelineConsumeResult::NoneAvailable;
+}
std::tie(resource, trace_id) = std::move(queue_.front());
queue_.pop_front();
items_count = queue_.size();
@@ -175,6 +178,10 @@ class Pipeline : public fml::RefCountedThreadSafe<Pipeline<R>> {
void ProducerCommit(ResourcePtr resource, size_t trace_id) {
{
std::scoped_lock lock(queue_mutex_);
+if (queue_.size() > 0) {

Contributor: Is this limiting the queue to a size of no more than 1? If it never holds more than one entry, why do we need a queue at all?

Contributor Author: Correct. I don't see the need to have more than one LayerTree waiting for us to process; stacking them up in a queue just provides more ways to fall behind on the GPU thread. This is a minimal-impact way of pretending we don't have a queue. A larger change to switch to just a pointer (sketched at the end of this thread) will happen if we decide this approach is correct.

Contributor Author (@flar, Jan 29, 2020): I also need to find out how this impacts other teams that might be using our code to do more than just the buffering we do on iOS. In particular, I heard of a team that was thinking their scrolling might get smoother if they switched to quadruple buffering. They may actually be happier with this form of throttling the queue, or not. If they really do want to quadruple buffer, then we might need to keep a queue, and the throttling to a size of 1 might be iOS-specific.

I also haven't tested the impact of eliminating the queue on Android, which seems less bothered by presenting more than one frame per vsync interval. Does this help or hurt on Android? (A rhetorical question for me to investigate.)
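
To make the "switch to just a pointer" idea above concrete, here is a minimal sketch of a single-slot holder in place of the current deque; the class and member names are illustrative and none of this is existing Pipeline code:

#include <cstddef>
#include <memory>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical single-slot replacement for the pipeline's queue: at most one
// resource is ever waiting, and committing a new one replaces the old one.
template <class R>
class SingleSlotPipeline {
 public:
  using ResourcePtr = std::unique_ptr<R>;

  // Producer side: keep only the most recently committed resource.
  void ProducerCommit(ResourcePtr resource, size_t trace_id) {
    std::scoped_lock lock(mutex_);
    slot_ = std::make_pair(std::move(resource), trace_id);
  }

  // Consumer side: an empty optional plays the role that
  // PipelineConsumeResult::NoneAvailable plays in the current code.
  std::optional<std::pair<ResourcePtr, size_t>> Consume() {
    std::scoped_lock lock(mutex_);
    return std::exchange(slot_, std::nullopt);
  }

 private:
  std::mutex mutex_;
  std::optional<std::pair<ResourcePtr, size_t>> slot_;
};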

FML_LOG(INFO) << "queue was already stocked";
queue_.clear();
}
queue_.emplace_back(std::move(resource), trace_id);
}

@@ -185,9 +192,13 @@ class Pipeline : public fml::RefCountedThreadSafe<Pipeline<R>> {
void ProducerCommitFront(ResourcePtr resource, size_t trace_id) {
{
std::scoped_lock lock(queue_mutex_);
-queue_.emplace_front(std::move(resource), trace_id);
-while (queue_.size() > depth_) {
-queue_.pop_back();
+if (queue_.size() > 0) {
+FML_LOG(INFO) << "ignoring CommitFront in favor of existing queue entry";

Contributor: Is there any consequence or downside to throwing away the resource when the queue is nonempty?

Contributor Author: This is the produce-to-front (of queue) call. It is invoked when we go to rasterize a LayerTree and are told to resubmit it instead. When we had a real queue, that LayerTree came off the front of the queue, so the front is where it should go back. In the new system I'm assuming that anything already in the queue is newer than what we were just processing, so I just let the resubmitted tree fall on the floor.

These two methods for adding something to the queue could be replaced by a single "ProducerCommit" that compares the target time of the incoming LayerTree with the target time of the one already waiting and keeps only the newer one (see the sketch at the end of this thread).

Contributor Author: I'll add that the only source of "kResubmit" that I could find in the code came from the iOS platform view code, and it looked to be related to a change in threading resulting from a change in the structure of a frame (due to a change in the embedding?)...

shell/platform/darwin/ios/framework/Source/FlutterPlatformViews.mm: return PostPrerollResult::kResubmitFrame;
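
Here is a minimal sketch of that single commit path, written against the single-pointer idea from the thread above. The NewestTreeSlot type and its members are illustrative; it assumes the committed LayerTree carries the build_target() accessor added in this change:

#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>

#include "flutter/flow/layers/layer_tree.h"

// Hypothetical replacement for both ProducerCommit and ProducerCommitFront:
// keep whichever LayerTree targets the later vsync and let the other one
// fall on the floor.
class NewestTreeSlot {
 public:
  void Commit(std::unique_ptr<flutter::LayerTree> tree, size_t trace_id) {
    std::scoped_lock lock(mutex_);
    if (!tree_ || tree->build_target() > tree_->build_target()) {
      tree_ = std::move(tree);
      trace_id_ = trace_id;
    }
    // Otherwise the incoming tree is not newer than the waiting one; drop it.
  }

 private:
  std::mutex mutex_;
  std::unique_ptr<flutter::LayerTree> tree_;
  size_t trace_id_ = 0;
};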

+} else {
+queue_.emplace_front(std::move(resource), trace_id);
+while (queue_.size() > depth_) {
+queue_.pop_back();
+}
}
}

29 changes: 26 additions & 3 deletions shell/common/rasterizer.cc
@@ -138,12 +138,12 @@ void Rasterizer::Draw(fml::RefPtr<Pipeline<flutter::LayerTree>> pipeline) {
// between successive tries.
switch (consume_result) {
case PipelineConsumeResult::MoreAvailable: {
-task_runners_.GetGPUTaskRunner()->PostTask(
+task_runners_.GetGPUTaskRunner()->PostTaskForTime(
[weak_this = weak_factory_.GetWeakPtr(), pipeline]() {
if (weak_this) {
weak_this->Draw(pipeline);
}
-});
+}, resubmitted_time_target_);
break;
}
default:
@@ -231,6 +231,19 @@ sk_sp<SkImage> Rasterizer::ConvertToRasterImage(sk_sp<SkImage> image) {
});
}

+fml::TimePoint Rasterizer::DetermineRasterizationTime(LayerTree& layer_tree) {

Contributor: Is this the time to start rasterization, and thus usually the time the build finishes?

Contributor Author: This implements the "only one per vsync interval" part of the solution. It computes "when do we want to start this rasterization?" based on whether we have already presented a LayerTree in this interval or not. If the computed time is in the future, we postpone the current task until then.

+fml::TimePoint start = layer_tree.build_start();
+if (start < last_render_finish_) {
+fml::TimePoint target = layer_tree.build_target();
+fml::TimePoint now = fml::TimePoint::Now();
+if (target > now) {
+return target;
+}
+return now + (target - start);
+}
+return start;

Contributor: I don't fully understand why this function needs to work this way. Returning the build start time as the rasterization time looks strange.

Contributor Author: I should add comments, but these computations may change over time while this is still a WIP.

If the GPU thread were directly informed of the latest vsync times reported by the vsync_waiter, it could make better guesses about how far to postpone the current task. It doesn't get that information because vsync is sometimes paused (when we get too far behind), and the info is delivered to another thread through a separate callback mechanism, so getting it to the rasterizer involves a communication channel that I haven't identified (or created?) yet. For now, it bases its decision on the posted LayerTree that it is planning to handle.

This particular computation can be beefed up in the future to make better estimates. It doesn't fully prevent the condition in every case, but it keeps us from entering it when we are only one frame behind (and possibly more), and it keeps us from perpetuating the condition for more than a single frame when we are multiple frames behind.

Here is the current heuristic...

If we start processing a LayerTree and a new one gets posted while we are processing it, then the time we finish rendering the current one will be after the "begin" time of the new one. That seems to work pretty well for detecting the condition.

If our last rasterization finished before the target time of this LayerTree, that is the ideal case: we are only one frame behind and can schedule for the target time of that frame, which is the best estimate we have of when the next vsync interval starts. Note that because we then start processing the LayerTree exactly on the leading edge of the next vsync interval, we don't give the produce end of the pipeline any time to compute a more recent frame. Later in this comment I describe future changes to the production end that could alleviate this lag, but for now it is no worse than when we get blocked in the kernel anyway.

If we are later than even the target time of the LayerTree we are processing, then we need to pick some point in the future beyond the current vsync interval, for which we have no direct start/target times. I could linearly extrapolate the vsync intervals from the historic information in the LayerTree at hand (see the sketch at the end of this comment), but for now I chose "now + frame_duration" (as computed from the target - start times in the LayerTree). This could actually be better than computing the next real vsync time, because when that vsync interval arrives we might have a chance to quickly compute a new scene and post it before the GPU thread task wakes up, which means we render a more "recent" scene. If we start processing immediately at the next vsync time, then we will be rendering in parallel with a scene build and will end up presenting an out-of-date scene just as a fresh scene becomes available, at which point we can't render the fresh scene until the next interval.
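
For a concrete (illustrative) example, assume a 16.7 ms vsync interval and a LayerTree recorded with build_start = 0 ms and build_target = 16.7 ms. If the previous rasterization finished at 10 ms (so start < last_render_finish_) and it is now 10 ms, the target is still in the future and we postpone to 16.7 ms, the leading edge of the next interval. If instead we only finished the previous frame at 25 ms, the target has already passed and we postpone to now + (target - start) = 25 + 16.7 = 41.7 ms, roughly one frame duration ahead. And if nothing has been presented since this tree's frame began (start is not earlier than last_render_finish_), the function returns build_start itself, so the check in DoDraw ("when > build_start") fails and rasterization proceeds immediately.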

Eventually I want to instrument the code that fields a vsync notification so that, if it detects we are already rasterizing a LayerTree, it either punts and waits for the next vsync notification to sync up, or just goes ahead and computes the next frame. I originally tried that strategy to see if it would help and it didn't solve the problem on its own, but it might serve a role in the current "fix". To be clear, I am thinking of the following in the vsync callback:

  if (<gpu runner is rasterizing>) {
    duration = target - start;
    start = target;
    target += duration;
  }

That code would compute "frame+1" and start that processing now rather than wait for the next vsync notification. This gives us headroom in case the build times are taking longer than a single frame to compute (which likely correlates with the rasterization times going long as well).

Eventually we'd want some really sophisticated prediction algorithms that try to figure out "if I start computing a scene now, how long do I think it will take, how long do I think we'd take to render it, and when do I actually think it will show up on the screen?", but even the baby step I just outlined would be one step better than our current system of always rendering based on the reported times. Also, such a heuristic calculation might get caught off guard if its predictions suddenly change (after all, who knows how much complexity got added or removed at the app level? Nothing in this code has that information). But at least it can decide not to compute a scene that it can show won't be rendered until after the reported target time.

Also, a more sophisticated prediction algorithm would come in two parts: the build part that tries to target the sum of "build + raster" timing, and then, when the task is posted to the GPU runner, a computation of when to start processing the frame so that it posts approximately when it was computed for. There is a lot of magic to consider, and to be worried about, there...
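
For reference, here is what the linear-extrapolation alternative mentioned above could look like; this is a standalone function for illustration only, and it assumes build_target is later than build_start so the interval is positive:

#include "flutter/flow/layers/layer_tree.h"
#include "flutter/fml/time/time_point.h"

// Hypothetical alternative to "now + frame_duration": step forward from the
// recorded build_target in whole vsync intervals until we pass "now".
fml::TimePoint ExtrapolateNextTarget(const flutter::LayerTree& layer_tree,
                                     fml::TimePoint now) {
  const fml::TimeDelta interval =
      layer_tree.build_target() - layer_tree.build_start();
  fml::TimePoint next = layer_tree.build_target();
  while (next <= now) {
    next = next + interval;  // Assumes interval > 0 (target after start).
  }
  return next;
}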

+}

RasterStatus Rasterizer::DoDraw(
std::unique_ptr<flutter::LayerTree> layer_tree) {
FML_DCHECK(task_runners_.GetGPUTaskRunner()->RunsTasksOnCurrentThread());
@@ -239,6 +252,14 @@ RasterStatus Rasterizer::DoDraw(
return RasterStatus::kFailed;
}

+fml::TimePoint when = DetermineRasterizationTime(*layer_tree);
+if (when > layer_tree->build_start()) {
+FML_LOG(INFO) << "Rescheduling";
+resubmitted_layer_tree_ = std::move(layer_tree);
+resubmitted_time_target_ = when;
+return RasterStatus::kResubmit;
+}

FrameTiming timing;
timing.Set(FrameTiming::kBuildStart, layer_tree->build_start());
timing.Set(FrameTiming::kBuildFinish, layer_tree->build_finish());
@@ -252,6 +273,7 @@ RasterStatus Rasterizer::DoDraw(
last_layer_tree_ = std::move(layer_tree);
} else if (raster_status == RasterStatus::kResubmit) {
resubmitted_layer_tree_ = std::move(layer_tree);
+resubmitted_time_target_ = fml::TimePoint::Now();
return raster_status;
}

@@ -265,7 +287,8 @@
// TODO(liyuqian): in Fuchsia, the rasterization doesn't finish when
// Rasterizer::DoDraw finishes. Future work is needed to adapt the timestamp
// for Fuchsia to capture SceneUpdateContext::ExecutePaintTasks.
-timing.Set(FrameTiming::kRasterFinish, fml::TimePoint::Now());
+last_render_finish_ = fml::TimePoint::Now();
+timing.Set(FrameTiming::kRasterFinish, last_render_finish_);
delegate_.OnFrameRasterized(timing);

// Pipeline pressure is applied from a couple of places:
4 changes: 4 additions & 0 deletions shell/common/rasterizer.h
@@ -412,10 +412,12 @@ class Rasterizer final : public SnapshotDelegate {
std::unique_ptr<flutter::CompositorContext> compositor_context_;
// This is the last successfully rasterized layer tree.
std::unique_ptr<flutter::LayerTree> last_layer_tree_;
+fml::TimePoint last_render_finish_;
// Set when we need attempt to rasterize the layer tree again. This layer_tree
// has not successfully rasterized. This can happen due to the change in the
// thread configuration. This will be inserted to the front of the pipeline.
std::unique_ptr<flutter::LayerTree> resubmitted_layer_tree_;
+fml::TimePoint resubmitted_time_target_;
fml::closure next_frame_callback_;
bool user_override_resource_cache_bytes_;
std::optional<size_t> max_cache_bytes_;
@@ -433,6 +435,8 @@
SkISize size,
std::function<void(SkCanvas*)> draw_callback);

+fml::TimePoint DetermineRasterizationTime(LayerTree& layer_tree);

RasterStatus DoDraw(std::unique_ptr<flutter::LayerTree> layer_tree);

RasterStatus DrawToSurface(flutter::LayerTree& layer_tree);