Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Conversation

@ColdPaleLight
Copy link
Member

@ColdPaleLight ColdPaleLight commented Nov 19, 2022

Added SyncMode to specify how to synchronize when CommandBuffer:::SubmitCommands.

Note that the reason for adding this parameter instead of using CompleteCallback is following:

  1. The CompleteCallback of OpenGLES does not ensure that the CommandBuffer has been completed.
  2. Make it more convenient for developers.

Pre-launch Checklist

  • I read the [Contributor Guide] and followed the process outlined there for submitting PRs.
  • I read the [Tree Hygiene] wiki page, which explains my responsibilities.
  • I read and followed the [Flutter Style Guide] and the [C++, Objective-C, Java style guides].
  • I listed at least one issue that this PR fixes in the description above.
  • I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See [testing the engine] for instructions on writing and running engine tests.
  • I updated/added relevant documentation (doc comments with ///).
  • I signed the [CLA].
  • All existing and new tests are passing.

@ColdPaleLight ColdPaleLight changed the title [Impeller] Support ‘SyncMode’ in 'CommandBuffer:::SubmitCommands' [Impeller] Support 'SyncMode' in 'CommandBuffer:::SubmitCommands' Nov 19, 2022
@ColdPaleLight ColdPaleLight self-assigned this Nov 19, 2022
@ColdPaleLight ColdPaleLight changed the title [Impeller] Support 'SyncMode' in 'CommandBuffer:::SubmitCommands' [Impeller] Support 'SyncMode' in 'CommandBuffer::SubmitCommands' Nov 21, 2022
Copy link
Contributor

@dnfield dnfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems better to me than #37704 but I'd like @chinmaygarde and @bdero's thoughts on how this might get misused or any other concerns.

@ColdPaleLight ColdPaleLight requested a review from bdero November 22, 2022 00:42
@ColdPaleLight
Copy link
Member Author

friendly ping @chinmaygarde @bdero

gl.Flush();
break;
case CommandBuffer::SyncMode::kWaitUntilCompleted:
gl.Finish();
Copy link
Member

@bdero bdero Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looking at #37709, a couple of ideas come to mind:

  • For GLES: Since the blit pass uses glReadPixels, and GL commands are enqueued serially, is it actually necessary to call glFinish before downloading the texture? My understanding is that glReadPixels will synchronously block until any previous commands which attach the texture have completed on the GPU. In any case, doing the glFlush doesn't hurt and is probably a good idea.
  • This is a situation where it seems like the underlying rendering backend should own the choice about how to handle synchronization; we have an opportunity to avoid burdening users of the render layer API with an unnecessary choice and subtle bugs that are difficult to track down when used incorrectly:
    • For Metal (assuming waitUntilScheduled + hazard tracking isn't working as a sufficient stopgap for this), synchronization can be automatically managed by the renderer layer using native synchronization primitives:
      1. Have every TextureMTL track an optional MTLEvent, say TextureMTL::readable_event_.
      2. When a Metal RenderPassMTL encoding begins, create one new MTLEvent.
      3. For each attached texture in the render pass descriptor, set/overwrite the TextureMTL::readable_event_ to the new one.
      4. At the end of RenderPassMTL encoding, encode a signaling command for the event with encodeSignalEvent.
      5. Prior to encoding the Metal blit pass containing the device->host transfer, encode a wait command with encodeWaitForEvent against the source TextureMTL's event (if one has been set).
    • For Vulkan, we can create a timeline semaphore for every attachable VkImage being tracked which lasts the lifetime of the VkImage, incrementing the attached texture semaphores during RenderPassVK (but I'd recommend holding off on this until the Vulkan backend is further along, of course).

The above approach can be lightly tweaked later on if/when we support parallel command encoding (for example, by making the MTLEvents handle manipulation thread safe). Also, we can get rid of the kWaitUntilScheduled hack later on by extending this approach to track whether read events have finished, so that commands which write to a texture will wait until the reads are done regardless of the encoding order across command buffers.

Sorry about the runaround here! This aspect of our API is a bit unresolved as we haven't thrown together/documented a strategy for handling device resource synchronization comprehensively yet.

FYI @chinmaygarde, I'm thinking now might be the time to clean up my old notes about this and do a write up detailing how we can achieve maximal parallel encoding without exposing any explicit synchronization primitives above the render layer. It's feeling like the right time to address this since it's been popping up in the discussions around Vulkan too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdero It looks like you don't want to change the API, and expect to be able to solve the problem differently on different backends.

I guess we can further discuss the following two scenarios encountered on developing ui.Image.toByteData.

Scenario 1: Render the display list to texture on the Raster thread, and access the texture on the IO thread.

final ui.PictureRecorder recorder = ui.PictureRecorder();
final ui.Canvas canvas = ui.Canvas(recorder);
canvas.drawRect(const Rect.fromLTWH(0.0, 0.0, 100, 100), Paint()..color = Colors.red);
final ui.Picture picture = recorder.endRecording();
ui.Image image = await picture.toImage(100, 100); // render picture to texture on raster thread
ByteData? byteData = await image.toByteData(); // use texture on IO thread

OpenGLES

On the backend of OpenGLES, when the IO thread accesses the texture, we cannot know whether the command that the Raster thread renders the content to the texture is executed or not.

In this PR, I want to do a glFinish on the raster thread to solve this problem (I'm not sure if using glFlush here will also solve the problem). If we don't want developers to specify SyncMode explicitly, we need to introduce a synchronization mechanism, such as glFenceSync. However, these API are only available in GLES 3.0, not in GLES2.0.

Metal

In this scenario, it seems that Metal will not encounter this problem, because the commands corresponding to render to texture and access texture are all in the same CommandQueue, and the order in which they enter the queue in this scenario is determined (if I am wrong , please correct me).

Vulkan

TODO

Scenario 2: Blit operation to copy texture to buffer. We need to ensure that the cpu buffer has content after the blit operation API call ends.

OpenGLES

As you said, the implementation of OpenGLES uses glReadPixels, so there is no need to call glFinish.

Metal

We need to ensure that after the blit operation is completed, the cpu buffer already has data. This PR proposes to use SyncMode to ensure that if the API is not changed, we can also implement the corresponding functions through CompletionCallback, but it is not very convenient for developers to use.

Vulkan

TODO

Summarize

In scenario 1, if the OpenGLES backend does not have SyncMode, it seems that there is no way to ensure that the logic is correct, because OpenGLES 2.0 does not support glFenceSync.

In Scenario 2, Introducing SyncMode can make things sampler, otherwise CompletionCallback and fml::AutoResetWaitableEvent must be used which is a little complicated.

Do you have any thoughts on the above two questions? Thanks!

Copy link
Member

@bdero bdero Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the backend of OpenGLES, when the IO thread accesses the texture, we cannot know whether the command that the Raster thread renders the content to the texture is executed or not.

The render pass and blit pass should both be encoded on the raster thread in serial. Would there be any issues with having picture.toImage and image.toByteData enqueue raster tasks with the same priority?

If we don't want developers to specify SyncMode explicitly, we need to introduce a synchronization mechanism, such as glFenceSync. However, these API are only available in GLES 3.0, not in GLES2.0.

Given the behavior of glReadPixels and the fact that GLES 2 commands should be encoded serially anyway (as the GL context is not guaranteed thread safe), I don't think there's a synchronization issue lurking between the framebuffer write and texture read that needs to be resolved.

The glReadPixels GLES 2 docs indicates that glReadPixels waits for completion of all previous commands that write to framebuffers:

glFinish does not return until the effects of all previously called GL commands are complete.
Such effects include all changes to GL state, all changes to connection state, and all
changes to the frame buffer contents.

Interestingly, the GLES 1.1 docs happen to be more explicit about it, but the description itself is really no different -- it would have been nice if this helpful note had survived to later specs. :)

glFinish is NOT required before a call to eglSwapBuffers or glReadPixels.
glFinish can take some time and for performance reasons it is best to use this function
infrequently and only when necessary.

Metal
We need to ensure that after the blit operation is completed, the cpu buffer already has data. This PR proposes to use SyncMode to ensure that if the API is not changed, we can also implement the corresponding functions through CompletionCallback, but it is not very convenient for developers to use.

While blocking the raster thread will be unavoidable for GL, using a completion callback for device->host transfers would allow us to avoid blocking the raster thread unnecessarily on Metal, Vulkan, and most other backends we're likely to implement over time (for Vulkan, we could invoke the callback from a worker thread after waiting on a fence, for example).

Copy link
Member Author

@ColdPaleLight ColdPaleLight Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the behavior of glReadPixels and the fact that GLES 2 commands should be encoded serially anyway (as the GL context is not guaranteed thread safe), I don't think there's a synchronization issue lurking between the framebuffer write and texture read that needs to be resolved.

I think things are a little bit different here. There are two threads here, and each thread has its own EGLContext (the relationship between the two EGLContexts is a shared context). In this case, there is no way to guarantee that the render of the Raster thread must occur before the IO accesses the texture. And this is not just theoretical, in fact, I encountered this problem in the development of ui.Image.toByteData, it really happened.

That is to say, if the content is rendered on the raster thread, and then glReadPixels is called on the io thread, there is no guarantee that the expected content can be read.

Ahh, I see. You mean "if the render pass and blit pass both be encoded on the raster thread in serial, then there is no more synchronization issue".

Yes, that's right.

The render pass and blit pass should both be encoded on the raster thread in serial. Would there be any issues with having picture.toImage and image.toByteData enqueue raster tasks with the same priority?

That sounds good, I'll give it a try :)

Copy link
Member

@bdero bdero Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two threads here, and each thread has its own EGLContext (the relationship between the two EGLContexts is a shared context).

Ah, I wasn't aware we were making a second context on the IO thread. Do you foresee any downside to just using one context and performing all of the GLES activity on the raster thread given the lack of appropriate synchronization primitives in GLES 2? We could always add a multi-context solution later on that gets used in the presence of GLES 3 procs, but I suspect the overhead savings would be somewhat negligible.

Ahh, I see. You mean "if the render pass and blit pass both be encoded on the raster thread in serial, then there is no more synchronization issue".

Indeed!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you foresee any downside to just using one context and performing all of the GLES activity on the raster thread given the lack of appropriate synchronization primitives in GLES 2?

Since the context of the raster thread is used for rendering, and the context of the io thread is used for texture uploading, I am worried that all gl activity on the raster thread will cause jank.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Sorry, I meant more with respect to the device->host transfer. Performing new resource host->device transfers on the IO thread seems perfectly reasonable and safe.

@ColdPaleLight ColdPaleLight requested a review from bdero November 28, 2022 09:43
@ColdPaleLight
Copy link
Member Author

Closing it since #37709 has been landed, this API is no longer needed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

No open projects
Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants