-
Notifications
You must be signed in to change notification settings - Fork 6k
[Impeller] Support 'SyncMode' in 'CommandBuffer::SubmitCommands' #37772
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems better to me than #37704 but I'd like @chinmaygarde and @bdero's thoughts on how this might get misused or any other concerns.
|
friendly ping @chinmaygarde @bdero |
| gl.Flush(); | ||
| break; | ||
| case CommandBuffer::SyncMode::kWaitUntilCompleted: | ||
| gl.Finish(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So looking at #37709, a couple of ideas come to mind:
- For GLES: Since the blit pass uses
glReadPixels, and GL commands are enqueued serially, is it actually necessary to callglFinishbefore downloading the texture? My understanding is thatglReadPixelswill synchronously block until any previous commands which attach the texture have completed on the GPU. In any case, doing theglFlushdoesn't hurt and is probably a good idea. - This is a situation where it seems like the underlying rendering backend should own the choice about how to handle synchronization; we have an opportunity to avoid burdening users of the render layer API with an unnecessary choice and subtle bugs that are difficult to track down when used incorrectly:
- For Metal (assuming
waitUntilScheduled+ hazard tracking isn't working as a sufficient stopgap for this), synchronization can be automatically managed by the renderer layer using native synchronization primitives:- Have every
TextureMTLtrack an optionalMTLEvent, sayTextureMTL::readable_event_. - When a Metal
RenderPassMTLencoding begins, create one newMTLEvent. - For each attached texture in the render pass descriptor, set/overwrite the
TextureMTL::readable_event_to the new one. - At the end of
RenderPassMTLencoding, encode a signaling command for the event withencodeSignalEvent. - Prior to encoding the Metal blit pass containing the device->host transfer, encode a wait command with
encodeWaitForEventagainst the sourceTextureMTL's event (if one has been set).
- Have every
- For Vulkan, we can create a timeline semaphore for every attachable
VkImagebeing tracked which lasts the lifetime of theVkImage, incrementing the attached texture semaphores duringRenderPassVK(but I'd recommend holding off on this until the Vulkan backend is further along, of course).
- For Metal (assuming
The above approach can be lightly tweaked later on if/when we support parallel command encoding (for example, by making the MTLEvents handle manipulation thread safe). Also, we can get rid of the kWaitUntilScheduled hack later on by extending this approach to track whether read events have finished, so that commands which write to a texture will wait until the reads are done regardless of the encoding order across command buffers.
Sorry about the runaround here! This aspect of our API is a bit unresolved as we haven't thrown together/documented a strategy for handling device resource synchronization comprehensively yet.
FYI @chinmaygarde, I'm thinking now might be the time to clean up my old notes about this and do a write up detailing how we can achieve maximal parallel encoding without exposing any explicit synchronization primitives above the render layer. It's feeling like the right time to address this since it's been popping up in the discussions around Vulkan too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bdero It looks like you don't want to change the API, and expect to be able to solve the problem differently on different backends.
I guess we can further discuss the following two scenarios encountered on developing ui.Image.toByteData.
Scenario 1: Render the display list to texture on the Raster thread, and access the texture on the IO thread.
final ui.PictureRecorder recorder = ui.PictureRecorder();
final ui.Canvas canvas = ui.Canvas(recorder);
canvas.drawRect(const Rect.fromLTWH(0.0, 0.0, 100, 100), Paint()..color = Colors.red);
final ui.Picture picture = recorder.endRecording();
ui.Image image = await picture.toImage(100, 100); // render picture to texture on raster thread
ByteData? byteData = await image.toByteData(); // use texture on IO thread
OpenGLES
On the backend of OpenGLES, when the IO thread accesses the texture, we cannot know whether the command that the Raster thread renders the content to the texture is executed or not.
In this PR, I want to do a glFinish on the raster thread to solve this problem (I'm not sure if using glFlush here will also solve the problem). If we don't want developers to specify SyncMode explicitly, we need to introduce a synchronization mechanism, such as glFenceSync. However, these API are only available in GLES 3.0, not in GLES2.0.
Metal
In this scenario, it seems that Metal will not encounter this problem, because the commands corresponding to render to texture and access texture are all in the same CommandQueue, and the order in which they enter the queue in this scenario is determined (if I am wrong , please correct me).
Vulkan
TODO
Scenario 2: Blit operation to copy texture to buffer. We need to ensure that the cpu buffer has content after the blit operation API call ends.
OpenGLES
As you said, the implementation of OpenGLES uses glReadPixels, so there is no need to call glFinish.
Metal
We need to ensure that after the blit operation is completed, the cpu buffer already has data. This PR proposes to use SyncMode to ensure that if the API is not changed, we can also implement the corresponding functions through CompletionCallback, but it is not very convenient for developers to use.
Vulkan
TODO
Summarize
In scenario 1, if the OpenGLES backend does not have SyncMode, it seems that there is no way to ensure that the logic is correct, because OpenGLES 2.0 does not support glFenceSync.
In Scenario 2, Introducing SyncMode can make things sampler, otherwise CompletionCallback and fml::AutoResetWaitableEvent must be used which is a little complicated.
Do you have any thoughts on the above two questions? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the backend of OpenGLES, when the IO thread accesses the texture, we cannot know whether the command that the Raster thread renders the content to the texture is executed or not.
The render pass and blit pass should both be encoded on the raster thread in serial. Would there be any issues with having picture.toImage and image.toByteData enqueue raster tasks with the same priority?
If we don't want developers to specify SyncMode explicitly, we need to introduce a synchronization mechanism, such as glFenceSync. However, these API are only available in GLES 3.0, not in GLES2.0.
Given the behavior of glReadPixels and the fact that GLES 2 commands should be encoded serially anyway (as the GL context is not guaranteed thread safe), I don't think there's a synchronization issue lurking between the framebuffer write and texture read that needs to be resolved.
The glReadPixels GLES 2 docs indicates that glReadPixels waits for completion of all previous commands that write to framebuffers:
glFinish does not return until the effects of all previously called GL commands are complete.
Such effects include all changes to GL state, all changes to connection state, and all
changes to the frame buffer contents.
Interestingly, the GLES 1.1 docs happen to be more explicit about it, but the description itself is really no different -- it would have been nice if this helpful note had survived to later specs. :)
glFinish is NOT required before a call to eglSwapBuffers or glReadPixels.
glFinish can take some time and for performance reasons it is best to use this function
infrequently and only when necessary.
Metal
We need to ensure that after the blit operation is completed, the cpu buffer already has data. This PR proposes to use SyncMode to ensure that if the API is not changed, we can also implement the corresponding functions through CompletionCallback, but it is not very convenient for developers to use.
While blocking the raster thread will be unavoidable for GL, using a completion callback for device->host transfers would allow us to avoid blocking the raster thread unnecessarily on Metal, Vulkan, and most other backends we're likely to implement over time (for Vulkan, we could invoke the callback from a worker thread after waiting on a fence, for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the behavior of glReadPixels and the fact that GLES 2 commands should be encoded serially anyway (as the GL context is not guaranteed thread safe), I don't think there's a synchronization issue lurking between the framebuffer write and texture read that needs to be resolved.
I think things are a little bit different here. There are two threads here, and each thread has its own EGLContext (the relationship between the two EGLContexts is a shared context). In this case, there is no way to guarantee that the render of the Raster thread must occur before the IO accesses the texture. And this is not just theoretical, in fact, I encountered this problem in the development of ui.Image.toByteData, it really happened.
That is to say, if the content is rendered on the raster thread, and then glReadPixels is called on the io thread, there is no guarantee that the expected content can be read.
Ahh, I see. You mean "if the render pass and blit pass both be encoded on the raster thread in serial, then there is no more synchronization issue".
Yes, that's right.
The render pass and blit pass should both be encoded on the raster thread in serial. Would there be any issues with having picture.toImage and image.toByteData enqueue raster tasks with the same priority?
That sounds good, I'll give it a try :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two threads here, and each thread has its own EGLContext (the relationship between the two EGLContexts is a shared context).
Ah, I wasn't aware we were making a second context on the IO thread. Do you foresee any downside to just using one context and performing all of the GLES activity on the raster thread given the lack of appropriate synchronization primitives in GLES 2? We could always add a multi-context solution later on that gets used in the presence of GLES 3 procs, but I suspect the overhead savings would be somewhat negligible.
Ahh, I see. You mean "if the render pass and blit pass both be encoded on the raster thread in serial, then there is no more synchronization issue".
Indeed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you foresee any downside to just using one context and performing all of the GLES activity on the raster thread given the lack of appropriate synchronization primitives in GLES 2?
Since the context of the raster thread is used for rendering, and the context of the io thread is used for texture uploading, I am worried that all gl activity on the raster thread will cause jank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Sorry, I meant more with respect to the device->host transfer. Performing new resource host->device transfers on the IO thread seems perfectly reasonable and safe.
|
Closing it since #37709 has been landed, this API is no longer needed. |
Added
SyncModeto specify how to synchronize whenCommandBuffer:::SubmitCommands.Note that the reason for adding this parameter instead of using
CompleteCallbackis following:CompleteCallbackof OpenGLES does not ensure that theCommandBufferhas been completed.Pre-launch Checklist
///).