-
Notifications
You must be signed in to change notification settings - Fork 6k
[Impeller] Wire up hardware buffer backed swapchains on Android. #52087
Conversation
292a1ca to
93beffb
Compare
|
@johnmccutchan @jonahwilliams I am dealing with some minor graphical corruption (only) during image reuse that I'm trying to pin down still. AFAICT, the synchronization is all there but this stuff is extremely poorly documented so its a bit of a crapshoot. Would love to pair on this in a bit if I am not able to make progress on pinning this down. If you run this locally, you should be able to observe some checkerboarding artifacts on some transitions. Will update this this patchset with fixes once I have them, but the rest of the stuff is good for initial review. |
|
Happy to take a look this week, I'm really excitied about this change! |
|
Golden file changes have been found for this pull request. Click here to view and triage (e.g. because this is an intentional change). If you are still iterating on this change and are not ready to resolve the images on the Flutter Gold dashboard, consider marking this PR as a draft pull request above. You will still be able to view image results on the dashboard, commenting will be silenced, and the check will not try to resolve itself until marked ready for review. |
|
Some observations. I think there is insufficient synchronization. It is unclear what else other than the external_fence_fd is required at this time. If I attempt to capture the corruption using the built in video capture features on the device, I see no corruption. I've attached a capture of what it actually looks like to the human eye. Notably, when I run in profile or release modes, I see more corruption which strengthens the theory that I am missing some synchronization. Human Eye IMG_2127.MOVDirect Screen Capture ScreenCapture.mp4 |
jonahwilliams
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still building this. Just a guess on the synchronization. The fence fd ensures that the compositor doesn't read the surface until we're done writing to it. assuming that is done correctly, the error could be on the acquisition. That is, perhaps we start working on the surface before the compositor is done presenting it.
With the normal vulkan swapchain images I think that should be handled by the layout transitions we add at the start of a given render pass. But its possible that the AHB backed textures synchronization doesn't work the same, or has a bug in its implementation.
|
I think I figured it out. The fence signal operation doesn't have any command buffers specified along with it. In the old swapchains, the command buffer has a pipeline barrier waiting for all writes to the texture being completed in the color attachment output stage. There is nothing in this implementation. Adding it now. I need to figure out what the new layout is supposed to be since it can't be present src KHR and we have checks internally for redundant layout modifications. Momentido. |
|
That wasn't it. Though I admit I now don't know if the fence signal operation is actually waiting for the useful stuff.
I only recycle the texture after the completion callback has been invoked. Perhaps thats not enough? Curiously, if I set max_entries in the texture pool to zero, I don't see the artifacts. So I think you are on the right track. Also, the artifacts is checkerboarding. Not the usual display of old contents or corrupted textures. So perhaps I am messing up the load ops? |
|
Ooh, another observation; the last frame of an animation is never corrupt. So the hunch that I am recycling too soon makes the most sense right now. |
|
Unrelated, but if you navigate to a screen with a backdrop filter you'll get a validation error about missing transfer dst: |
|
I think if the presentation transitions the image to ePresentSrcKHR, then in theory the barrier we insert in the render_pass from ePresentSrcKHR to eGeneral should handle synchronization. |
This is the docstring for OnComplete. I am definitely recycling to soon. My understanding was that the image was copied out and could be recycled.
ePresentSrcKHR is part of VK_KHR_Swapchain. I am going to use the general tag for the acquire fence. On the way back, we can switch from general back to either attachment output or sampled. |
|
Whether or not its part of the swapchain implementation, without a barrier in the render pass we're potentially consuming the swapchain image too soon. maybe we should be transitioning from undefined or something? Not sure, but I suspect because the image attachment never leaves eGeneral we have no sync on the consuming side. |
|
Ahh yeah, if that gives you a done signal we can use that instead of the barrier |
|
The flickering is fixed. Taking a look at the barrier (on the swapchain side) and validation error you observed next. |
|
Golden file changes are available for triage from new commit, Click here to view. |
|
The validation errors are fixed and the input attachment case is handled. The barrier next. |
|
if I continually pump frames on a Pixel 7 pro, I can get up to 3 entries every so often which will immediately pop the last one. Which we then immediately recreate. We could probably set max images to 3 for now, but maybe we need to be more sophisticated when picking that number |
|
Thats entirely possible. I also set up it up in a way where the push/pop operations happen at the back of deque so that the buffers in the front have the best chance to age out. I don't have a good intuition for picking the buffer counts. But we could bump up the pool size and disable buffer aging. Though some additional allocations are definitely going to be seen in these swapchains. |
|
From that graph, we can also move the surface transaction apply call off to a background thread no problem. |
|
I wouldn't do that immediately at least. There is some odd interaction between the swapchain and hybrid composition rendering on Android that I had to work around in the past: #44881 |
|
Yeah, just going to get these working. Can tune and rethread after. Backing out of the wonders app using the back button leads to device loss. Figuring out why. |
Not via Vulkan device properties since those reflect the properties of the device after passthrough. The internet seems to have some guidance but its imperfect. I was hoping that Vulkan would report that device was emulated via |
|
See https://developer.android.com/ndk/guides/graphics/design-notes#hardware-acceleration
Maybe VK_PHYSICAL_DEVICE_TYPE_CPU + swiftshader + goog manufacturer id = emulator? |
| "control."; | ||
| return false; | ||
| } | ||
| return transaction.Apply([signaler, texture, weak = weak_from_this()]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what thread this runs on, but for hybrid composition today there are some assumptions about flutters presentation, we may need to conditionally block on the presenting thread until we reach some point in presentation, either here or in the worse case later on when it says we're done with the texture.
See #44881 for how I did this originally. Though this code was backed out later so the android code would need to be redone
|
If you want to, land this with the emulator workaround (so the tests don't break) and we can handle the hybrid composition bug as a follow up. Or feel free to land it off by default with an ifdef if you'd like to handle that separately. |
c222d0a to
786729a
Compare
| return desc; | ||
| } | ||
|
|
||
| void* HardwareBuffer::Lock(CPUAccessType type) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using Lock/unlock or was this only for testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just for testing. But I'm inclined to leave it in there since it is handy when you need it and get LTOd away when you don't. But your call on if this should be backed out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its fine to leave them
jonahwilliams
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can't run on the engine emulators it would be nice if you could add some sanity checking to the android unit tests (setup /teardown swapchain at least), so that if we completely break it we find out before it rolls into the framework.
Which target is this? |
|
The ones you added a month or so back? |
This wires up Android Hardware Buffer backed swapchains on platform that support it (Android >= 29) with a KHR swapchain fallback (which can be removed later to save on some binary size if needed). Some salient features of these swapchains and their differences with the KHR variant: * All swapchain images are guaranteed to R8G8B8A8. This could potentially allow for earlier creation of the context and the PSO libraries. * All swapchain allocations are lazy. This should greatly reduce the time it takes to create and resize a swapchain. However, swapchain image acquisition may take longer for the first few frame where there is no pre-pooled image available. Resizes should be similarly faster since the swapchain images for the intermediate sizes are never created. * Swapchain transients allocations (the attachments for the root render target) are also lazily allocated. * Swapchain images are pool and reused. The size of the pool is user specified (currently 2). If an image in the pool ages past a user supplied value (currently 1 second), it is collected as well. Applications that don't render frames for a long period of time should see less memory use because of swapchain image allocations. * The present mode of AHB swapchains behave similar to KHR swapchains but with VK_PRESENT_MODE_MAILBOX_KHR. In cases where there is no application managed frame pipelining, this might cause images to never be presented if a newer image is available. This wasted work can only be avoided by application provided pipelining. * There are no client side waits during image presentation. Instead, a new type of fence is wired up that exports its state as a sync file descriptor. The fence signal operation is enqueued on the client side and the buffer is set on the surface control. The presentation engine then performs the wait. * On Qualcomm devices, Chromium seems to be setting vendor specified flags for opting the hardware buffers into using UBWC. AFAICT, this is similar to AFBC (and NOT AFRC) on ARM Mali. This has not been wired up since I don't have a Qualcomm device at the moment and cant verify bandwidth use using GPU counters. I would also like to verify that UBWC is safe to use to images that can be used as input attachments.
b778b0f to
49643c4
Compare
|
I haven't wired it up yet to run on actual devices. |
jonahwilliams
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Devicelab will be our integration test then.
LGTM
|
Having some trouble with the CI tests failing on unrelated things. |
|
Gogogogo |
…147670) flutter/engine@5129b49...842cf25 2024-05-01 [email protected] [Impeller] Wire up hardware buffer backed swapchains on Android. (flutter/engine#52087) 2024-05-01 [email protected] Roll Skia from bfbd738fb97b to 0b31f64bd8d8 (3 revisions) (flutter/engine#52492) 2024-05-01 [email protected] Roll Dart SDK from d946408c559a to f5a429107c87 (2 revisions) (flutter/engine#52487) If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll Please CC [email protected],[email protected],[email protected] on the revert to ensure that a human is aware of the problem. To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose To report a problem with the AutoRoller itself, please file a bug: https://issues.skia.org/issues/new?component=1389291&template=1850622 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
|
Heres the drop, 10% reduction in |

This wires up Android Hardware Buffer backed swapchains on platform that support
it (Android >= 29) with a KHR swapchain fallback (which can be removed later to
save on some binary size if needed).
Some salient features of these swapchains and their differences with the KHR
variant:
for earlier creation of the context and the PSO libraries.
takes to create and resize a swapchain. However, swapchain image acquisition
may take longer for the first few frame where there is no pre-pooled image
available. Resizes should be similarly faster since the swapchain images for
the intermediate sizes are never created.
are also lazily allocated.
(currently 2). If an image in the pool ages past a user supplied value
(currently 1 second), it is collected as well. Applications that don't render
frames for a long period of time should see less memory use because of
swapchain image allocations.
VK_PRESENT_MODE_MAILBOX_KHR. In cases where there is no application managed
frame pipelining, this might cause images to never be presented if a newer
image is available. This wasted work can only be avoided by application
provided pipelining.
of fence is wired up that exports its state as a sync file descriptor. The
fence signal operation is enqueued on the client side and the buffer is set on
the surface control. The presentation engine then performs the wait.
opting the hardware buffers into using UBWC. AFAICT, this is similar to AFBC
(and NOT AFRC) on ARM Mali. This has not been wired up since I don't have a
Qualcomm device at the moment and cant verify bandwidth use using GPU
counters. I would also like to verify that UBWC is safe to use to images that
can be used as input attachments.