Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Conversation

@chinmaygarde
Copy link
Member

@chinmaygarde chinmaygarde commented Apr 12, 2024

This wires up Android Hardware Buffer backed swapchains on platform that support
it (Android >= 29) with a KHR swapchain fallback (which can be removed later to
save on some binary size if needed).

Some salient features of these swapchains and their differences with the KHR
variant:

  • All swapchain images are guaranteed to R8G8B8A8. This could potentially allow
    for earlier creation of the context and the PSO libraries.
  • All swapchain allocations are lazy. This should greatly reduce the time it
    takes to create and resize a swapchain. However, swapchain image acquisition
    may take longer for the first few frame where there is no pre-pooled image
    available. Resizes should be similarly faster since the swapchain images for
    the intermediate sizes are never created.
  • Swapchain transients allocations (the attachments for the root render target)
    are also lazily allocated.
  • Swapchain images are pool and reused. The size of the pool is user specified
    (currently 2). If an image in the pool ages past a user supplied value
    (currently 1 second), it is collected as well. Applications that don't render
    frames for a long period of time should see less memory use because of
    swapchain image allocations.
  • The present mode of AHB swapchains behave similar to KHR swapchains but with
    VK_PRESENT_MODE_MAILBOX_KHR. In cases where there is no application managed
    frame pipelining, this might cause images to never be presented if a newer
    image is available. This wasted work can only be avoided by application
    provided pipelining.
  • There are no client side waits during image presentation. Instead, a new type
    of fence is wired up that exports its state as a sync file descriptor. The
    fence signal operation is enqueued on the client side and the buffer is set on
    the surface control. The presentation engine then performs the wait.
  • On Qualcomm devices, Chromium seems to be setting vendor specified flags for
    opting the hardware buffers into using UBWC. AFAICT, this is similar to AFBC
    (and NOT AFRC) on ARM Mali. This has not been wired up since I don't have a
    Qualcomm device at the moment and cant verify bandwidth use using GPU
    counters. I would also like to verify that UBWC is safe to use to images that
    can be used as input attachments.

@chinmaygarde chinmaygarde added the Work in progress (WIP) Not ready (yet) for review! label Apr 12, 2024
@chinmaygarde chinmaygarde force-pushed the ahb_swapchin_neo3 branch 2 times, most recently from 292a1ca to 93beffb Compare April 15, 2024 19:29
@chinmaygarde
Copy link
Member Author

@johnmccutchan @jonahwilliams I am dealing with some minor graphical corruption (only) during image reuse that I'm trying to pin down still. AFAICT, the synchronization is all there but this stuff is extremely poorly documented so its a bit of a crapshoot. Would love to pair on this in a bit if I am not able to make progress on pinning this down.

If you run this locally, you should be able to observe some checkerboarding artifacts on some transitions. Will update this this patchset with fixes once I have them, but the rest of the stuff is good for initial review.

@jonahwilliams
Copy link
Contributor

Happy to take a look this week, I'm really excitied about this change!

@flutter-dashboard
Copy link

Golden file changes have been found for this pull request. Click here to view and triage (e.g. because this is an intentional change).

If you are still iterating on this change and are not ready to resolve the images on the Flutter Gold dashboard, consider marking this PR as a draft pull request above. You will still be able to view image results on the dashboard, commenting will be silenced, and the check will not try to resolve itself until marked ready for review.

Changes reported for pull request #52087 at sha 93beffb

@chinmaygarde
Copy link
Member Author

Some observations. I think there is insufficient synchronization. It is unclear what else other than the external_fence_fd is required at this time.

If I attempt to capture the corruption using the built in video capture features on the device, I see no corruption. I've attached a capture of what it actually looks like to the human eye. Notably, when I run in profile or release modes, I see more corruption which strengthens the theory that I am missing some synchronization.

Human Eye

IMG_2127.MOV

Direct Screen Capture

ScreenCapture.mp4

Copy link
Contributor

@jonahwilliams jonahwilliams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still building this. Just a guess on the synchronization. The fence fd ensures that the compositor doesn't read the surface until we're done writing to it. assuming that is done correctly, the error could be on the acquisition. That is, perhaps we start working on the surface before the compositor is done presenting it.

With the normal vulkan swapchain images I think that should be handled by the layout transitions we add at the start of a given render pass. But its possible that the AHB backed textures synchronization doesn't work the same, or has a bug in its implementation.

@chinmaygarde
Copy link
Member Author

I think I figured it out. The fence signal operation doesn't have any command buffers specified along with it. In the old swapchains, the command buffer has a pipeline barrier waiting for all writes to the texture being completed in the color attachment output stage. There is nothing in this implementation. Adding it now. I need to figure out what the new layout is supposed to be since it can't be present src KHR and we have checks internally for redundant layout modifications.

Momentido.

@chinmaygarde
Copy link
Member Author

That wasn't it. Though I admit I now don't know if the fence signal operation is actually waiting for the useful stuff.

That is, perhaps we start working on the surface before the compositor is done presenting it.

I only recycle the texture after the completion callback has been invoked. Perhaps thats not enough?

Curiously, if I set max_entries in the texture pool to zero, I don't see the artifacts. So I think you are on the right track.

Also, the artifacts is checkerboarding. Not the usual display of old contents or corrupted textures. So perhaps I am messing up the load ops?

@chinmaygarde
Copy link
Member Author

Ooh, another observation; the last frame of an animation is never corrupt. So the hunch that I am recycling too soon makes the most sense right now.

@jonahwilliams
Copy link
Contributor

Unrelated, but if you navigate to a screen with a backdrop filter you'll get a validation error about missing transfer dst:

E/flutter (10103): |         Related Objects: Image [21099628137040630] [ImpellerOnscreenResolve]
E/flutter (10103): |                 Trigger: Validation Error: [ VUID-VkImageMemoryBarrier-oldLayout-01213 ] Object 0: handle = 0x4af60000004af6, name = ImpellerOnscreenResolve, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x4fc60592 | vkCmdPipelineBarrier(): pImageMemoryBarriers[0].oldLayout (VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) is not compatible with VkImage 0x4af60000004af6[ImpellerOnscreenResolve] usage flags VK_IMAGE_USAGE_SAMPLED_BIT|VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT. The Vulkan spec states: If srcQueueFamilyIndex and dstQueueFamilyIndex define a queue family ownership transfer or oldLayout and newLayout define an image layout transition, and oldLayout or newLayout is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL then image must have been created with VK_IMAGE_USAGE_TRANSFER_DST_BIT (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkImageMemoryBarrier-oldLayout-01213)

@jonahwilliams
Copy link
Contributor

I think if the presentation transitions the image to ePresentSrcKHR, then in theory the barrier we insert in the render_pass from ePresentSrcKHR to eGeneral should handle synchronization.

@chinmaygarde
Copy link
Member Author

Buffers which are replaced or removed from the scene in the transaction invoking this callback may be reused after this point.

This is the docstring for OnComplete. I am definitely recycling to soon. My understanding was that the image was copied out and could be recycled.

if the presentation transitions the image to ePresentSrcKHR

ePresentSrcKHR is part of VK_KHR_Swapchain. I am going to use the general tag for the acquire fence. On the way back, we can switch from general back to either attachment output or sampled.

@jonahwilliams
Copy link
Contributor

Whether or not its part of the swapchain implementation, without a barrier in the render pass we're potentially consuming the swapchain image too soon. maybe we should be transitioning from undefined or something? Not sure, but I suspect because the image attachment never leaves eGeneral we have no sync on the consuming side.

@jonahwilliams
Copy link
Contributor

Ahh yeah, if that gives you a done signal we can use that instead of the barrier

@chinmaygarde
Copy link
Member Author

The flickering is fixed. Taking a look at the barrier (on the swapchain side) and validation error you observed next.

@flutter-dashboard
Copy link

Golden file changes are available for triage from new commit, Click here to view.

Changes reported for pull request #52087 at sha 17eddd2

@chinmaygarde
Copy link
Member Author

The validation errors are fixed and the input attachment case is handled. The barrier next.

@jonahwilliams
Copy link
Contributor

Something I'm noticing when running this locally is that the GC is running way more often then you'd think, we're basically continually dropping + adding AHBs. I think the max surfaces amount is too low.

image

@jonahwilliams
Copy link
Contributor

if I continually pump frames on a Pixel 7 pro, I can get up to 3 entries every so often which will immediately pop the last one. Which we then immediately recreate. We could probably set max images to 3 for now, but maybe we need to be more sophisticated when picking that number

@chinmaygarde
Copy link
Member Author

Thats entirely possible. I also set up it up in a way where the push/pop operations happen at the back of deque so that the buffers in the front have the best chance to age out. I don't have a good intuition for picking the buffer counts. But we could bump up the pool size and disable buffer aging. Though some additional allocations are definitely going to be seen in these swapchains.

@chinmaygarde
Copy link
Member Author

From that graph, we can also move the surface transaction apply call off to a background thread no problem.

@jonahwilliams
Copy link
Contributor

I wouldn't do that immediately at least. There is some odd interaction between the swapchain and hybrid composition rendering on Android that I had to work around in the past: #44881

@chinmaygarde
Copy link
Member Author

Yeah, just going to get these working. Can tune and rethread after. Backing out of the wonders app using the back button leads to device loss. Figuring out why.

@chinmaygarde
Copy link
Member Author

Can we detect that we're running on swiftshader via capabilities/properties check and choose the other swapchain?

Not via Vulkan device properties since those reflect the properties of the device after passthrough. The internet seems to have some guidance but its imperfect. I was hoping that Vulkan would report that device was emulated via VK_PHYSICAL_DEVICE_TYPE_VIRTUAL_GPU but that doesn't seem to be the case for emulators either.

@jonahwilliams
Copy link
Contributor

See https://developer.android.com/ndk/guides/graphics/design-notes#hardware-acceleration

Most devices support Vulkan 1.1 via hardware acceleration while a small subset support it via software emulation. Apps can detect a software-based Vulkan device using vkGetPhysicalDeviceProperties and checking the deviceType field of the returned structure. SwiftShader and other CPU-based implementations have the value VK_PHYSICAL_DEVICE_TYPE_CPU. Apps can check specifically for SwiftShader by checking the vendorID and deviceID fields of this same structure for SwiftShader-specific values.

Maybe VK_PHYSICAL_DEVICE_TYPE_CPU + swiftshader + goog manufacturer id = emulator?

"control.";
return false;
}
return transaction.Apply([signaler, texture, weak = weak_from_this()]() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what thread this runs on, but for hybrid composition today there are some assumptions about flutters presentation, we may need to conditionally block on the presenting thread until we reach some point in presentation, either here or in the worse case later on when it says we're done with the texture.

See #44881 for how I did this originally. Though this code was backed out later so the android code would need to be redone

@jonahwilliams
Copy link
Contributor

If you want to, land this with the emulator workaround (so the tests don't break) and we can handle the hybrid composition bug as a follow up. Or feel free to land it off by default with an ifdef if you'd like to handle that separately.

@chinmaygarde chinmaygarde removed the Work in progress (WIP) Not ready (yet) for review! label Apr 29, 2024
return desc;
}

void* HardwareBuffer::Lock(CPUAccessType type) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using Lock/unlock or was this only for testing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just for testing. But I'm inclined to leave it in there since it is handy when you need it and get LTOd away when you don't. But your call on if this should be backed out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its fine to leave them

Copy link
Contributor

@jonahwilliams jonahwilliams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we can't run on the engine emulators it would be nice if you could add some sanity checking to the android unit tests (setup /teardown swapchain at least), so that if we completely break it we find out before it rolls into the framework.

@chinmaygarde
Copy link
Member Author

android unit tests

Which target is this?

@jonahwilliams
Copy link
Contributor

The ones you added a month or so back?

@jonahwilliams
Copy link
Contributor

#51479

This wires up Android Hardware Buffer backed swapchains on platform that support
it (Android >= 29) with a KHR swapchain fallback (which can be removed later to
save on some binary size if needed).

Some salient features of these swapchains and their differences with the KHR
variant:

* All swapchain images are guaranteed to R8G8B8A8. This could potentially allow
  for earlier creation of the context and the PSO libraries.
* All swapchain allocations are lazy. This should greatly reduce the time it
  takes to create and resize a swapchain. However, swapchain image acquisition
  may take longer for the first few frame where there is no pre-pooled image
  available. Resizes should be similarly faster since the swapchain images for
  the intermediate sizes are never created.
* Swapchain transients allocations (the attachments for the root render target)
  are also lazily allocated.
* Swapchain images are pool and reused. The size of the pool is user specified
  (currently 2). If an image in the pool ages past a user supplied value
  (currently 1 second), it is collected as well. Applications that don't render
  frames for a long period of time should see less memory use because of
  swapchain image allocations.
* The present mode of AHB swapchains behave similar to KHR swapchains but with
  VK_PRESENT_MODE_MAILBOX_KHR. In cases where there is no application managed
  frame pipelining, this might cause images to never be presented if a newer
  image is available. This wasted work can only be avoided by application
  provided pipelining.
* There are no client side waits during image presentation. Instead, a new type
  of fence is wired up that exports its state as a sync file descriptor. The
  fence signal operation is enqueued on the client side and the buffer is set on
  the surface control. The presentation engine then performs the wait.
* On Qualcomm devices, Chromium seems to be setting vendor specified flags for
  opting the hardware buffers into using UBWC. AFAICT, this is similar to AFBC
  (and NOT AFRC) on ARM Mali. This has not been wired up since I don't have a
  Qualcomm device at the moment and cant verify bandwidth use using GPU
  counters. I would also like to verify that UBWC is safe to use to images that
  can be used as input attachments.
@chinmaygarde
Copy link
Member Author

I haven't wired it up yet to run on actual devices.

Copy link
Contributor

@jonahwilliams jonahwilliams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devicelab will be our integration test then.

LGTM

@chinmaygarde
Copy link
Member Author

Having some trouble with the CI tests failing on unrelated things.

@jonahwilliams
Copy link
Contributor

Gogogogo

@chinmaygarde chinmaygarde added the autosubmit Merge PR when tree becomes green via auto submit App label May 1, 2024
@auto-submit auto-submit bot merged commit 842cf25 into flutter:main May 1, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request May 1, 2024
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request May 1, 2024
…147670)

flutter/engine@5129b49...842cf25

2024-05-01 [email protected] [Impeller] Wire up hardware buffer backed swapchains on Android. (flutter/engine#52087)
2024-05-01 [email protected] Roll Skia from bfbd738fb97b to 0b31f64bd8d8 (3 revisions) (flutter/engine#52492)
2024-05-01 [email protected] Roll Dart SDK from d946408c559a to f5a429107c87 (2 revisions) (flutter/engine#52487)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC [email protected],[email protected],[email protected] on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

autosubmit Merge PR when tree becomes green via auto submit App e: impeller platform-android will affect goldens

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants