-
Notifications
You must be signed in to change notification settings - Fork 6k
[Impeller] introduces DeviceHolder to avoid accessing a dead Device #41748
[Impeller] introduces DeviceHolder to avoid accessing a dead Device #41748
Conversation
| class DeviceHolder { | ||
| public: | ||
| virtual ~DeviceHolder() = default; | ||
| virtual const vk::Device* GetDevice() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeviceHolder::GetDevice() should return a vk::Device (instead of a vk::Device*). And code that uses a vk::Device should pass instances (or const references) rather than pointers.
Passing handle wrappers like vk::Device by value is consistent with how the vulkan-hpp API itself uses these objects.
IIUC the intent of this change was to make the C++ type reflect the ownership semantics. But developers of this subsystem are going to need to understand the behavior of the Khronos Vulkan C++ wrappers and that a vk::Device instance is equivalent to a Vulkan device handle.
The const vk::Device* inaccurately implies that the vk::Device is nullable, and creates the potential for someone to be holding onto a vk::Device* pointing to a C++ object that has been deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I went with this approach is that the whole reason we got into this mess is that people were copying the vk::Device and not realize what they actually had is a raw pointer that may be invalid. By passing around a std::weak_ptr<DeviceHolder> which returns a pointer to the device, my hope is that this would make that error much less likely.
I can make DeviceHolder return a const vk::Device& but that is going to make it so much easier for someone to copy the vk::Device without realizing the semantics of it since vk::Device has a copy constructor.
Alternatively, I might be able to do virtual const vk::UniqueDevice& GetDevice() const = 0. Which would give you the guarantees you want and make accidentally messing things up less likely.
Here's where I want to be: no one has an lvalue to a vk::Device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If other modules in this subsystem are interacting directly with Vulkan devices, then they will need to understand how Vulkan devices work and how their lifetime is managed. For now I think the most idiomatic option is having DeviceHolder::GetDevice return a const vk::Device& and establishing a convention that anyone who may need the Vulkan device holds a weak_ptr<DeviceHolder> and should never hold a raw vk::Device instance.
The type system will not be able to prevent all forms of misuse as long as the other modules require direct access to the vk::Device. If the convention above is insufficient, then we should hide all usage of the vk::Device behind some other interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a loaded gun allowing vk::Device&. I'd be down with making our own wrapper in the future. For now I put in the references as requested.
| explicit PipelineCacheVK(std::shared_ptr<const Capabilities> caps, | ||
| vk::Device device, | ||
| std::weak_ptr<DeviceHolder> device_holder, | ||
| const vk::Device* device, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the device - it's redundant with the device_holder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not redundant unfortunately. The way we initialize the ContextVK the DeviceHolder is not valid at the time we call PipelineCacheVK(). It's because in the ContextVK constructor all data is held on the stack and only if everything is successful are the ivar's written to and is_valid_ is set to true.
Our options were:
- Send in the Device and avoid calling into the DeviceHolder
- Stop using the Device in the PipelineCacheVK constructor
- Write the Device to the ivar of ContextVK but make a RAII wrapper that unsets it if we don't get to
is_valid_ = true.
I went for the first option. I can add a comment to make it more clear. The second option is probably a good design but is a risky change (I'd rather do that as a followup if we wanted to). The 3rd option seemed to have just as much baggage as "don't use the DeviceHolder if you have the Device".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment SGTM. Later we can look into ways to make construction of PipelineCacheVK cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added.
| } | ||
|
|
||
| void CommandPoolVK::ClearAllPools(const ContextVK* context) { | ||
| if (tls_command_pool.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this - it's unrelated to the rest of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, i'll file a new PR. It was related in that the question was "will we be able to solve the leak once we go down this route, since we know we can with the alternative route". So the answer is yes.
This reverts commit 48eacc4.
|
@jason-simmons PTAL |
…126309) flutter/engine@8d3a816...824cd09 2023-05-09 [email protected] Increase timeout of orchestrator. (flutter/engine#41839) 2023-05-09 [email protected] [fuchsia] Stop calling FIDL from Dart in Flutter integration tests (flutter/engine#41669) 2023-05-09 [email protected] [tests] Remove unused fuchsia.sys protocol reference (flutter/engine#41826) 2023-05-09 [email protected] Roll Skia from 7736fbaf84f0 to 485cd3d0f9ca (6 revisions) (flutter/engine#41840) 2023-05-09 [email protected] [Impeller] introduces DeviceHolder to avoid accessing a dead Device (flutter/engine#41748) 2023-05-08 [email protected] Get rid of "outrageous" default text styles for HTML renderer. (flutter/engine#41822) 2023-05-08 [email protected] Adjust DL filter bounds tests to not rely on exact Skia results (flutter/engine#41792) 2023-05-08 [email protected] Roll Dart SDK from a8b6687327d6 to 498cfa57165b (1 revision) (flutter/engine#41823) 2023-05-08 [email protected] [Android] Fix incorrect viewInsets during keyboard animation with EdgeToEdge (flutter/engine#39391) 2023-05-08 [email protected] Roll Skia from 951eb9653163 to 7736fbaf84f0 (1 revision) (flutter/engine#41821) If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll Please CC [email protected],[email protected],[email protected] on the revert to ensure that a human is aware of the problem. To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose To report a problem with the AutoRoller itself, please file a bug: https://bugs.chromium.org/p/skia/issues/entry?template=Autoroller+Bug Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
An alternative to #41527
This introduces a new protocol called
DeviceHolderwhose only purpose is to return a pointer to a device. That way people can hold onto weak_ptr's to theDeviceHolderand know to abort further work once the Device is deleted (like loading pipelines on a background thread). This PR also changes the return type of ContextVK::GetDevice from returning a reference to a pointer as a hope to better communicate what is happening and discourage copying of the vk::Device.I would actually recommend abolishing holding onto instance variables of the vk::Device since it isn't clear that that is just a raw pointer. That's sort of how we got into this problem.
fixes flutter/flutter#125522
fixes flutter/flutter#125519
fixes flutter/flutter#125571Testing: Fixes flake in existing tests.
Pre-launch Checklist
///).If you need help, consider asking for advice on the #hackers-new channel on Discord.