Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Conversation

@gaaclarke
Copy link
Member

@gaaclarke gaaclarke commented May 4, 2023

An alternative to #41527

This introduces a new protocol called DeviceHolder whose only purpose is to return a pointer to a device. That way people can hold onto weak_ptr's to the DeviceHolder and know to abort further work once the Device is deleted (like loading pipelines on a background thread). This PR also changes the return type of ContextVK::GetDevice from returning a reference to a pointer as a hope to better communicate what is happening and discourage copying of the vk::Device.

I would actually recommend abolishing holding onto instance variables of the vk::Device since it isn't clear that that is just a raw pointer. That's sort of how we got into this problem.

fixes flutter/flutter#125522
fixes flutter/flutter#125519
fixes flutter/flutter#125571

Testing: Fixes flake in existing tests.

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
  • I listed at least one issue that this PR fixes in the description above.
  • I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See testing the engine for instructions on writing and running engine tests.
  • I updated/added relevant documentation (doc comments with ///).
  • I signed the CLA.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

class DeviceHolder {
public:
virtual ~DeviceHolder() = default;
virtual const vk::Device* GetDevice() const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeviceHolder::GetDevice() should return a vk::Device (instead of a vk::Device*). And code that uses a vk::Device should pass instances (or const references) rather than pointers.

Passing handle wrappers like vk::Device by value is consistent with how the vulkan-hpp API itself uses these objects.

IIUC the intent of this change was to make the C++ type reflect the ownership semantics. But developers of this subsystem are going to need to understand the behavior of the Khronos Vulkan C++ wrappers and that a vk::Device instance is equivalent to a Vulkan device handle.

The const vk::Device* inaccurately implies that the vk::Device is nullable, and creates the potential for someone to be holding onto a vk::Device* pointing to a C++ object that has been deleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I went with this approach is that the whole reason we got into this mess is that people were copying the vk::Device and not realize what they actually had is a raw pointer that may be invalid. By passing around a std::weak_ptr<DeviceHolder> which returns a pointer to the device, my hope is that this would make that error much less likely.

I can make DeviceHolder return a const vk::Device& but that is going to make it so much easier for someone to copy the vk::Device without realizing the semantics of it since vk::Device has a copy constructor.

Alternatively, I might be able to do virtual const vk::UniqueDevice& GetDevice() const = 0. Which would give you the guarantees you want and make accidentally messing things up less likely.

Here's where I want to be: no one has an lvalue to a vk::Device.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If other modules in this subsystem are interacting directly with Vulkan devices, then they will need to understand how Vulkan devices work and how their lifetime is managed. For now I think the most idiomatic option is having DeviceHolder::GetDevice return a const vk::Device& and establishing a convention that anyone who may need the Vulkan device holds a weak_ptr<DeviceHolder> and should never hold a raw vk::Device instance.

The type system will not be able to prevent all forms of misuse as long as the other modules require direct access to the vk::Device. If the convention above is insufficient, then we should hide all usage of the vk::Device behind some other interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a loaded gun allowing vk::Device&. I'd be down with making our own wrapper in the future. For now I put in the references as requested.

explicit PipelineCacheVK(std::shared_ptr<const Capabilities> caps,
vk::Device device,
std::weak_ptr<DeviceHolder> device_holder,
const vk::Device* device,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the device - it's redundant with the device_holder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not redundant unfortunately. The way we initialize the ContextVK the DeviceHolder is not valid at the time we call PipelineCacheVK(). It's because in the ContextVK constructor all data is held on the stack and only if everything is successful are the ivar's written to and is_valid_ is set to true.

Our options were:

  1. Send in the Device and avoid calling into the DeviceHolder
  2. Stop using the Device in the PipelineCacheVK constructor
  3. Write the Device to the ivar of ContextVK but make a RAII wrapper that unsets it if we don't get to is_valid_ = true.

I went for the first option. I can add a comment to make it more clear. The second option is probably a good design but is a risky change (I'd rather do that as a followup if we wanted to). The 3rd option seemed to have just as much baggage as "don't use the DeviceHolder if you have the Device".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment SGTM. Later we can look into ways to make construction of PipelineCacheVK cleaner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added.

}

void CommandPoolVK::ClearAllPools(const ContextVK* context) {
if (tls_command_pool.get()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this - it's unrelated to the rest of this PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, i'll file a new PR. It was related in that the question was "will we be able to solve the leak once we go down this route, since we know we can with the alternative route". So the answer is yes.

@gaaclarke gaaclarke requested a review from jason-simmons May 8, 2023 16:26
@gaaclarke
Copy link
Member Author

@jason-simmons PTAL

@gaaclarke gaaclarke added the autosubmit Merge PR when tree becomes green via auto submit App label May 9, 2023
@auto-submit auto-submit bot merged commit d596757 into flutter:main May 9, 2023
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request May 9, 2023
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request May 9, 2023
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request May 9, 2023
…126309)

flutter/engine@8d3a816...824cd09

2023-05-09 [email protected] Increase timeout of orchestrator. (flutter/engine#41839)
2023-05-09 [email protected] [fuchsia] Stop calling FIDL from Dart in Flutter integration tests (flutter/engine#41669)
2023-05-09 [email protected] [tests] Remove unused fuchsia.sys protocol reference (flutter/engine#41826)
2023-05-09 [email protected] Roll Skia from 7736fbaf84f0 to 485cd3d0f9ca (6 revisions) (flutter/engine#41840)
2023-05-09 [email protected] [Impeller] introduces DeviceHolder to avoid accessing a dead Device (flutter/engine#41748)
2023-05-08 [email protected] Get rid of "outrageous" default text styles for HTML renderer. (flutter/engine#41822)
2023-05-08 [email protected] Adjust DL filter bounds tests to not rely on exact Skia results (flutter/engine#41792)
2023-05-08 [email protected] Roll Dart SDK from a8b6687327d6 to 498cfa57165b (1 revision) (flutter/engine#41823)
2023-05-08 [email protected] [Android] Fix incorrect viewInsets during keyboard animation with EdgeToEdge (flutter/engine#39391)
2023-05-08 [email protected] Roll Skia from 951eb9653163 to 7736fbaf84f0 (1 revision) (flutter/engine#41821)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC [email protected],[email protected],[email protected] on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://bugs.chromium.org/p/skia/issues/entry?template=Autoroller+Bug

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

autosubmit Merge PR when tree becomes green via auto submit App

Projects

None yet

2 participants