`spin_some` easily deadlocks action clients

## Bug report
When running an `rclcpp_action::Client<T>` in using `spin_some()` and a `rclcpp::Rate`, if the server you connect to publishes feedback at a faster rate than your `rclcpp::Rate`, your client waits for the goal to be accepted forever, effectively becoming deadlocked.

**Required Info:**

- Operating System:
  - Ubuntu 20.04
- Installation type:
  - Binaries
- Version or commit hash:
  - Galactic
- DDS implementation:
  - Cyclone DDS
- Client library (if applicable):
  - rclcpp

Note: based on my understanding of the root cause (see `Additional information`), I believe this bug exists in rolling too, but I've only tested on Galactic as that's what I have access to at the moment.

#### Steps to reproduce issue
The below server and client reproduce 100% of the time on my machine when running over loopback. Compilation settings shouldn't matter. Once built, they can be run with `ros2 run ...` or directly with `./path/to/binary`. I'm using `action_tutorials_interfaces/action/Fibonacci`, but this should repro with any action interface.

<details>
<summary>Server</summary>

```cpp
#include <action_tutorials_interfaces/action/fibonacci.hpp>
#include <rclcpp/rclcpp.hpp>
#include <rclcpp_action/rclcpp_action.hpp>

int main(int argc, char *argv[])
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("fast_feedback_server");
  using Action = action_tutorials_interfaces::action::Fibonacci;
  std::shared_ptr<rclcpp_action::ServerGoalHandle<Action>> goal_handle = nullptr;

  auto server = rclcpp_action::create_server<Action>(
      node, "fast_feedback",
      [&](const rclcpp_action::GoalUUID &, std::shared_ptr<const Action::Goal>)
      { return rclcpp_action::GoalResponse::ACCEPT_AND_EXECUTE; },
      [&](const std::shared_ptr<rclcpp_action::ServerGoalHandle<Action>>)
      {
        RCLCPP_INFO(node->get_logger(), "Cancellation requested, accepting.");
        goal_handle = nullptr;
        return rclcpp_action::CancelResponse::ACCEPT;
      },
      [&](const std::shared_ptr<rclcpp_action::ServerGoalHandle<Action>> gh)
      {
        RCLCPP_INFO(node->get_logger(), "Goal was accepted");
        goal_handle = gh;
      });

  rclcpp::Rate rate(100.0);  // 100 Hz
  rclcpp::executors::SingleThreadedExecutor executor;
  executor.add_node(node);
  RCLCPP_INFO(node->get_logger(), "Server spinning, ready to accept goals.");
  while (rclcpp::ok())
  {
    executor.spin_some();

    // NOTE: this server intentionally does not send a result
    if (goal_handle)
    {
      goal_handle->publish_feedback(std::make_unique<Action::Feedback>());
    }

    rate.sleep();
  }

  return 0;
}
```

Notes:

1. The server spins at 100 Hz.
2. Feedback is published at the same rate.
3. The goal is never completed as it's not relevant to this bug.

</details>

<details>
<summary>Client</summary>

```cpp
#include <action_tutorials_interfaces/action/fibonacci.hpp>
#include <rclcpp/rclcpp.hpp>
#include <rclcpp_action/rclcpp_action.hpp>

int main(int argc, char *argv[])
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("fast_feedback_client");
  using Action = action_tutorials_interfaces::action::Fibonacci;
  using Client = rclcpp_action::Client<Action>;
  auto client = rclcpp_action::create_client<Action>(node, "fast_feedback");
  if (!client->wait_for_action_server())
  {
    RCLCPP_ERROR(node->get_logger(), "Action server not available after waiting");
    return 1;
  }

  // NOTE: no feedback_callback is set
  Client::SendGoalOptions send_goal_options;
  bool goal_response = false;
  send_goal_options.goal_response_callback = [&](std::shared_future<Client::GoalHandle::SharedPtr>)
  {
    RCLCPP_INFO(node->get_logger(), "Goal response.");
    goal_response = true;
  };

  RCLCPP_INFO(node->get_logger(), "Sending goal.");
  auto future = client->async_send_goal(Action::Goal(), send_goal_options);
  rclcpp::Rate rate(10.0);  // 10 Hz, 10% of the server's rate
  rclcpp::executors::SingleThreadedExecutor executor;
  executor.add_node(node);
  while (rclcpp::ok() && !goal_response)
  {
    RCLCPP_INFO_THROTTLE(node->get_logger(), *node->get_clock(), 10000, "Waiting for goal response.");
    executor.spin_some();
    rate.sleep();
  }

  RCLCPP_INFO_STREAM(node->get_logger(), "Goal response received: " << std::boolalpha << goal_response);
  return 0;
}
```

Notes

1. The client's rate in this example is 10% of the server's rate to make the repro foolproof for different machines/network conditions, but I've seen the bug reproduce reliably when the client is >90% of the server's rate, and even repro most of the time when they have the same rate.
2. No feedback callback is set on `SendGoalOptions` as the deadlock is not dependent on one.
3. The client exits once the goal is accepted since that's all that's relevant to this bug.

</details>

#### Expected behavior
The action client will accept the goal, perhaps after working through a short backlog of feedback up to the queue depth in the feedback subscription's QoS.

#### Actual behavior
The server has accepted and started working on the goal but the client never sees that. The client continuously logs `Waiting for goal response.` in the repro and debug logs have `Received feedback for unknown goal. Ignoring...`. Debug logs show `Client in wait set is ready`, which I presume to be the goal response, but it's never taken, so the client deadlocks.

#### Additional information
I did some digging and think I have tracked down the root cause, some other contributing factors, and some related impacts.

`rclcpp_action::Client<T>` derives from `rclcpp_action::ClientBase` derives from `rclcpp::Waitable`, and these are scheduled for work in an `rclcpp::Executor` as a single executable, not multiple executables of their constituent parts. When using `rclcpp::Executor::spin_some`, which only collects work from ready entities once per call, the `rclcpp::Waitable` is scheduled once, regardless of the amount of work it has ready. This *should* be fine, but the `rclcpp::ClientBase::take_data` implementation only yields a single executable per call and therefore the subsequent `execute` invoked by `rclcpp::Executor` only performs one thing on the client rather than everything that's ready.

https://github.com/ros2/rclcpp/blob/5e14a283b67940526a20c365b827d8090a552bad/rclcpp_action/src/client.cpp#L550-L687

In *most* scenarios, this is fine, but it deadlocks in this specific scenario: a server publishing feedback faster than the client is spinning (specifically, faster than it calls `rclcpp::Executor::wait_for_work`). This can happen with any of the `spin*` implementations but happens most readily with `spin_some`. Why? Because `rclcpp::ClientBase::take_data` only returns the first ready executable and it prioritizes feedback over responses. The implementation of `rclcpp::ClientBase::take_data` and `execute` take and execute in this order: feedback, status, goal response, result response, cancel response.

When you combine the implementation of `rclcpp::Waitable` that only returns execution data representing a single unit of work, the implicit prioritization mechanism built into `ClientBase`, and an action server whose feedback topic publishes faster than the spin rate of the client, you get a client that always has ready feedback for an unknown goal, ironically never executing the goal response that feedback is for.

Some ancillary effects of this: the goal response is never taken, the client never creates the goal handle, and therefore cancellation isn't easy. Cancellation can be done through `rclcpp_action::Client<T>::async_cancel_all_goals` but that may have adverse side effects depending on your scenario. Additionally, the client never becomes "result aware" for the goal since that occurs when the goal response is processed, so result responses are permanently lost if they occur during the deadlock. Feedback is also always dropped because it's for an unknown goal.

	std::shared_ptr<void>
	ClientBase::take_data()
	{
	if (pimpl_->is_feedback_ready) {
	std::shared_ptr<void> feedback_message = this->create_feedback_message();
	rcl_ret_t ret = rcl_action_take_feedback(
	pimpl_->client_handle.get(), feedback_message.get());
	return std::static_pointer_cast<void>(
	std::make_shared<std::tuple<rcl_ret_t, std::shared_ptr<void>>>(
	ret, feedback_message));
	} else if (pimpl_->is_status_ready) {
	std::shared_ptr<void> status_message = this->create_status_message();
	rcl_ret_t ret = rcl_action_take_status(
	pimpl_->client_handle.get(), status_message.get());
	return std::static_pointer_cast<void>(
	std::make_shared<std::tuple<rcl_ret_t, std::shared_ptr<void>>>(
	ret, status_message));
	} else if (pimpl_->is_goal_response_ready) {
	rmw_request_id_t response_header;
	std::shared_ptr<void> goal_response = this->create_goal_response();
	rcl_ret_t ret = rcl_action_take_goal_response(
	pimpl_->client_handle.get(), &response_header, goal_response.get());
	return std::static_pointer_cast<void>(
	std::make_shared<std::tuple<rcl_ret_t, rmw_request_id_t, std::shared_ptr<void>>>(
	ret, response_header, goal_response));
	} else if (pimpl_->is_result_response_ready) {
	rmw_request_id_t response_header;
	std::shared_ptr<void> result_response = this->create_result_response();
	rcl_ret_t ret = rcl_action_take_result_response(
	pimpl_->client_handle.get(), &response_header, result_response.get());
	return std::static_pointer_cast<void>(
	std::make_shared<std::tuple<rcl_ret_t, rmw_request_id_t, std::shared_ptr<void>>>(
	ret, response_header, result_response));
	} else if (pimpl_->is_cancel_response_ready) {
	rmw_request_id_t response_header;
	std::shared_ptr<void> cancel_response = this->create_cancel_response();
	rcl_ret_t ret = rcl_action_take_cancel_response(
	pimpl_->client_handle.get(), &response_header, cancel_response.get());
	return std::static_pointer_cast<void>(
	std::make_shared<std::tuple<rcl_ret_t, rmw_request_id_t, std::shared_ptr<void>>>(
	ret, response_header, cancel_response));
	} else {
	throw std::runtime_error("Taking data from action client but nothing is ready");
	}
	}

	std::shared_ptr<void>
	ClientBase::take_data_by_entity_id(size_t id)
	{
	// Mark as ready the entity from which we want to take data
	switch (static_cast<EntityType>(id)) {
	case EntityType::GoalClient:
	pimpl_->is_goal_response_ready = true;
	break;
	case EntityType::ResultClient:
	pimpl_->is_result_response_ready = true;
	break;
	case EntityType::CancelClient:
	pimpl_->is_cancel_response_ready = true;
	break;
	case EntityType::FeedbackSubscription:
	pimpl_->is_feedback_ready = true;
	break;
	case EntityType::StatusSubscription:
	pimpl_->is_status_ready = true;
	break;
	}

	return take_data();
	}

	void
	ClientBase::execute(std::shared_ptr<void> & data)
	{
	if (!data) {
	throw std::runtime_error("'data' is empty");
	}

	if (pimpl_->is_feedback_ready) {
	auto shared_ptr = std::static_pointer_cast<std::tuple<rcl_ret_t, std::shared_ptr<void>>>(data);
	auto ret = std::get<0>(*shared_ptr);
	pimpl_->is_feedback_ready = false;
	if (RCL_RET_OK == ret) {
	auto feedback_message = std::get<1>(*shared_ptr);
	this->handle_feedback_message(feedback_message);
	} else if (RCL_RET_ACTION_CLIENT_TAKE_FAILED != ret) {
	rclcpp::exceptions::throw_from_rcl_error(ret, "error taking feedback");
	}
	} else if (pimpl_->is_status_ready) {
	auto shared_ptr = std::static_pointer_cast<std::tuple<rcl_ret_t, std::shared_ptr<void>>>(data);
	auto ret = std::get<0>(*shared_ptr);
	pimpl_->is_status_ready = false;
	if (RCL_RET_OK == ret) {
	auto status_message = std::get<1>(*shared_ptr);
	this->handle_status_message(status_message);
	} else if (RCL_RET_ACTION_CLIENT_TAKE_FAILED != ret) {
	rclcpp::exceptions::throw_from_rcl_error(ret, "error taking status");
	}
	} else if (pimpl_->is_goal_response_ready) {
	auto shared_ptr = std::static_pointer_cast<
	std::tuple<rcl_ret_t, rmw_request_id_t, std::shared_ptr<void>>>(data);
	auto ret = std::get<0>(*shared_ptr);
	pimpl_->is_goal_response_ready = false;
	if (RCL_RET_OK == ret) {
	auto response_header = std::get<1>(*shared_ptr);
	auto goal_response = std::get<2>(*shared_ptr);
	this->handle_goal_response(response_header, goal_response);
	} else if (RCL_RET_ACTION_CLIENT_TAKE_FAILED != ret) {
	rclcpp::exceptions::throw_from_rcl_error(ret, "error taking goal response");
	}
	} else if (pimpl_->is_result_response_ready) {
	auto shared_ptr = std::static_pointer_cast<
	std::tuple<rcl_ret_t, rmw_request_id_t, std::shared_ptr<void>>>(data);
	auto ret = std::get<0>(*shared_ptr);
	pimpl_->is_result_response_ready = false;
	if (RCL_RET_OK == ret) {
	auto response_header = std::get<1>(*shared_ptr);
	auto result_response = std::get<2>(*shared_ptr);
	this->handle_result_response(response_header, result_response);
	} else if (RCL_RET_ACTION_CLIENT_TAKE_FAILED != ret) {
	rclcpp::exceptions::throw_from_rcl_error(ret, "error taking result response");
	}
	} else if (pimpl_->is_cancel_response_ready) {
	auto shared_ptr = std::static_pointer_cast<
	std::tuple<rcl_ret_t, rmw_request_id_t, std::shared_ptr<void>>>(data);
	auto ret = std::get<0>(*shared_ptr);
	pimpl_->is_cancel_response_ready = false;
	if (RCL_RET_OK == ret) {
	auto response_header = std::get<1>(*shared_ptr);
	auto cancel_response = std::get<2>(*shared_ptr);
	this->handle_cancel_response(response_header, cancel_response);
	} else if (RCL_RET_ACTION_CLIENT_TAKE_FAILED != ret) {
	rclcpp::exceptions::throw_from_rcl_error(ret, "error taking cancel response");
	}
	} else {
	throw std::runtime_error("Executing action client but nothing is ready");
	}
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`spin_some` easily deadlocks action clients #2451

Bug report

Steps to reproduce issue

Expected behavior

Actual behavior

Additional information

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

spin_some easily deadlocks action clients #2451

Description

Bug report

Steps to reproduce issue

Expected behavior

Actual behavior

Additional information

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`spin_some` easily deadlocks action clients #2451