This repository was archived by the owner on May 31, 2025. It is now read-only.

tf_remap very CPU-intensive #175

@peci1

Description

I was wondering what's eating my CPU when replaying bagfiles, and I found out it's tf_remap. In the bagfile, I have TF messages at about 1-2 kHz, and the code I'm running spawns like 10 TF listeners. This is enough to saturate 2 CPU cores on my machine just by running the tf_remap node.

I verified the remapping itself is not the bottleneck - the bottleneck really is rospy.Publisher.publish(), which probably cannot keep up publishing at such a high rate from Python to that many subscribers.
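To illustrate why the remapping itself is cheap, here is a minimal pure-Python sketch of the per-message work a remapper does. It uses mock dicts instead of real TFMessage objects and my own function names, not the node's actual code; the mapping format mirrors the node's _mappings parameter:

```python
# Sketch of the frame-ID rewriting a TF remapper performs.
# Pure Python, no ROS dependencies; transforms are mocked as dicts.

def build_mapping(mappings):
    """Turn [{"old": "b", "new": "c"}, ...] into a lookup dict {"b": "c"}."""
    return {m["old"]: m["new"] for m in mappings}

def remap_transforms(transforms, mapping):
    """Rewrite frame_id/child_frame_id of each transform per the mapping."""
    out = []
    for t in transforms:
        t = dict(t)  # copy; a real node could mutate the incoming message
        t["frame_id"] = mapping.get(t["frame_id"], t["frame_id"])
        t["child_frame_id"] = mapping.get(t["child_frame_id"], t["child_frame_id"])
        out.append(t)
    return out

mapping = build_mapping([{"old": "b", "new": "c"}])
msgs = [{"frame_id": "a", "child_frame_id": "b"}]
print(remap_transforms(msgs, mapping))
# Per message this is just two dict lookups - the expensive part is
# serializing and writing the republished message to every subscriber.
```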

So I've written an experimental C++ tf_remap, and the CPU load is much lower (about 30-40% of one CPU core). This package also solves a few more pain points of tf_remap, like the inability to work with tf_static, or the inability to remove TF frames.
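Frame removal, which the Python node cannot do, amounts to dropping any transform whose frame maps to an empty name. A hypothetical sketch of that semantics (my own naming and convention, not necessarily what the C++ node implements):

```python
def remap_or_drop(transforms, mapping):
    """Apply mapping; an empty "new" name removes transforms touching that frame."""
    out = []
    for t in transforms:
        new_parent = mapping.get(t["frame_id"], t["frame_id"])
        new_child = mapping.get(t["child_frame_id"], t["child_frame_id"])
        if new_parent == "" or new_child == "":
            continue  # frame mapped to the empty name => drop this transform
        out.append({"frame_id": new_parent, "child_frame_id": new_child})
    return out

mapping = {"b": ""}  # remove frame "b" entirely
msgs = [{"frame_id": "a", "child_frame_id": "b"},
        {"frame_id": "a", "child_frame_id": "d"}]
print(remap_or_drop(msgs, mapping))
```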

Now the question is: do we want to merge this package into geometry2 as a tool (not into geometry, since it's TF2-native), or should I release it as a standalone package instead? I assume completely replacing the current Python node with the C++ one is not an option, for many reasons...

A follow-up once the previous question is settled: should tf/tf_remap carry a warning that it is known to be inefficient, with a link to the C++ node for users who observe performance problems?


Here's a script that easily reproduces the high-load situation I'm describing (note that tf/static_transform_publisher does not use /tf_static; it periodically publishes to /tf, in this case at 1 kHz):

#!/bin/bash

# Kill the whole process group (roscore, publishers, listeners) on exit.
trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM EXIT

# Start a roscore on the port extracted from ROS_MASTER_URI.
roscore -p $(echo $ROS_MASTER_URI | cut -d':' -f3) &

sleep 5

# Spawn 10 TF listeners so tf_remap has many subscribers.
for i in `seq 1 10`; do
	rosrun tf tf_monitor 2>/dev/null 1>/dev/null &
done

# Publish two transforms to /tf_old with a 1 ms period, i.e. at 1 kHz each.
rosrun tf static_transform_publisher 0 0 0 0 0 0 a b 1 /tf:=/tf_old &
rosrun tf static_transform_publisher 0 0 0 0 0 0 d e 1 /tf:=/tf_old &

# Remap frame "b" to "c" while forwarding /tf_old to /tf.
rosrun tf tf_remap _mappings:='[{"old": "b", "new": "c"}]' &

# Watch the CPU usage of the tf_remap process.
htop -p $(ps aux | grep tf_remap | grep -v grep | sed 's/\s\+/ /g' | cut -d' ' -f2)


For completeness, I attach profiler outputs from profiling the original Python node and the C++ node: profilers.zip . In Python, thread.lock.acquire accounts for almost all the self time, followed by write_data at 1.5%. In C++, pthread_mutex_lock is also the most time-consuming call, but it takes only 17% of the self time.
