
@vfdev-5 (Contributor) commented Nov 1, 2022

[-------- crop_bounding_box cpu BoundingBoxFormat.XYXY --------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |             50             |           20         
6 threads: -----------------------------------------------------
      (4,)  |             50             |           20         

Times are in microseconds (us).

[-------- crop_bounding_box cpu BoundingBoxFormat.XYWH --------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |             95             |           20         
6 threads: -----------------------------------------------------
      (4,)  |            106             |           20         

Times are in microseconds (us).

[------- crop_bounding_box cpu BoundingBoxFormat.CXCYWH -------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |            100             |           20         
6 threads: -----------------------------------------------------
      (4,)  |            100             |           20         

Times are in microseconds (us).

[------- crop_bounding_box cuda BoundingBoxFormat.XYXY --------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |            100             |           72         
6 threads: -----------------------------------------------------
      (4,)  |            100             |           71         

Times are in microseconds (us).

[------- crop_bounding_box cuda BoundingBoxFormat.XYWH --------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |            174             |          71.5        
6 threads: -----------------------------------------------------
      (4,)  |            170             |          70.9        

Times are in microseconds (us).

[------ crop_bounding_box cuda BoundingBoxFormat.CXCYWH -------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |            260             |           71         
6 threads: -----------------------------------------------------
      (4,)  |            264             |           71         

Times are in microseconds (us).
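The tables above have the layout printed by `torch.utils.benchmark.Compare`: the title is the `label`, the columns come from `description`, the rows from `sub_label` (here the box tensor shape `(4,)`), and the results are grouped by `num_threads`. Below is a minimal sketch of how a similar comparison could be reproduced for the XYWH case. It assumes two hypothetical helpers on plain tensors rather than the actual torchvision kernels: `crop_xywh_via_xyxy` mimics an old-style path (convert to XYXY, shift, convert back) and `crop_xywh_direct` shifts the boxes in their native format.

```python
import torch
import torch.utils.benchmark as benchmark


def xywh_to_xyxy(boxes):
    # Hypothetical helper: (x, y, w, h) -> (x1, y1, x2, y2)
    x, y, w, h = boxes.unbind(-1)
    return torch.stack([x, y, x + w, y + h], dim=-1)


def xyxy_to_xywh(boxes):
    # Hypothetical helper: (x1, y1, x2, y2) -> (x, y, w, h)
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack([x1, y1, x2 - x1, y2 - y1], dim=-1)


def crop_xywh_via_xyxy(boxes, top, left):
    # Old-style path (assumed): convert to XYXY, shift both corners, convert back.
    xyxy = xywh_to_xyxy(boxes)
    xyxy = xyxy - torch.tensor([left, top, left, top], dtype=boxes.dtype, device=boxes.device)
    return xyxy_to_xywh(xyxy)


def crop_xywh_direct(boxes, top, left):
    # Direct path: only the (x, y) corner moves; width and height are unchanged.
    return boxes.sub(torch.tensor([left, top, 0, 0], dtype=boxes.dtype, device=boxes.device))


def run_benchmark(device="cpu", thread_counts=(1, 6), min_run_time=0.2):
    boxes = torch.tensor([10.0, 20.0, 30.0, 40.0], device=device)
    results = []
    for num_threads in thread_counts:
        for fn in (crop_xywh_via_xyxy, crop_xywh_direct):
            timer = benchmark.Timer(
                stmt="fn(boxes, 5, 7)",
                globals={"fn": fn, "boxes": boxes},
                label=f"crop sketch {device} XYWH",   # table title
                sub_label=str(tuple(boxes.shape)),    # row label, e.g. "(4,)"
                description=fn.__name__,              # column label
                num_threads=num_threads,              # grouped as "N threads:"
            )
            results.append(timer.blocked_autorange(min_run_time=min_run_time))
    return results


if __name__ == "__main__":
    benchmark.Compare(run_benchmark()).print()
```

Both helpers return the same boxes; the direct version simply skips the two format conversions, which is consistent with the old kernel being slower for XYWH and CXCYWH than for XYXY in the tables. Absolute numbers will differ by machine.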

cc @datumbox @bjuncek @pmeier

@datumbox (Contributor) left a comment

LGTM, just one minor comment.

```python
    sub = torch.tensor([left, top, left, top], device=bounding_box.device)
else:
    sub = torch.tensor([left, top, 0, 0], device=bounding_box.device)
bounding_box = bounding_box.sub_(sub)
```
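A self-contained sketch of the per-format shift in the snippet above, on plain tensors. It assumes the hidden branch condition is "format is XYXY" (the `if` line is not part of the review excerpt) and uses a hypothetical `BoxFormat` enum as a stand-in for `features.BoundingBoxFormat`. The idea is that cropping is a pure translation: in XYXY both corners move, while in XYWH (top-left corner) and CXCYWH (center) only the first two coordinates move, because width and height do not change under translation. No format conversion is needed.

```python
from enum import Enum

import torch


class BoxFormat(Enum):
    # Hypothetical stand-in for features.BoundingBoxFormat
    XYXY = "xyxy"
    XYWH = "xywh"
    CXCYWH = "cxcywh"


def crop_boxes(boxes: torch.Tensor, fmt: BoxFormat, top: int, left: int) -> torch.Tensor:
    """Express boxes in the frame of a crop whose top-left corner is (left, top)."""
    if fmt == BoxFormat.XYXY:
        # Both corners (x1, y1) and (x2, y2) move by the crop offset.
        sub = torch.tensor([left, top, left, top], dtype=boxes.dtype, device=boxes.device)
    else:
        # XYWH: only the top-left corner moves; CXCYWH: only the center moves.
        # Width and height are translation-invariant.
        sub = torch.tensor([left, top, 0, 0], dtype=boxes.dtype, device=boxes.device)
    # Out-of-place subtraction: the caller's tensor is left untouched.
    return boxes.sub(sub)
```

For example, `crop_boxes(torch.tensor([10.0, 20.0, 30.0, 40.0]), BoxFormat.XYXY, top=5, left=7)` gives `tensor([ 3., 15., 23., 35.])`, while the same call with `BoxFormat.XYWH` gives `tensor([ 3., 15., 30., 40.])`.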
Contributor Author:

It is important to assign the output of `bounding_box.sub_(sub)` back to `bounding_box`; otherwise, `bounding_box` is still a `BoundingBox` with the wrong `spatial_size`.

@vfdev-5 merged commit 2ba2f1d into pytorch:main Nov 2, 2022
@vfdev-5 deleted the proto-speedup-crop-bboxes branch November 2, 2022 17:46
github-actions bot commented Nov 2, 2022

Hey @vfdev-5!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

@vfdev-5 added the labels `module: transforms`, `Perf` (For performance improvements), and `prototype` Nov 2, 2022
```python
    bounding_box.clone(), old_format=format, new_format=features.BoundingBoxFormat.XYXY, inplace=True
)

bounding_box = bounding_box.clone()
```
Contributor:

Vasilis found that it is generally faster to do one regular (out-of-place) operation than to clone and then apply a single in-place operation. @vfdev-5, have you benchmarked that as well?

Contributor Author:

Yes, it is faster by ~2 µs to go directly with `bounding_box = bounding_box.sub(sub)` instead of cloning and calling `sub_`.
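A minimal sketch of the two idioms being compared, on plain tensors rather than the prototype `BoundingBox` feature; the function names are hypothetical. Both produce the same values and leave the input untouched. The difference is that `clone()` followed by `sub_()` is two operations (a copy, then an in-place update), whereas `sub()` allocates the result and computes it in one operation.

```python
import torch
import torch.utils.benchmark as benchmark


def shift_clone_inplace(boxes, sub):
    # Two operations: copy the input, then modify the copy in place.
    out = boxes.clone()
    out.sub_(sub)
    return out


def shift_out_of_place(boxes, sub):
    # One operation: sub() returns a new tensor holding the result.
    return boxes.sub(sub)


def compare(min_run_time=0.2):
    boxes = torch.tensor([10.0, 20.0, 30.0, 40.0])
    sub = torch.tensor([7.0, 5.0, 7.0, 5.0])
    results = []
    for fn in (shift_clone_inplace, shift_out_of_place):
        timer = benchmark.Timer(
            stmt="fn(boxes, sub)",
            globals={"fn": fn, "boxes": boxes, "sub": sub},
            label="clone + sub_ vs sub",
            description=fn.__name__,
        )
        results.append(timer.blocked_autorange(min_run_time=min_run_time))
    return results


if __name__ == "__main__":
    benchmark.Compare(compare()).print()
```

The measured gap depends on hardware and tensor size; the sketch only shows how the two idioms line up, not a guaranteed speedup.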

Contributor:

Given that we have used the idiom Vasilis found everywhere else, I think it would be good to also use it here, especially if it is (marginally) faster. Do you want to send a PR, or should I?

@vfdev-5 (Contributor Author), Nov 2, 2022:

I'll update it in the pad PR (#6890). Fewer lines of cleaner code are better than the current code :)

Contributor Author:

Funny, the full benchmark shows a ~10 µs improvement :)

facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2022
Summary:
* [proto] Speed-up crop on bboxes and tests

* Fix linter

* Update _geometry.py

* Fixed device issue

* Revert changes in test/prototype_transforms_kernel_infos.py

* Fixed failing correctness tests

Reviewed By: datumbox

Differential Revision: D41020546

fbshipit-source-id: 0dbc8c900caad4c982fda96b87c98e5d888fe5aa