Conversation

Contributor

@datumbox datumbox commented Nov 3, 2022

Some of the optimizations highlighted in #6872.

cc @vfdev-5 @bjuncek @pmeier

Comment on lines +184 to +188
 w_ratio = new_width / old_width
 h_ratio = new_height / old_height
 ratios = torch.tensor([w_ratio, h_ratio, w_ratio, h_ratio], device=bounding_box.device)
 return (
-    bounding_box.reshape(-1, 2, 2).mul(ratios).to(bounding_box.dtype).reshape(bounding_box.shape),
+    bounding_box.mul(ratios).to(bounding_box.dtype),
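A standalone sketch (not the torchvision source) of why this change is safe: for boxes in xyxy layout, multiplying by the flat ratio vector `[w_ratio, h_ratio, w_ratio, h_ratio]` gives the same result as a `reshape(-1, 2, 2)` round-trip that scales each (x, y) point by `(w_ratio, h_ratio)`. All values below are illustrative.

```python
import torch

# Illustrative sizes; any old/new dimensions would do.
old_width, old_height = 100, 50
new_width, new_height = 200, 25
w_ratio = new_width / old_width
h_ratio = new_height / old_height

boxes = torch.tensor([[10.0, 20.0, 30.0, 40.0]])  # one xyxy box

# Old approach: view each box as two (x, y) points and scale by (w, h).
old_result = (
    boxes.reshape(-1, 2, 2)
    .mul(torch.tensor([w_ratio, h_ratio]))
    .reshape(boxes.shape)
)

# New approach: scale directly in the flat xyxy layout.
ratios = torch.tensor([w_ratio, h_ratio, w_ratio, h_ratio])
new_result = boxes.mul(ratios)

assert torch.equal(old_result, new_result)
print(new_result)  # tensor([[20., 10., 60., 20.]])
```

Skipping the reshape avoids two extra view operations per call, which is where the measured savings below come from.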
Contributor Author


Improvement:

[------------ resize cpu torch.float32 ------------]
                |        old       |        new     
1 threads: -----------------------------------------
      (128, 4)  |   13 (+-  0) us  |    8 (+-  0) us
6 threads: -----------------------------------------
      (128, 4)  |   13 (+-  0) us  |    8 (+-  0) us

Times are in microseconds (us).

[----------- resize cuda torch.float32 ------------]
                |        old       |        new     
1 threads: -----------------------------------------
      (128, 4)  |   37 (+-  0) us  |   31 (+-  0) us
6 threads: -----------------------------------------
      (128, 4)  |   37 (+-  0) us  |   31 (+-  0) us

Times are in microseconds (us).

[------------- resize cpu torch.uint8 -------------]
                |        old       |        new     
1 threads: -----------------------------------------
      (128, 4)  |   19 (+-  0) us  |   13 (+-  0) us
6 threads: -----------------------------------------
      (128, 4)  |   19 (+-  0) us  |   13 (+-  0) us

Times are in microseconds (us).

[------------ resize cuda torch.uint8 -------------]
                |        old       |        new     
1 threads: -----------------------------------------
      (128, 4)  |   45 (+-  0) us  |   39 (+-  0) us
6 threads: -----------------------------------------
      (128, 4)  |   45 (+-  0) us  |   39 (+-  1) us

Times are in microseconds (us).
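The PR does not show its exact benchmark script, but tables in this shape are typically produced with `torch.utils.benchmark`. The harness below is an illustrative sketch mirroring the (128, 4) float32 CPU case; the statements stand in for the old and new kernels.

```python
import torch
import torch.utils.benchmark as benchmark

boxes = torch.rand(128, 4)
ratios = torch.tensor([2.0, 0.5, 2.0, 0.5])

results = []
for description, stmt in [
    # "old": scale via a (N, 2, 2) view, "new": scale flat xyxy directly.
    ("old", "boxes.reshape(-1, 2, 2).mul(ratios.reshape(2, 2)).reshape(boxes.shape)"),
    ("new", "boxes.mul(ratios)"),
]:
    for num_threads in (1, 6):
        timer = benchmark.Timer(
            stmt=stmt,
            globals={"boxes": boxes, "ratios": ratios},
            num_threads=num_threads,
            label="resize cpu torch.float32",
            sub_label="(128, 4)",
            description=description,
        )
        results.append(timer.blocked_autorange(min_run_time=0.2))

# Prints a comparison table grouped by label and thread count,
# similar to the tables above.
benchmark.Compare(results).print()
```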

@vfdev-5
Contributor

vfdev-5 commented Nov 3, 2022

Maybe we can merge this after #6879.

Contributor

@vfdev-5 vfdev-5 left a comment


Nice optim for resize, thanks @datumbox

@datumbox
Contributor Author

datumbox commented Nov 3, 2022

@vfdev-5 I just pushed a couple of untested opts. Could you check again which you think are safe? I'll do benchmarks after we confirm which ones we want in.

@datumbox datumbox requested a review from vfdev-5 November 3, 2022 11:43
@vfdev-5
Contributor

vfdev-5 commented Nov 3, 2022

I'll cherry-pick the ones that make sense for elastic. Thanks for the pointers!

 # Translate bounding boxes
-out_bboxes[:, 0::2] = out_bboxes[:, 0::2] - tr[:, 0]
-out_bboxes[:, 1::2] = out_bboxes[:, 1::2] - tr[:, 1]
+out_bboxes.sub_(tr.repeat((1, 2)))
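A minimal sketch (not the torchvision source) of this change: one fused in-place `sub_` replaces two strided slice assignments. Shapes here are assumptions for illustration: `out_bboxes` is (N, 4) in xyxy layout and `tr` is (1, 2) holding the (x, y) translation.

```python
import torch

out_bboxes = torch.tensor([[10.0, 20.0, 30.0, 40.0],
                           [5.0, 5.0, 15.0, 25.0]])
tr = torch.tensor([[3.0, 7.0]])

# Old: two slice assignments, one per coordinate axis.
old = out_bboxes.clone()
old[:, 0::2] = old[:, 0::2] - tr[:, 0]
old[:, 1::2] = old[:, 1::2] - tr[:, 1]

# New: tile tr to (1, 4) so it matches the xyxy layout,
# then subtract in a single in-place op.
new = out_bboxes.clone()
new.sub_(tr.repeat((1, 2)))

assert torch.equal(old, new)
```

Collapsing the two strided writes into one contiguous in-place op avoids an intermediate allocation and a second pass over the tensor.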
Contributor Author


Improvement for both changes:

[-------------------- bbox_rotate cpu -------------------]
                     |      False       |       True     
1 threads: ----------------------------------------------
      torch.float32  |  265 (+- 40) us  |  225 (+-  2) us
      torch.float64  |  261 (+-  1) us  |  241 (+-  1) us
      torch.int32    |  258 (+-  1) us  |  239 (+-  2) us
      torch.int64    |  260 (+-  1) us  |  239 (+-  1) us
6 threads: ----------------------------------------------
      torch.float32  |  466 (+- 10) us  |  405 (+- 20) us
      torch.float64  |  483 (+- 10) us  |  422 (+- 55) us
      torch.int32    |  479 (+- 10) us  |  420 (+- 10) us
      torch.int64    |  482 (+- 18) us  |  422 (+- 10) us

Times are in microseconds (us).

[-------------------- bbox_rotate cpu -------------------]
                     |      False       |       True     
1 threads: ----------------------------------------------
      torch.float32  |  498 (+- 46) us  |  432 (+-  0) us
      torch.float64  |  489 (+-  1) us  |  446 (+-  0) us
      torch.int32    |  503 (+-  0) us  |  459 (+-  3) us
      torch.int64    |  504 (+-  3) us  |  458 (+-  0) us
6 threads: ----------------------------------------------
      torch.float32  |  573 (+-  2) us  |  530 (+-  0) us
      torch.float64  |  600 (+- 20) us  |  554 (+- 20) us
      torch.int32    |  609 (+- 20) us  |  560 (+- 10) us
      torch.int64    |  598 (+- 58) us  |  563 (+- 10) us

Times are in microseconds (us).

@datumbox datumbox changed the title from "[WIP] Remaining BBox kernel perf optimizations" to "Remaining BBox kernel perf optimizations" Nov 3, 2022
@datumbox datumbox added the module: transforms, Perf (for performance improvements), and prototype labels Nov 3, 2022
Contributor

@vfdev-5 vfdev-5 left a comment


LGTM, thanks @datumbox

@datumbox datumbox merged commit f1b840d into pytorch:main Nov 3, 2022
@datumbox datumbox deleted the prototype/bbox_speedups branch November 3, 2022 13:07
facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2022
Summary:
* Bbox resize optimization

* Other (untested) optimizations on `_affine_bounding_box_xyxy` and `elastic_bounding_box`.

* fix conflict

* Reverting changes on elastic

* revert one more change

* Further improvement

Reviewed By: datumbox

Differential Revision: D41020550

fbshipit-source-id: dfd1f2d91490b45176f1976bcec1fc99248f8587