
Conversation

@datumbox
Contributor

Fixes #2824

When running the RetinaNet tests on GPU, the outputs are checked twice:

vision/test/test_models.py, lines 210 to 215 in 5e33cc8:

    check_out(out)

    if dev == "cuda":
        with torch.cuda.amp.autocast():
            out = model(model_input)
            # See autocast_flaky_numerics comment at top of file.
            if name not in autocast_flaky_numerics:
                check_out(out)

On the first check, the values are very close:

expected: 
[{'boxes': tensor([[ 60.6603, 115.2130,  86.0578, 129.6661],
        [ 27.7266,  42.5258,  92.2687, 115.0305],
        [ 28.3789, 138.9524,  91.5207, 211.4824]], requires_grad=True), 'scores': tensor([0.0130, 0.0130, 0.0130], requires_grad=True), 'labels': tensor([2, 4, 4])}]

output: 
[{'boxes': tensor([[ 60.6603, 115.2130,  86.0578, 129.6661],
        [ 27.7266,  42.5258,  92.2687, 115.0306],
        [ 28.3788, 138.9524,  91.5207, 211.4824]], device='cuda:0',
       grad_fn=<StackBackward>), 'scores': tensor([0.0130, 0.0130, 0.0130], device='cuda:0', grad_fn=<CatBackward>), 'labels': tensor([2, 4, 4], device='cuda:0')}]
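Those two tensors agree to within the usual relative/absolute tolerances, which is why the first check passes. As a minimal sketch (not the exact assertion used in test_models.py), the elementwise criterion that torch.allclose applies with its defaults (rtol=1e-5, atol=1e-8) is:

```python
def within_tolerance(a, b, rtol=1e-5, atol=1e-8):
    # Mirrors the torch.allclose criterion: |a - b| <= atol + rtol * |b|
    return abs(a - b) <= atol + rtol * abs(b)

# The coordinates above differ only in the fourth decimal place and pass:
print(within_tolerance(115.0305, 115.0306))  # True
print(within_tolerance(28.3789, 28.3788))    # True
```

At magnitudes around 100, the relative term allows drift of roughly 1e-3, comfortably above the 1e-4 differences seen here.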

On the second check, which uses autocast(), they are not:

expected: 
[{'boxes': tensor([[ 60.6603, 115.2130,  86.0578, 129.6661],
        [ 27.7266,  42.5258,  92.2687, 115.0305],
        [ 28.3789, 138.9524,  91.5207, 211.4824]], requires_grad=True), 'scores': tensor([0.0130, 0.0130, 0.0130], requires_grad=True), 'labels': tensor([2, 4, 4])}]

output: 
[{'boxes': tensor([[ 60.6603, 115.2104,  86.0584, 129.6646],
        [ 54.6808,  85.3325,  80.0379,  99.6362],
        [ 28.3769, 138.9522,  91.5294, 211.4853],
        [ 27.7301,  42.4879,  92.2699, 115.0121]], device='cuda:0',
       grad_fn=<StackBackward>), 'scores': tensor([0.0130, 0.0130, 0.0130, 0.0130], device='cuda:0', dtype=torch.float16,
       grad_fn=<CatBackward>), 'labels': tensor([2, 2, 4, 4], device='cuda:0')}]

Since at least partially testing the model on GPU is better than skipping it entirely, I re-enabled the test and added the model to the autocast_flaky_numerics list, so the autocast value comparison is skipped for it.
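The opt-out mechanism quoted above can be sketched as follows; the listed model name and the helper function are illustrative, not the exact test code:

```python
# Sketch of the opt-out pattern in test_models.py (names hypothetical):
# models listed here still run on GPU under autocast, but their outputs
# are not value-checked, since the numerics are known to be flaky.
autocast_flaky_numerics = ("retinanet_resnet50_fpn",)

def check_autocast_output(name, out, check_out):
    if name not in autocast_flaky_numerics:
        check_out(out)  # strict value comparison for stable models
        return True
    return False        # model ran, but values deliberately not compared
```

This keeps the GPU forward pass itself under test (shape and dtype errors still surface) while tolerating the float16 drift shown above.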

@vfdev-5 vfdev-5 changed the title Re-enable ReminaNet unit-tests on GPU Re-enable RetinaNet unit-tests on GPU Oct 16, 2020
@codecov

codecov bot commented Oct 16, 2020

Codecov Report

Merging #2825 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #2825   +/-   ##
=======================================
  Coverage   73.26%   73.26%           
=======================================
  Files          99       99           
  Lines        8778     8778           
  Branches     1387     1387           
=======================================
  Hits         6431     6431           
  Misses       1920     1920           
  Partials      427      427           

Powered by Codecov. Last update adfc15c...b9c92e8. Read the comment docs.

Member

@fmassa fmassa left a comment


Awesome, thanks a lot for the investigation!

@fmassa fmassa merged commit b480903 into pytorch:master Oct 16, 2020
@datumbox datumbox deleted the enhancemen/enable_retina_gputests branch October 16, 2020 18:42
bryant1410 pushed a commit to bryant1410/vision-1 that referenced this pull request Nov 22, 2020
vfdev-5 pushed a commit to Quansight/vision that referenced this pull request Dec 4, 2020