[fbsync] Document ResNet architecture tweak (#5977)

YosuaMichael · datumbox · facebook-github-bot · commit b0e78aa608ce · 2022-06-01T01:47:38.000-07:00
Summary:
* To resolve issue #5964 Add note for resnet architecture
* Update resnet.py
* Update resnet.py
* Update resnet.rst
* Fix stylings
* Add the same notes on model builders
* Improve description
* Apply the change everywhere
* Remove trailing space

Reviewed By: NicolasHug

Differential Revision: D36760934

fbshipit-source-id: 044ff1d1f35f6354dbc7608a0d30951aa90190a2

Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
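
The note in this change contrasts two placements of the downsampling stride inside a bottleneck block. As a rough illustration only, the sketch below traces the spatial size after each of the three bottleneck convolutions under both placements. The helper names `conv_out` and `bottleneck_sizes`, the 56-pixel input size, and the padding of 1 on the 3x3 convolution are assumptions made for this example; they are not taken from `resnet.py`.

```python
from typing import List


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution (standard floor formula, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def bottleneck_sizes(size: int, stride: int, v1_5: bool) -> List[int]:
    """Return the spatial size after each of the three bottleneck convolutions.

    v1_5=True  -> stride on the 3x3 convolution (the placement the note describes
                  for TorchVision)
    v1_5=False -> stride on the first 1x1 convolution (original paper)
    """
    # Hypothetical helper: only the stride placement differs between variants.
    first_stride, middle_stride = (1, stride) if v1_5 else (stride, 1)
    after_first_1x1 = conv_out(size, kernel=1, stride=first_stride, padding=0)
    after_3x3 = conv_out(after_first_1x1, kernel=3, stride=middle_stride, padding=1)
    after_last_1x1 = conv_out(after_3x3, kernel=1, stride=1, padding=0)
    return [after_first_1x1, after_3x3, after_last_1x1]


paper = bottleneck_sizes(56, stride=2, v1_5=False)
torchvision_style = bottleneck_sizes(56, stride=2, v1_5=True)
print("original paper:", paper)
print("V1.5 placement:", torchvision_style)
```

Both placements end at the same output size, so the variants are interchangeable in shape. The difference is that with the stride on the first 1x1 convolution, that convolution reads only every other position in each spatial dimension, while with the V1.5 placement the 3x3 convolution still covers the full-resolution feature map when it downsamples.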
diff --git a/docs/source/models/resnet.rst b/docs/source/models/resnet.rst
@@ -6,6 +6,11 @@ ResNet
 The ResNet model is based on the `Deep Residual Learning for Image Recognition
 <https://arxiv.org/abs/1512.03385>`_ paper.
 
+.. note::
+    TorchVision's bottleneck applies the downsampling stride in the 3x3 convolution (the
+    second convolution of the block), while the original paper applies it in the first 1x1
+    convolution. This variant improves accuracy and is known as `ResNet V1.5
+    <https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch>`_.
 
 Model builders
 --------------
diff --git a/torchvision/models/resnet.py b/torchvision/models/resnet.py
@@ -699,6 +699,12 @@ def resnet34(*, weights: Optional[ResNet34_Weights] = None, progress: bool = Tru
 def resnet50(*, weights: Optional[ResNet50_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
     """ResNet-50 from `Deep Residual Learning for Image Recognition <https://arxiv.org/pdf/1512.03385.pdf>`__.
 
+    .. note::
+       TorchVision's bottleneck applies the downsampling stride in the 3x3 convolution (the
+       second convolution of the block), while the original paper applies it in the first 1x1
+       convolution. This variant improves accuracy and is known as `ResNet V1.5
+       <https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch>`_.
+
     Args:
         weights (:class:`~torchvision.models.ResNet50_Weights`, optional): The
             pretrained weights to use. See
@@ -724,6 +730,12 @@ def resnet50(*, weights: Optional[ResNet50_Weights] = None, progress: bool = Tru
 def resnet101(*, weights: Optional[ResNet101_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
     """ResNet-101 from `Deep Residual Learning for Image Recognition <https://arxiv.org/pdf/1512.03385.pdf>`__.
 
+    .. note::
+       TorchVision's bottleneck applies the downsampling stride in the 3x3 convolution (the
+       second convolution of the block), while the original paper applies it in the first 1x1
+       convolution. This variant improves accuracy and is known as `ResNet V1.5
+       <https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch>`_.
+
     Args:
         weights (:class:`~torchvision.models.ResNet101_Weights`, optional): The
             pretrained weights to use. See
@@ -749,6 +761,12 @@ def resnet101(*, weights: Optional[ResNet101_Weights] = None, progress: bool = T
 def resnet152(*, weights: Optional[ResNet152_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
     """ResNet-152 from `Deep Residual Learning for Image Recognition <https://arxiv.org/pdf/1512.03385.pdf>`__.
 
+    .. note::
+       TorchVision's bottleneck applies the downsampling stride in the 3x3 convolution (the
+       second convolution of the block), while the original paper applies it in the first 1x1
+       convolution. This variant improves accuracy and is known as `ResNet V1.5
+       <https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch>`_.
+
     Args:
         weights (:class:`~torchvision.models.ResNet152_Weights`, optional): The
             pretrained weights to use. See