2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -33,6 +33,8 @@

sys.path.append(os.path.abspath("."))

torchvision.disable_beta_transforms_warning()

# -- General configuration ------------------------------------------------

# Required version of sphinx is set from docs/requirements.txt
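A minimal sketch of what this conf.py addition does, assuming torchvision >= 0.15 where the v2 API and this helper exist: disable_beta_transforms_warning() suppresses the beta-status UserWarning that importing torchvision.transforms.v2 would otherwise emit, keeping the Sphinx doc build output clean.

# Sketch only; names as in torchvision 0.15, treat the exact behavior as an assumption.
import torchvision

torchvision.disable_beta_transforms_warning()  # silence the "beta transforms" UserWarning

from torchvision.transforms import v2 as transforms  # imports cleanly, no warning printed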
40 changes: 40 additions & 0 deletions docs/source/transforms.rst
@@ -98,17 +98,29 @@ Geometry
:template: class.rst

Resize
v2.Resize
RandomCrop
v2.RandomCrop
RandomResizedCrop
v2.RandomResizedCrop
CenterCrop
v2.CenterCrop
FiveCrop
v2.FiveCrop
TenCrop
v2.TenCrop
Pad
v2.Pad
RandomAffine
v2.RandomAffine
RandomPerspective
v2.RandomPerspective
RandomRotation
v2.RandomRotation
RandomHorizontalFlip
v2.RandomHorizontalFlip
RandomVerticalFlip
v2.RandomVerticalFlip

Color
-----
@@ -118,15 +130,25 @@ Color
:template: class.rst

ColorJitter
v2.ColorJitter
Grayscale
v2.Grayscale
RandomGrayscale
v2.RandomGrayscale
GaussianBlur
v2.GaussianBlur
RandomInvert
v2.RandomInvert
RandomPosterize
v2.RandomPosterize
RandomSolarize
v2.RandomSolarize
RandomAdjustSharpness
v2.RandomAdjustSharpness
RandomAutocontrast
v2.RandomAutocontrast
RandomEqualize
v2.RandomEqualize

Composition
-----------
@@ -136,9 +158,13 @@ Composition
:template: class.rst

Compose
v2.Compose
RandomApply
v2.RandomApply
RandomChoice
v2.RandomChoice
RandomOrder
v2.RandomOrder

Miscellaneous
-------------
@@ -148,9 +174,13 @@ Miscellaneous
:template: class.rst

LinearTransformation
v2.LinearTransformation
Normalize
v2.Normalize
RandomErasing
v2.RandomErasing
Lambda
v2.Lambda

.. _conversion_transforms:

@@ -162,9 +192,15 @@ Conversion
:template: class.rst

ToPILImage
v2.ToPILImage
v2.ToImagePIL
ToTensor
v2.ToTensor
Contributor:

Since it is deprecated, should we document it?

Contributor:

It seems that ToImageTensor is missing from the list.

Member Author:

> Since it is deprecated, should we document it?

Some thoughts here https://github.com/pytorch/vision/pull/7297/files#r1113244517

> It seems that ToImageTensor is missing from the list.

I did not add any new v2 transforms here, so that's normal

PILToTensor
v2.PILToTensor
ConvertImageDtype
v2.ConvertImageDtype
Contributor:

Same as above. Just an alias for ConvertDtype

v2.ConvertDtype

Auto-Augmentation
-----------------
@@ -181,9 +217,13 @@ The new transform can be used standalone or mixed-and-matched with existing transforms

AutoAugmentPolicy
AutoAugment
v2.AutoAugment
RandAugment
v2.RandAugment
TrivialAugmentWide
v2.TrivialAugmentWide
AugMix
v2.AugMix

.. _functional_transforms:

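To illustrate the "standalone or mixed-and-matched" note in the Auto-Augmentation section above, here is a hedged sketch of a v2 pipeline; the parameter values are illustrative and not part of this PR.

# Sketch: an auto-augmentation transform composed with other v2 transforms.
import torch
from torchvision.transforms import v2 as transforms

pipeline = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandAugment(num_ops=2, magnitude=9),  # expects a uint8 tensor or PIL image
    transforms.ConvertImageDtype(torch.float32),     # uint8 -> float in [0, 1]
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)  # fake CHW image
out = pipeline(img)                                  # float tensor, same spatial size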
32 changes: 32 additions & 0 deletions torchvision/transforms/v2/_augment.py
@@ -13,6 +13,38 @@


class RandomErasing(_RandomApplyTransform):
"""[BETA] Randomly selects a rectangle region in the input image or video and erases its pixels.

.. betastatus:: RandomErasing transform

This transform does not support PIL Image.
'Random Erasing Data Augmentation' by Zhong et al. See https://arxiv.org/abs/1708.04896

Args:
p (float): probability that the random erasing operation will be performed.
scale (tuple of float): range of the proportion of the erased area against the input image.
ratio (tuple of float): range of the aspect ratio of the erased area.
value (number, tuple or str): erasing value. Default is 0. If a single int, it is used to
erase all pixels. If a tuple of length 3, it is used to erase the
R, G, B channels respectively.
If the str ``'random'``, each pixel is erased with a random value.
inplace (bool): whether to make this transform in-place. Default is ``False``.

Returns:
Erased input.

Example:
>>> import torch
>>> from torchvision.transforms import v2 as transforms
>>>
>>> transform = transforms.Compose([
>>> transforms.RandomHorizontalFlip(),
>>> transforms.PILToTensor(),
>>> transforms.ConvertImageDtype(torch.float),
>>> transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
>>> transforms.RandomErasing(),
>>> ])
"""

_v1_transform_cls = _transforms.RandomErasing

def _extract_params_for_v1_transform(self) -> Dict[str, Any]:
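A small standalone sketch of the RandomErasing behavior documented above; shapes and parameter values are illustrative, not from this PR.

# Sketch: RandomErasing on a float tensor image; p=1.0 forces the erase to happen.
import torch
from torchvision.transforms import v2 as transforms

img = torch.rand(3, 224, 224)                                 # CHW float image in [0, 1]
erase_zero = transforms.RandomErasing(p=1.0)                  # default value=0 blanks the region
erase_rand = transforms.RandomErasing(p=1.0, value="random")  # per-pixel random fill

out = erase_rand(erase_zero(img))                             # same shape, two rectangles erased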
80 changes: 80 additions & 0 deletions torchvision/transforms/v2/_auto_augment.py
@@ -162,6 +162,24 @@ def _apply_image_or_video_transform(


class AutoAugment(_AutoAugmentBase):
r"""[BETA] AutoAugment data augmentation method based on
`"AutoAugment: Learning Augmentation Strategies from Data" <https://arxiv.org/pdf/1805.09501.pdf>`_.

.. betastatus:: AutoAugment transform

If the image is a torch Tensor, it should be of type torch.uint8, and it is expected
to have [..., 1 or 3, H, W] shape, where ... means an arbitrary number of leading dimensions.
If img is a PIL Image, it is expected to be in mode "L" or "RGB".

Args:
policy (AutoAugmentPolicy): Desired policy enum defined by
:class:`torchvision.transforms.autoaugment.AutoAugmentPolicy`. Default is ``AutoAugmentPolicy.IMAGENET``.
interpolation (InterpolationMode): Desired interpolation enum defined by
:class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.NEAREST``.
If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` are supported.
fill (sequence or number, optional): Pixel fill value for the area outside the transformed
image. If given a number, the value is used for all bands.
"""
_v1_transform_cls = _transforms.AutoAugment

_AUGMENTATION_SPACE = {
@@ -318,6 +336,27 @@ def forward(self, *inputs: Any) -> Any:


class RandAugment(_AutoAugmentBase):
r"""[BETA] RandAugment data augmentation method based on
`"RandAugment: Practical automated data augmentation with a reduced search space"
<https://arxiv.org/abs/1909.13719>`_.

.. betastatus:: RandAugment transform

If the image is a torch Tensor, it should be of type torch.uint8, and it is expected
to have [..., 1 or 3, H, W] shape, where ... means an arbitrary number of leading dimensions.
If img is a PIL Image, it is expected to be in mode "L" or "RGB".

Args:
num_ops (int): Number of augmentation transformations to apply sequentially.
magnitude (int): Magnitude for all the transformations.
num_magnitude_bins (int): The number of different magnitude values.
interpolation (InterpolationMode): Desired interpolation enum defined by
:class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.NEAREST``.
If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` are supported.
fill (sequence or number, optional): Pixel fill value for the area outside the transformed
image. If given a number, the value is used for all bands.
"""

_v1_transform_cls = _transforms.RandAugment
_AUGMENTATION_SPACE = {
"Identity": (lambda num_bins, height, width: None, False),
@@ -379,6 +418,24 @@ def forward(self, *inputs: Any) -> Any:


class TrivialAugmentWide(_AutoAugmentBase):
r"""[BETA] Dataset-independent data-augmentation with TrivialAugment Wide, as described in
`"TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation" <https://arxiv.org/abs/2103.10158>`_.

.. betastatus:: TrivialAugmentWide transform

If the image is a torch Tensor, it should be of type torch.uint8, and it is expected
to have [..., 1 or 3, H, W] shape, where ... means an arbitrary number of leading dimensions.
If img is a PIL Image, it is expected to be in mode "L" or "RGB".

Args:
num_magnitude_bins (int): The number of different magnitude values.
interpolation (InterpolationMode): Desired interpolation enum defined by
:class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.NEAREST``.
If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` are supported.
fill (sequence or number, optional): Pixel fill value for the area outside the transformed
image. If given a number, the value is used for all bands.
"""

_v1_transform_cls = _transforms.TrivialAugmentWide
_AUGMENTATION_SPACE = {
"Identity": (lambda num_bins, height, width: None, False),
@@ -430,6 +487,29 @@ def forward(self, *inputs: Any) -> Any:


class AugMix(_AutoAugmentBase):
r"""[BETA] AugMix data augmentation method based on
`"AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty" <https://arxiv.org/abs/1912.02781>`_.

.. betastatus:: AugMix transform

If the image is a torch Tensor, it should be of type torch.uint8, and it is expected
to have [..., 1 or 3, H, W] shape, where ... means an arbitrary number of leading dimensions.
If img is a PIL Image, it is expected to be in mode "L" or "RGB".

Args:
severity (int): The severity of base augmentation operators. Default is ``3``.
mixture_width (int): The number of augmentation chains. Default is ``3``.
chain_depth (int): The depth of augmentation chains. A negative value denotes stochastic depth sampled from the interval [1, 3].
Default is ``-1``.
alpha (float): The hyperparameter for the probability distributions. Default is ``1.0``.
all_ops (bool): Use all operations (including brightness, contrast, color and sharpness). Default is ``True``.
interpolation (InterpolationMode): Desired interpolation enum defined by
:class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.NEAREST``.
If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` are supported.
fill (sequence or number, optional): Pixel fill value for the area outside the transformed
image. If given a number, the value is used for all bands.
"""

_v1_transform_cls = _transforms.AugMix

_PARTIAL_AUGMENTATION_SPACE = {
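To round off the docstrings above, a hedged usage sketch for the auto-augmentation classes; the input is a uint8 tensor as the docstrings require, and the parameter values are the library defaults as I understand them, so treat them as assumptions.

# Sketch: the auto-augmentation transforms operate on uint8 tensors (or PIL images).
import torch
from torchvision.transforms import v2 as transforms

img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)

auto = transforms.AutoAugment()                       # AutoAugmentPolicy.IMAGENET by default
rand = transforms.RandAugment(num_ops=2, magnitude=9)
trivial = transforms.TrivialAugmentWide()
augmix = transforms.AugMix(severity=3, mixture_width=3)

for t in (auto, rand, trivial, augmix):
    out = t(img)                                      # uint8 tensor, same shape as the input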