Support all integer and floating point dtypes in prototype transform kernels? #6840

The standard rule for dtype support for images and videos is: 

- All floating point and integer tensors are supported.
- Floating point tensors are valid in the range `[0.0, 1.0]` and integer tensors in `[0, torch.iinfo(dtype).max]`. (This is currently under review, since in a few cases it was not true or simply not handled; see #6825.)
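
To make the rule concrete, here is a minimal sketch of the upper bound it implies for a given dtype. The helper name `max_value` is hypothetical and only serves as an illustration; it is not claimed to be an existing torchvision function.

```python
import torch


def max_value(dtype: torch.dtype) -> float:
    """Upper bound of the valid value range under the rule above.

    Hypothetical helper for illustration only.
    """
    if dtype.is_floating_point:
        # Floating point images and videos live in [0.0, 1.0].
        return 1.0
    # Integer images and videos live in [0, torch.iinfo(dtype).max].
    return float(torch.iinfo(dtype).max)
```

For example, `max_value(torch.uint8)` gives `255.0` and `max_value(torch.float32)` gives `1.0`.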

However, we currently have two kernels that only support `uint8` images or videos:

- https://github.com/pytorch/vision/blob/c84dbfad97251271a789b252a2a1a52c73f623ff/torchvision/prototype/transforms/functional/_color.py#L373-L375
- https://github.com/pytorch/vision/blob/c84dbfad97251271a789b252a2a1a52c73f623ff/torchvision/transforms/functional_tensor.py#L788-L789

This also holds for transforms v1, so it is not a problem specific to the new API.

One consequence is that AA transforms only support `uint8` images

https://github.com/pytorch/vision/blob/c84dbfad97251271a789b252a2a1a52c73f623ff/torchvision/transforms/autoaugment.py#L104-L107

since both 

https://github.com/pytorch/vision/blob/c84dbfad97251271a789b252a2a1a52c73f623ff/torchvision/transforms/autoaugment.py#L76-L77

and 

https://github.com/pytorch/vision/blob/c84dbfad97251271a789b252a2a1a52c73f623ff/torchvision/transforms/autoaugment.py#L82-L83

are used.

One possible way of mitigating this is to simply call `convert_dtype(image, torch.uint8)` at the beginning and convert back after the computation.
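
A rough sketch of that round trip is shown below. It is deliberately simplified: `to_uint8`, `from_uint8`, and `via_uint8` are hypothetical names, only floating point and `uint8` inputs are handled, and the float to `uint8` mapping uses plain rounding, which may differ from what the actual `convert_dtype` does.

```python
import torch


def to_uint8(image: torch.Tensor) -> torch.Tensor:
    # Hypothetical, simplified conversion: only float and uint8 inputs.
    if image.dtype == torch.uint8:
        return image
    if image.is_floating_point():
        # Map [0.0, 1.0] onto [0, 255] by scaling and rounding (an assumption).
        return image.mul(255.0).round().clamp(0, 255).to(torch.uint8)
    raise TypeError(f"Sketch only handles float and uint8, but found {image.dtype}")


def from_uint8(image: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # Inverse of `to_uint8` for the dtypes this sketch supports.
    if dtype == torch.uint8:
        return image
    if dtype.is_floating_point:
        return image.to(dtype).div(255.0)
    raise TypeError(f"Sketch only handles float and uint8, but found {dtype}")


def via_uint8(kernel):
    """Wrap a uint8-only kernel so it accepts the input dtype and restores it."""

    def wrapper(image: torch.Tensor, *args, **kwargs) -> torch.Tensor:
        original_dtype = image.dtype
        output = kernel(to_uint8(image), *args, **kwargs)
        return from_uint8(output, original_dtype)

    return wrapper
```

The trade-off is that float inputs are quantized to 256 levels on the way in, so the result is not identical to a native float implementation of the same kernel.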

That is probably needed for `equalize`, since we recently switched away from the histogram ops of `torch` towards our "custom" implementation to enable batch processing (#6757). However, that implementation relies on the input being an integer tensor, and in its current form even on it being `uint8`, due to some hardcoded constants.

For `posterize`, I think it is fairly easy to provide the same functionality for float inputs directly, without going through a dtype conversion first.
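
As a sketch of what a direct float path could look like: keeping the top `bits` bits of a `uint8` value corresponds, for floats in `[0.0, 1.0]`, to flooring onto `2 ** bits` evenly spaced levels. The function name `posterize_sketch` is hypothetical, and this is one possible formulation rather than the implementation used in the library.

```python
import torch


def posterize_sketch(image: torch.Tensor, bits: int) -> torch.Tensor:
    """Hypothetical posterize that handles float and uint8 inputs directly."""
    if image.is_floating_point():
        levels = 1 << bits
        # Floor onto `levels` buckets; the clamp sends 1.0 into the top bucket
        # instead of creating an extra level.
        return image.mul(levels).floor().clamp(0, levels - 1).div(levels)
    if image.dtype == torch.uint8:
        # Keep only the `bits` most significant bits of each value.
        mask = ((1 << bits) - 1) << (8 - bits)
        return image & mask
    raise TypeError(f"Sketch only handles float and uint8, but found {image.dtype}")
```

With `bits=2`, a float input of `[0.0, 0.3, 0.6, 1.0]` becomes `[0.0, 0.25, 0.5, 0.75]`, and a `uint8` input of `[0, 77, 153, 255]` becomes `[0, 64, 128, 192]`.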

cc @vfdev-5 @datumbox @bjuncek

	class AutoAugment(torch.nn.Module):
	    r"""AutoAugment data augmentation method based on
	    `"AutoAugment: Learning Augmentation Strategies from Data" <https://arxiv.org/pdf/1805.09501.pdf>`_.
	    If the image is torch Tensor, it should be of type torch.uint8, and it is expected

	# Excerpt: torchvision/prototype/transforms/functional/_color.py
	def equalize_image_tensor(image: torch.Tensor) -> torch.Tensor:
	    if image.dtype != torch.uint8:
	        raise TypeError(f"Only torch.uint8 image tensors are supported, but found {image.dtype}")

	# Excerpt: torchvision/transforms/functional_tensor.py
	    if img.dtype != torch.uint8:
	        raise TypeError(f"Only torch.uint8 image tensors are supported, but found {img.dtype}")

	# Excerpt: torchvision/transforms/autoaugment.py
	    elif op_name == "Posterize":
	        img = F.posterize(img, int(magnitude))
