
Conversation


@pmeier pmeier commented Sep 7, 2022

Before this PR, padding degenerate segmentation masks failed:

import torch
from torchvision.prototype.transforms import functional as F
from torchvision.prototype import features

mask = features.SegmentationMask(torch.randint(0, 2, (0, 937, 1024), dtype=torch.bool))
F.pad(mask, 5)
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0, 937, 1024] because the unspecified dimension size -1 can be any value and is ambiguous

I fixed the kernel and extended the kernel tests for all input types (images, segmentation masks, bounding boxes, ...) to also check the degenerate case.
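
For context, here is a minimal sketch of one way around the ambiguous reshape (illustrative only; the exact change in the PR may differ): compute the leading size explicitly instead of passing -1, so a 0 in the trailing dimensions no longer makes the reshape ambiguous.

import math

import torch

def flatten_mask_batch(mask):
    # Hypothetical helper, not the PR's code: math.prod(()) == 1, so this also
    # covers masks without extra batch dimensions and never relies on -1.
    num_masks, height, width = mask.shape[-3:]
    batch_size = math.prod(mask.shape[:-3])
    return mask.reshape(batch_size, num_masks, height, width)

mask = torch.randint(0, 2, (0, 937, 1024), dtype=torch.bool)
print(flatten_mask_batch(mask).shape)  # torch.Size([1, 0, 937, 1024]), no RuntimeError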

def pad_segmentation_mask(
    segmentation_mask: torch.Tensor, padding: Union[int, List[int]], padding_mode: str = "constant"
) -> torch.Tensor:
    num_masks, height, width = segmentation_mask.shape[-3:]
Contributor Author

Not sure why we replicated pad_image_tensor here if the only difference is that we set a fixed fill=0.
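
For illustration, the delegation hinted at above could be as small as the following (a sketch that assumes pad_image_tensor accepts these keyword arguments; it is not the code in this PR):

from typing import List, Union

import torch
from torchvision.prototype.transforms.functional import pad_image_tensor

def pad_segmentation_mask(
    segmentation_mask: torch.Tensor, padding: Union[int, List[int]], padding_mode: str = "constant"
) -> torch.Tensor:
    # Sketch: reuse the image kernel and only pin fill to 0, since padded mask
    # pixels should always be background.
    return pad_image_tensor(segmentation_mask, padding=padding, fill=0, padding_mode=padding_mode)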


@pmeier pmeier left a comment


Apart from the ones I fixed, there are more operators that use the .view(-1, ...) idiom.

I've xfailed some tests for them. Plus, there are more bounding box ops that do this. But we don't need to worry about them, because .view(-1, ...) only fails if there is a 0 in the ... part. For bounding boxes we only do

>>> t = torch.empty(0, 4)
>>> t.view(-1, 4)
tensor([], size=(0, 4))

The ambiguity only arises for segmentation masks, which use the channel dimension as the number of objects, and that can be 0.

>>> t = torch.empty(4, 0, 16, 16)
>>> t.view(-1, 0, 16, 16)
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0, 16, 16] because the unspecified dimension size -1 can be any value and is ambiguous

That might be a good reason to have something in the kernel other than just calling the image kernel. But we can discuss this later.
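
For illustration, "something in the kernel" could be as small as a guard for the 0-object case before delegating to the image kernel. This is a sketch only; the helper _FT._parse_pad_padding is taken from the diff further down, and the actual kernel may differ.

import torch
from torchvision.transforms import functional_tensor as _FT
from torchvision.prototype.transforms.functional import pad_image_tensor

def pad_segmentation_mask(segmentation_mask, padding, padding_mode="constant"):
    num_masks, height, width = segmentation_mask.shape[-3:]
    if num_masks == 0:
        # Nothing to pad; compute the padded spatial size directly so we never
        # hit the ambiguous view(-1, 0, H, W).
        left, right, top, bottom = _FT._parse_pad_padding(padding)
        return segmentation_mask.new_empty(
            (*segmentation_mask.shape[:-2], height + top + bottom, width + left + right)
        )
    # Common case: reuse the image kernel with a fixed fill of 0.
    return pad_image_tensor(segmentation_mask, padding=padding, fill=0, padding_mode=padding_mode)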

Comment on lines +96 to +97
# transforms.RandomRotation(degrees=(-45, 45)),
# transforms.RandomAffine(degrees=(-45, 45)),
Contributor Author

They don't work since the new input data generation also adds degenerate shapes, which are not supported yet by the underlying kernels.

masks = make_segmentation_mask((32, 24))
ohe_masks = features.SegmentationMask(torch.randint(0, 2, size=(6, 32, 24)))
sample = [image, bboxes, label, ohe_label, masks, ohe_masks]
masks = make_segmentation_mask((32, 24), num_objects=6)
Contributor Author

We currently only work with object detection masks; they were called ohe_masks here before. I've renamed them to masks and removed the tests for what masks referred to before.

    size = size or torch.randint(16, 33, (2,)).tolist()
    shape = (*extra_dims, 1, *size)
    data = make_tensor(shape, low=0, high=num_categories, dtype=dtype)
def make_segmentation_mask(size=None, *, num_objects=None, extra_dims=(), dtype=torch.uint8):
Contributor Author

The removed functionality generated masks for (semantic) segmentation tasks, which we currently don't support.
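
A sketch of what the detection-oriented generator might look like after this change, based only on the new signature shown above (the real test helper may differ):

import torch

def make_segmentation_mask(size=None, *, num_objects=None, extra_dims=(), dtype=torch.uint8):
    # Sketch: one binary channel per object, so num_objects can legitimately
    # be 0 and produce a degenerate mask.
    size = size or torch.randint(16, 33, (2,)).tolist()
    if num_objects is None:
        num_objects = int(torch.randint(1, 11, ()))
    shape = (*extra_dims, num_objects, *size)
    return torch.randint(0, 2, shape, dtype=dtype)

print(make_segmentation_mask((32, 24), num_objects=6).shape)  # torch.Size([6, 32, 24])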

Comment on lines +920 to +923
incorrect_expected_segmentation_mask_setup = pytest.mark.xfail(
    reason="This test fails because the expected result computation is wrong. Fix ASAP.",
    strict=False,
)
Contributor Author

This is to get the CI green for now. As the text implies, we should get to this ASAP, but doing it now would slow down the reference training.


@vfdev-5 vfdev-5 left a comment


Looks OK to me.

padded_image = _FT.pad(
    img=img.view(-1, num_channels, height, width), padding=padding, fill=fill, padding_mode=padding_mode
)
left, right, top, bottom = _FT._parse_pad_padding(padding)
Contributor

Nit: this part of the code could be moved into the else branch to avoid parsing the padding twice in the normal case.


@datumbox datumbox left a comment


LGTM. I'm pretty sure we don't need to clone, but let's check this separately.

@pmeier pmeier merged commit 84dcf69 into pytorch:main Sep 7, 2022
@pmeier pmeier deleted the pad-bbox-degenerate branch September 7, 2022 15:49
facebook-github-bot pushed a commit that referenced this pull request Sep 9, 2022
Summary:
* fix padding for degenerate segmentation masks

* extend test data generation to degenerate inputs

* add even more degenerate shapes

* simplify kernel

* [SKIP CI] only GHA

* add more degenerate segmentation masks

* fix segmentation mask generation

* xfail some tests

* Revert "simplify kernel"

This reverts commit 18c5e4f.

* fix resize for degenerate inputs

* [SKIP CI] CircleCI

* fix RandomIoUCrop test

* [SKIP CI] CircleCI

* cleanup

* [SKIP CI] CircleCI

* add perf TODO comments

* [SKIP CI] CircleCI

Reviewed By: YosuaMichael

Differential Revision: D39381957

fbshipit-source-id: 05a0da7ead77f33bc7ec7423271693a9aef6ad7e