
Conversation

Contributor

@NielsRogge NielsRogge commented Jun 16, 2022

What does this PR do?

This PR improves the vision models by:

  • removing to_2tuple
  • sanity-checking that the channel dimension of the pixel values provided to the model matches config.num_channels (see the sketch after this list)
  • replacing the hardcoded 3 with config.num_channels for xxxForMaskedImageModeling models (fixes #17727: SimMIM output num_channels should not be hardcoded)
  • replacing the hardcoded 3 with config.num_channels in the Flax models (ViT, BEiT)
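
A minimal sketch of what these changes look like in practice (the class and helper names here are hypothetical, and the config attributes are assumed to follow the usual ViT-style configuration; the real code lives in the individual modeling files):

```python
import collections.abc

import torch
from torch import nn


class SketchPatchEmbeddings(nn.Module):
    """Hypothetical patch-embedding layer illustrating the first two changes."""

    def __init__(self, config):
        super().__init__()
        # Inline tuple handling replaces the removed to_2tuple helper.
        patch_size = config.patch_size
        patch_size = (
            patch_size if isinstance(patch_size, collections.abc.Iterable) else (patch_size, patch_size)
        )
        self.num_channels = config.num_channels
        self.projection = nn.Conv2d(
            self.num_channels, config.hidden_size, kernel_size=patch_size, stride=patch_size
        )

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        batch_size, num_channels, height, width = pixel_values.shape
        # New sanity check: fail early with a clear message instead of a
        # cryptic shape error from the convolution below.
        if num_channels != self.num_channels:
            raise ValueError(
                "Make sure that the channel dimension of the pixel values matches "
                "the one set in the configuration."
            )
        return self.projection(pixel_values).flatten(2).transpose(1, 2)


def build_masked_image_modeling_decoder(config):
    # SimMIM-style decoder: the output channel count was previously a
    # hardcoded 3 and is now derived from config.num_channels.
    return nn.Sequential(
        nn.Conv2d(
            in_channels=config.hidden_size,
            out_channels=config.encoder_stride**2 * config.num_channels,  # was ... * 3
            kernel_size=1,
        ),
        nn.PixelShuffle(config.encoder_stride),
    )
```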

To do:

  • ViT
  • BEiT
  • DeiT
  • Swin
  • PoolFormer
  • DPT
  • YOLOS
  • ViLT
  • GLPN
  • Data2VecVision
  • MaskFormer
  • ViTMAE
  • TF and Flax implementations
  • Corresponding test files
  • Add more Copied from statements (e.g. DropPath); the mechanism is sketched right after this list
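
For readers unfamiliar with the convention: a # Copied from comment marks a function or class as a verbatim copy that the repository's consistency checks keep in sync with its source. A minimal sketch using the stochastic-depth helper mentioned above (the source path in the comment is illustrative):

```python
import torch


# Copied from transformers.models.beit.modeling_beit.drop_path
def drop_path(input: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
    """Drop entire residual paths per sample (stochastic depth)."""
    if drop_prob == 0.0 or not training:
        return input
    keep_prob = 1 - drop_prob
    # Broadcastable shape (batch, 1, 1, ...) so a whole sample is kept or dropped.
    shape = (input.shape[0],) + (1,) * (input.ndim - 1)
    random_tensor = keep_prob + torch.rand(shape, dtype=input.dtype, device=input.device)
    random_tensor.floor_()  # binarize to 0/1
    return input.div(keep_prob) * random_tensor
```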

@HuggingFaceDocBuilderDev commented Jun 16, 2022

The documentation is not available anymore as the PR was closed or merged.

@NielsRogge NielsRogge force-pushed the fix_simmim_channels branch from dcd728c to 33b720a on June 22, 2022 14:03
@NielsRogge NielsRogge mentioned this pull request Jun 22, 2022
Collaborator

@sgugger sgugger left a comment

Nice cleanup! Thanks for working on it!

Contributor

@amyeroberts amyeroberts left a comment

Nice! Thanks for making all these changes 🧹🧹🧹

Just some small comments about tests, but otherwise LGTM :)

@NielsRogge NielsRogge merged commit 0917870 into huggingface:main Jun 24, 2022
amyeroberts added a commit to amyeroberts/transformers that referenced this pull request Jun 24, 2022
amyeroberts added a commit that referenced this pull request Jul 13, 2022
* Initial TF DeiT implementation

* Fix copies naming issues

* Fix up + docs

* Properly name main layer

* Name layers properly

* Fixup

* Fix import

* Fix import

* Fix import

* Fix weight loading for tests whilst the weights are not on the hub

* Add doc tests and remove to_2tuple

* Add back to_2tuple
Removing to_2tuple would require many downstream changes because of the copies checks

* Incorporate updates from the Improve vision models PR (#17731)

* Don't hardcode num_channels

* Copy PyTorch DeiT embeddings and remove PyTorch operations with the mask

* Fix patch embeddings & tidy up

* Update PixelShuffle to move logic into class layer

* Update doc strings - remove PT references

* Use NHWC format in internal layers (see the sketch after this commit message)

* Fix up

* Use linear activation layer

* Remove unused import

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: NielsRogge <[email protected]>

* Move dataclass to top of file

* Remove from_pt now that the weights are on the hub

* Fixup

Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Amy Roberts <[email protected]>
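
As background on two of the commit items above ("Use NHWC format in internal layers" and the PixelShuffle refactor), here is a hedged sketch, not the actual modeling_tf_deit.py code: TF ports typically keep NCHW pixel_values at the API boundary for parity with PyTorch checkpoints, transpose once to NHWC internally, and implement pixel shuffle as a Keras layer on top of tf.nn.depth_to_space:

```python
import tensorflow as tf


class SketchPixelShuffle(tf.keras.layers.Layer):
    """Hypothetical layer: pixel shuffle (depth-to-space) for NHWC tensors."""

    def __init__(self, upscale_factor: int, **kwargs):
        super().__init__(**kwargs)
        self.upscale_factor = upscale_factor

    def call(self, hidden_states: tf.Tensor) -> tf.Tensor:
        # Rearranges (batch, H, W, C * r**2) -> (batch, H * r, W * r, C).
        # NOTE: a faithful PyTorch-parity port also needs a channel
        # permutation before this op; omitted here for brevity.
        return tf.nn.depth_to_space(hidden_states, block_size=self.upscale_factor)


# NCHW input is transposed once on entry, since TF ops default to NHWC.
pixel_values = tf.random.uniform((1, 3, 224, 224))      # NCHW, as in PyTorch
nhwc = tf.transpose(pixel_values, perm=(0, 2, 3, 1))    # (1, 224, 224, 3)

shuffled = SketchPixelShuffle(upscale_factor=2)(
    tf.random.uniform((1, 7, 7, 3 * 2**2))
)
print(shuffled.shape)  # (1, 14, 14, 3)
```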
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022