
Conversation

Contributor

@NielsRogge NielsRogge commented Jun 16, 2022

What does this PR do?

This PR improves the vision models by:

  • removing to_2tuple
  • sanity-checking that the channel dimension of the pixel values provided to the model matches config.num_channels (see the sketch after this list)
  • replacing the hardcoded 3 with config.num_channels for xxxForMaskedImageModeling models (fixes #17727: SimMIM output num_channels should not be hardcoded)
  • replacing the hardcoded 3 with config.num_channels in the Flax models (ViT, BEiT)
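
A minimal sketch of what these changes look like in practice (the class and helper names here are hypothetical, and the config attributes are assumed to follow the usual ViT-style configuration; the real code lives in the individual modeling files):

```python
import collections.abc

import torch
from torch import nn


class SketchPatchEmbeddings(nn.Module):
    """Hypothetical patch-embedding layer illustrating the first two changes."""

    def __init__(self, config):
        super().__init__()
        # Inline tuple handling replaces the removed to_2tuple helper.
        patch_size = config.patch_size
        patch_size = (
            patch_size if isinstance(patch_size, collections.abc.Iterable) else (patch_size, patch_size)
        )
        self.num_channels = config.num_channels
        self.projection = nn.Conv2d(
            self.num_channels, config.hidden_size, kernel_size=patch_size, stride=patch_size
        )

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        batch_size, num_channels, height, width = pixel_values.shape
        # New sanity check: fail early with a clear message instead of a
        # cryptic shape error from the convolution below.
        if num_channels != self.num_channels:
            raise ValueError(
                "Make sure that the channel dimension of the pixel values matches "
                "the one set in the configuration."
            )
        return self.projection(pixel_values).flatten(2).transpose(1, 2)


def build_masked_image_modeling_decoder(config):
    # SimMIM-style decoder: the output channel count was previously a
    # hardcoded 3 and is now derived from config.num_channels.
    return nn.Sequential(
        nn.Conv2d(
            in_channels=config.hidden_size,
            out_channels=config.encoder_stride**2 * config.num_channels,  # was ... * 3
            kernel_size=1,
        ),
        nn.PixelShuffle(config.encoder_stride),
    )
```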

To do:

  • ViT
  • BEiT
  • DeiT
  • Swin
  • PoolFormer
  • DPT
  • YOLOS
  • ViLT
  • GLPN
  • Data2VecVision
  • MaskFormer
  • ViTMAE
  • TF and Flax implementations
  • Corresponding test files
  • Add more Copied from statements (e.g. DropPath); the mechanism is sketched right after this list
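
For readers unfamiliar with the convention: a # Copied from comment marks a function or class as a verbatim copy that the repository's consistency checks keep in sync with its source. A minimal sketch using the stochastic-depth helper mentioned above (the source path in the comment is illustrative):

```python
import torch


# Copied from transformers.models.beit.modeling_beit.drop_path
def drop_path(input: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
    """Drop entire residual paths per sample (stochastic depth)."""
    if drop_prob == 0.0 or not training:
        return input
    keep_prob = 1 - drop_prob
    # Broadcastable shape (batch, 1, 1, ...) so a whole sample is kept or dropped.
    shape = (input.shape[0],) + (1,) * (input.ndim - 1)
    random_tensor = keep_prob + torch.rand(shape, dtype=input.dtype, device=input.device)
    random_tensor.floor_()  # binarize to 0/1
    return input.div(keep_prob) * random_tensor
```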

@HuggingFaceDocBuilderDev commented Jun 16, 2022

The documentation is not available anymore as the PR was closed or merged.

@NielsRogge NielsRogge force-pushed the fix_simmim_channels branch from dcd728c to 33b720a on June 22, 2022 14:03
@NielsRogge NielsRogge mentioned this pull request Jun 22, 2022
Collaborator

@sgugger sgugger left a comment

Nice cleanup! Thanks for working on it!

Contributor

@amyeroberts amyeroberts left a comment

Nice! Thanks for making all these changes 🧹🧹🧹

Just some small comments about tests, but otherwise LGTM :)

@NielsRogge NielsRogge merged commit 0917870 into huggingface:main Jun 24, 2022
amyeroberts added a commit to amyeroberts/transformers that referenced this pull request Jun 24, 2022
amyeroberts added a commit that referenced this pull request Jul 13, 2022
* Initial TF DeiT implementation

* Fix copies naming issues

* Fix up + docs

* Properly name main layer

* Name layers properly

* Fixup

* Fix import

* Fix import

* Fix import

* Fix weight loading for tests whilst the weights are not on the hub

* Add doc tests and remove to_2tuple

* Add back to_2tuple
Removing to_2tuple would require many downstream changes because of the copies checks

* Incorporate updates from the Improve vision models PR (#17731)

* Don't hardcode num_channels

* Copy PyTorch DeiT embeddings and remove PyTorch operations with the mask

* Fix patch embeddings & tidy up

* Update PixelShuffle to move logic into class layer

* Update doc strings - remove PT references

* Use NHWC format in internal layers (see the sketch after this commit message)

* Fix up

* Use linear activation layer

* Remove unused import

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: NielsRogge <[email protected]>

* Move dataclass to top of file

* Remove from_pt now that the weights are on the hub

* Fixup

Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Amy Roberts <[email protected]>
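
As background on two of the commit items above ("Use NHWC format in internal layers" and the PixelShuffle refactor), here is a hedged sketch, not the actual modeling_tf_deit.py code: TF ports typically keep NCHW pixel_values at the API boundary for parity with PyTorch checkpoints, transpose once to NHWC internally, and implement pixel shuffle as a Keras layer on top of tf.nn.depth_to_space:

```python
import tensorflow as tf


class SketchPixelShuffle(tf.keras.layers.Layer):
    """Hypothetical layer: pixel shuffle (depth-to-space) for NHWC tensors."""

    def __init__(self, upscale_factor: int, **kwargs):
        super().__init__(**kwargs)
        self.upscale_factor = upscale_factor

    def call(self, hidden_states: tf.Tensor) -> tf.Tensor:
        # Rearranges (batch, H, W, C * r**2) -> (batch, H * r, W * r, C).
        # NOTE: a faithful PyTorch-parity port also needs a channel
        # permutation before this op; omitted here for brevity.
        return tf.nn.depth_to_space(hidden_states, block_size=self.upscale_factor)


# NCHW input is transposed once on entry, since TF ops default to NHWC.
pixel_values = tf.random.uniform((1, 3, 224, 224))      # NCHW, as in PyTorch
nhwc = tf.transpose(pixel_values, perm=(0, 2, 3, 1))    # (1, 224, 224, 3)

shuffled = SketchPixelShuffle(upscale_factor=2)(
    tf.random.uniform((1, 7, 7, 3 * 2**2))
)
print(shuffled.shape)  # (1, 14, 14, 3)
```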
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022