Add `init_weights` method to `FlaxMixin` #513

mishig25 · 2022-09-14T16:50:30Z

Implementation of `init_weights` method is required for any class that is inheriting from FlaxModelMixin.

Here is an example:

class UNet2D(nn.Module, FlaxModelMixin, ConfigMixin):
    def init_weights(self, rng: jax.random.PRNGKey) -> FrozenDict:
        # init input tensors
        sample_shape = (1, self.config.sample_size, self.config.sample_size, self.config.in_channels)
        sample = jnp.zeros(sample_shape, dtype=jnp.float32)
        timestpes = jnp.ones((1,), dtype=jnp.int32)
        encoder_hidden_states = jnp.zeros((1, 1, self.config.cross_attention_dim), dtype=jnp.float32)

        params_rng, dropout_rng = jax.random.split(rng)
        rngs = {"params": params_rng, "dropout": dropout_rng}

        return self.init(rngs, sample, timestpes, encoder_hidden_states)["params"]

Unlike transformers.FlaxPretrainedModel.init_weights, diffusers.FlaxModelMixin.init_weights signature does not have parameter input_shape. Read more here & here on why the decision was made

Users will have 2 option to init weights:

class FlaxModel(nn.Module, FlaxModelMixin, ConfigMixin):
   ...

my_model = FlaxModel()
# option1 (FlaxModelMixin)
params = my_model.init_weights(key)
# option2 (linen.Module)
x = random.normal(key1, (...)) # Dummy input
params = FlaxModel.init(key, x)

# option1 is more convenient since the random input is automatically handled inside `init_weights`, but they will still have an option2 if they want to use it as a normal linen.Module

HuggingFaceDocBuilderDev · 2022-09-14T16:53:55Z

The documentation is not available anymore as the PR was closed or merged.

src/diffusers/modeling_flax_utils.py

patrickvonplaten · 2022-09-14T20:36:07Z

src/diffusers/modeling_flax_utils.py

        ```"""
        return self._cast_floating_to(params, jnp.float16, mask)

+    def init_weights(self, rng: jax.random.PRNGKey) -> Dict:


I like that we don't allow the input_shape to be passed for now since it's much more restricted than Transformers, i.e. we should for now always be able to infer the correct shape from the config. This looks good to me!

patrickvonplaten

Just some minor comments regarding naming, and think we should remove the "allow mismatched keys" functionality for now. But apart from this this is top!

src/diffusers/modeling_flax_utils.py

pcuenca · 2022-09-15T07:31:44Z

I agree with @patrickvonplaten's comments. This is very cool, I'll test it later!

mishig25 · 2022-09-15T08:23:01Z

should be ready for review, also updated the description with more details for clairfication

pcuenca

Looks great!

mishig25 · 2022-09-15T11:09:16Z

@patrickvonplaten @patil-suraj should I merge?

patil-suraj

Very cool! Just left couple of nits

patil-suraj · 2022-09-15T14:37:10Z

src/diffusers/modeling_flax_utils.py

        ```"""
        return self._cast_floating_to(params, jnp.float16, mask)

+    def init_weights(self, rng: jax.random.PRNGKey) -> Dict:


src/diffusers/modeling_flax_utils.py

patil-suraj · 2022-09-15T14:39:52Z

src/diffusers/modeling_flax_utils.py

+                f"Some weights of the model checkpoint at {pretrained_model_name_or_path} were not used when"
+                f" initializing {model.__class__.__name__}: {unexpected_keys}\n- This IS expected if you are"
+                f" initializing {model.__class__.__name__} from the checkpoint of a model trained on another task or"
+                " with another architecture (e.g. initializing a BertForSequenceClassification model from a"
+                " BertForPreTraining model).\n- This IS NOT expected if you are initializing"
+                f" {model.__class__.__name__} from the checkpoint of a model that you expect to be exactly identical"
+                " (initializing a BertForSequenceClassification model from a BertForSequenceClassification model)."


docstring should be updated for diffusers

wdyt 869014c since we don't have LLM-like heads in diffusion models ?

Co-authored-by: Suraj Patil <[email protected]>

patrickvonplaten · 2022-09-15T15:01:37Z

Merging as tests are taking too long at the moment

@mishig25

* First UNet Flax modeling blocks. Mimic the structure of the PyTorch files. The model classes themselves need work, depending on what we do about configuration and initialization. * Remove FlaxUNet2DConfig class. * ignore_for_config non-config args. * Implement `FlaxModelMixin` * Use new mixins for Flax UNet. For some reason the configuration is not correctly applied; the signature of the `__init__` method does not contain all the parameters by the time it's inspected in `extract_init_dict`. * Import `FlaxUNet2DConditionModel` if flax is available. * Rm unused method `framework` * Update src/diffusers/modeling_flax_utils.py Co-authored-by: Suraj Patil <[email protected]> * Indicate types in flax.struct.dataclass as pointed out by @mishig25 Co-authored-by: Mishig Davaadorj <[email protected]> * Fix typo in transformer block. * make style * some more changes * make style * Add comment * Update src/diffusers/modeling_flax_utils.py Co-authored-by: Patrick von Platen <[email protected]> * Rm unneeded comment * Update docstrings * correct ignore kwargs * make style * Update docstring examples * Make style * Style: remove empty line. * Apply style (after upgrading black from pinned version) * Remove some commented code and unused imports. * Add init_weights (not yet in use until #513). * Trickle down deterministic to blocks. * Rename q, k, v according to the latest PyTorch version. Note that weights were exported with the old names, so we need to be careful. * Flax UNet docstrings, default props as in PyTorch. * Fix minor typos in PyTorch docstrings. * Use FlaxUNet2DConditionOutput as output from UNet. * make style Co-authored-by: Mishig Davaadorj <[email protected]> Co-authored-by: Mishig Davaadorj <[email protected]> Co-authored-by: Suraj Patil <[email protected]> Co-authored-by: Patrick von Platen <[email protected]>

* Add `init_weights` method to `FlaxMixin` * Rn `random_state` -> `shape_state` * `PRNGKey(0)` for `jax.eval_shape` * No allow mismatched sizes * Update src/diffusers/modeling_flax_utils.py Co-authored-by: Suraj Patil <[email protected]> * Update src/diffusers/modeling_flax_utils.py Co-authored-by: Suraj Patil <[email protected]> * docstring diffusers Co-authored-by: Suraj Patil <[email protected]>

@mishig25

* First UNet Flax modeling blocks. Mimic the structure of the PyTorch files. The model classes themselves need work, depending on what we do about configuration and initialization. * Remove FlaxUNet2DConfig class. * ignore_for_config non-config args. * Implement `FlaxModelMixin` * Use new mixins for Flax UNet. For some reason the configuration is not correctly applied; the signature of the `__init__` method does not contain all the parameters by the time it's inspected in `extract_init_dict`. * Import `FlaxUNet2DConditionModel` if flax is available. * Rm unused method `framework` * Update src/diffusers/modeling_flax_utils.py Co-authored-by: Suraj Patil <[email protected]> * Indicate types in flax.struct.dataclass as pointed out by @mishig25 Co-authored-by: Mishig Davaadorj <[email protected]> * Fix typo in transformer block. * make style * some more changes * make style * Add comment * Update src/diffusers/modeling_flax_utils.py Co-authored-by: Patrick von Platen <[email protected]> * Rm unneeded comment * Update docstrings * correct ignore kwargs * make style * Update docstring examples * Make style * Style: remove empty line. * Apply style (after upgrading black from pinned version) * Remove some commented code and unused imports. * Add init_weights (not yet in use until huggingface#513). * Trickle down deterministic to blocks. * Rename q, k, v according to the latest PyTorch version. Note that weights were exported with the old names, so we need to be careful. * Flax UNet docstrings, default props as in PyTorch. * Fix minor typos in PyTorch docstrings. * Use FlaxUNet2DConditionOutput as output from UNet. * make style Co-authored-by: Mishig Davaadorj <[email protected]> Co-authored-by: Mishig Davaadorj <[email protected]> Co-authored-by: Suraj Patil <[email protected]> Co-authored-by: Patrick von Platen <[email protected]>

Add init_weights method to FlaxMixin

32c2be5

mishig25 marked this pull request as draft September 14, 2022 17:05

patrickvonplaten reviewed Sep 14, 2022

View reviewed changes

src/diffusers/modeling_flax_utils.py Outdated Show resolved Hide resolved

patrickvonplaten reviewed Sep 14, 2022

View reviewed changes

src/diffusers/modeling_flax_utils.py Outdated Show resolved Hide resolved

patrickvonplaten reviewed Sep 14, 2022

View reviewed changes

src/diffusers/modeling_flax_utils.py Outdated Show resolved Hide resolved

patrickvonplaten reviewed Sep 14, 2022

View reviewed changes

patrickvonplaten requested review from patil-suraj and pcuenca September 14, 2022 20:36

pcuenca reviewed Sep 15, 2022

View reviewed changes

src/diffusers/modeling_flax_utils.py Outdated Show resolved Hide resolved

mishig25 added 3 commits September 15, 2022 08:02

Rn random_state -> shape_state

305a544

PRNGKey(0) for jax.eval_shape

803da8f

No allow mismatched sizes

73e0bc6

mishig25 marked this pull request as ready for review September 15, 2022 08:22

mishig25 requested review from patrickvonplaten and pcuenca September 15, 2022 08:22

pcuenca added a commit that referenced this pull request Sep 15, 2022

Add init_weights (not yet in use until #513).

da6ddfd

pcuenca approved these changes Sep 15, 2022

View reviewed changes

patrickvonplaten approved these changes Sep 15, 2022

View reviewed changes

patil-suraj approved these changes Sep 15, 2022

View reviewed changes

Mishig Davaadorj and others added 3 commits September 15, 2022 16:48

Update src/diffusers/modeling_flax_utils.py

f9675e9

Co-authored-by: Suraj Patil <[email protected]>

Update src/diffusers/modeling_flax_utils.py

1ed00f3

Co-authored-by: Suraj Patil <[email protected]>

docstring diffusers

869014c

patrickvonplaten merged commit fb5468a into main Sep 15, 2022

patrickvonplaten deleted the flax_init_weights branch September 15, 2022 15:01

Add init_weights method to FlaxMixin #513

Add init_weights method to FlaxMixin #513

Uh oh!

Conversation

mishig25 commented Sep 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Implementation of init_weights method is required for any class that is inheriting from FlaxModelMixin.

Uh oh!

HuggingFaceDocBuilderDev commented Sep 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

patrickvonplaten Sep 14, 2022

Choose a reason for hiding this comment

Uh oh!

patil-suraj Sep 15, 2022

Choose a reason for hiding this comment

Uh oh!

patrickvonplaten left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pcuenca commented Sep 15, 2022

Uh oh!

mishig25 commented Sep 15, 2022

Uh oh!

pcuenca left a comment

Choose a reason for hiding this comment

Uh oh!

mishig25 commented Sep 15, 2022

Uh oh!

patil-suraj left a comment

Choose a reason for hiding this comment

Uh oh!

patil-suraj Sep 15, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

patil-suraj Sep 15, 2022

Choose a reason for hiding this comment

Uh oh!

mishig25 Sep 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

patrickvonplaten commented Sep 15, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Add `init_weights` method to `FlaxMixin` #513

Add `init_weights` method to `FlaxMixin` #513

mishig25 commented Sep 14, 2022 •

edited

Loading

Implementation of `init_weights` method is required for any class that is inheriting from FlaxModelMixin.

HuggingFaceDocBuilderDev commented Sep 14, 2022 •

edited

Loading

mishig25 Sep 15, 2022 •

edited

Loading