
Conversation

BlueCrescent
Collaborator

What does this PR do?

This PR adds a script for updating old checkpoints and configs.

General Changes

  • Added the script.

Checklist before submitting final PR

  • My PR is minimal and addresses one issue in isolation
  • I have merged the latest version of the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have checked that all tests run through (python tests/tests.py)
  • I have updated the internal changelog (CHANGELOG_DEV.md)

@BlueCrescent BlueCrescent requested a review from Copilot August 25, 2025 11:53
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a utility script for migrating old model checkpoints and configuration files to a new format. The script handles both configuration file updates and checkpoint state dictionary transformations to maintain compatibility with updated model structures.

Key Changes

  • Added comprehensive checkpoint and config migration script with YAML processing
  • Implemented state dictionary updates for model weight key transformations
  • Added validation functionality to test updated configurations


Member

@le1nux le1nux left a comment

I'm a bit hesitant about whether the automated config updates are the way to go, or whether we should instead provide documentation, e.g., a diff for model_raw, explaining how to update the models.
The reason is that we are now updating based on existing component names, e.g., checkpointed_model. However, the configs themselves never enforce particular component names, so if the user renames checkpointed_model to something like my_checkpointed_model, the conversion script already fails.
Also, we are deleting some components that are still in use, as you can see if you diff these two configs:

https://github.com/Modalities/modalities/blob/83c87b9d6d6fbbb228bab31dccf1870b12679775/config_files/training/config_lorem_ipsum_long_fsdp1.yaml

https://github.com/Modalities/modalities/blob/83c87b9d6d6fbbb228bab31dccf1870b12679775/config_files/training/config_lorem_ipsum_long_fsdp2.yaml

Nevertheless, I think that the automated checkpoint update is still useful and I would place it in a backward_compatibility/ module.
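A minimal illustration of the naming concern (the config dict below is hypothetical):

# The script looks the component up by its name, so a user-chosen name breaks it:
config = {"my_checkpointed_model": {"config": {"checkpoint_path": "old.bin"}}}
config["checkpointed_model"]["config"]["checkpoint_path"]  # raises KeyError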

Comment on lines +152 to +153
old_model_config = sys.argv[1]
new_model_config = sys.argv[2]
Member

These are paths, not "configs".

Member

I would use pathlib.Path
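For example (a sketch only; the argparse setup and argument names are assumptions, update_model is the script's existing entry point):

import argparse
from pathlib import Path

parser = argparse.ArgumentParser(description="Update old Modalities configs and checkpoints.")
parser.add_argument("old_model_config_path", type=Path)
parser.add_argument("new_model_config_path", type=Path)
parser.add_argument("new_checkpoint_path", type=Path, nargs="?", default=None)
args = parser.parse_args()

update_model(args.old_model_config_path, args.new_model_config_path, args.new_checkpoint_path)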

config_type = dict[str, "str | config_type"]


def update_model(old_model_config: str, new_model_config: str, new_checkpoint_path: str | None):
Member

The arguments are all paths.
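That is, something like this (the renamed parameters are a suggestion, not what the PR currently has):

from pathlib import Path

def update_model(old_model_config_path: Path, new_model_config_path: Path, new_checkpoint_path: Path | None) -> None:
    ...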


def add_new_keys(config: config_type):
model_config = config["model_raw" if "model_raw" in config else "model"]["config"]
model_config["use_weight_tying"] = False
Member

We also had weight tying before. Why are we hardcoding this to False now?

Comment on lines +92 to +101
if "evaluation_subscriber" in config and "experiment_id" in config["evaluation_subscriber"]["config"]:
del config["evaluation_subscriber"]["config"]["experiment_id"]
if "settings" in config and "experiment_id" in config["settings"]:
del config["settings"]["experiment_id"]
if (
"checkpoint_saving" in config
and "checkpoint_saving_execution" in config["checkpoint_saving"]["config"]
and "experiment_id" in config["checkpoint_saving"]["config"]["checkpoint_saving_execution"]["config"]
):
del config["checkpoint_saving"]["config"]["checkpoint_saving_execution"]["config"]["experiment_id"]
Member

Why are we deleting this?


def rename_keys(config: config_type):
model_config = config["model_raw" if "model_raw" in config else "model"]["config"]
Member

We could have the convention that the general model must always be named model_raw.
We are already enforcing it here:

model_raw: PydanticPytorchModuleType
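If that convention is adopted, the lookup in the script could drop the fallback (a sketch, not part of this PR):

def rename_keys(config: config_type):
    # assumes the general model is always registered under "model_raw"
    model_config = config["model_raw"]["config"]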

new_model_config = sys.argv[2]
new_checkpoint_path = sys.argv[3] if len(sys.argv) > 3 else None

update_model(old_model_config, new_model_config, new_checkpoint_path)
Member

I would make updating the checkpoint and updating the config two separate functions that get called sequentially here.
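A minimal sketch of that split (the function names update_config and update_checkpoint are hypothetical):

import sys
from pathlib import Path

def main() -> None:
    old_config_path = Path(sys.argv[1])
    new_config_path = Path(sys.argv[2])
    new_checkpoint_path = Path(sys.argv[3]) if len(sys.argv) > 3 else None

    # hypothetical helpers: one rewrites the YAML config, the other the checkpoint state dict
    update_config(old_config_path, new_config_path, new_checkpoint_path)
    if new_checkpoint_path is not None:
        update_checkpoint(old_config_path, new_checkpoint_path)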

Comment on lines +63 to +67
old_norm_keys = ["attention_norm", "ffn_norm", "lm_head_norm"]
new_norm_keys = ["attention_norm_config", "ffn_norm_config", "lm_head_norm_config"]
for old_key, new_key in zip(old_norm_keys, new_norm_keys):
rename_config_key(model_config, old_key, new_key)
rename_config_key(model_config[new_key], "variant_key", "norm_type")
Member

We should delete component_key, no?
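Something along these lines would cover it (a sketch on top of the snippet above; whether component_key has to go depends on the new nested norm config schema):

for old_key, new_key in zip(old_norm_keys, new_norm_keys):
    rename_config_key(model_config, old_key, new_key)
    rename_config_key(model_config[new_key], "variant_key", "norm_type")
    # drop component_key if the new nested norm config no longer expects it
    model_config[new_key].pop("component_key", None)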

if new_checkpoint_path is not None:
if "checkpointed_model" in config:
old_path = config["checkpointed_model"]["config"]["checkpoint_path"]
config["checkpointed_model"]["config"]["checkpoint_path"] = new_checkpoint_path
Member

I checked all configs; where did you see checkpointed_model?

"""
state_dict = torch.load(old_model_path)
if "lm_head.weight" in state_dict:
state_dict["transformer.lm_head.weight"] = state_dict["lm_head.weight"]
Member

How would this behave if we used weight tying?
Do we store the weights twice (i.e., embeddings and lm_head) and then internally replace the lm_head with a reference to the embedding weights?
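One way to check this on a concrete checkpoint (a sketch; the embedding key name and the checkpoint path are assumptions):

import torch

state_dict = torch.load("old_checkpoint.bin", map_location="cpu")
lm_head = state_dict.get("lm_head.weight")
embedding = state_dict.get("transformer.wte.weight")  # assumed embedding key name
if lm_head is not None and embedding is not None:
    # tied weights saved from the same tensor still share storage after loading
    print("tied in checkpoint:", lm_head.data_ptr() == embedding.data_ptr())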
