When all modules of a layer are excluded, there is an error in forward replay #2116

@avtc

@Qubitium

https://github.com/ModelCloud/GPTQModel/blob/e8a9fabc7e645a3b4d43d194fd3894cbc665dc5f/gptqmodel/looper/module_looper.py#L1324C1-L1332C33

                for index, names in enumerate(modules):
                    subset = self.crate_named_modules(full=full, is_lm_head_module=is_lm_head_module,
                                                      layer_index=layer_index, layers_prefix=layers_prefix,
                                                      names=names,
                                                      processor=processor,
                                                      fail_safe=fail_safe)

                    if len(subset) == 0:
                        continue

The variables subset_total, forward_device_map, subset_forward_serial, and preserve_devices are initialized inside this loop, but only after the continue. After the loop finishes, they are referenced in forward_replay.
When every subset of a layer is empty (len(subset) == 0), they are never initialized.
The subset is empty when the dynamic config excludes all modules of a layer.
For example, for GLM-4.5-Air, this dynamic config leaves only the MoE modules to quantize:

dynamic = {
    r"-:model.embed_tokens.weight": {},
    r"-:.*shared_experts": {},
    r"-:.*shared_head": {},
    r"-:lm_head.weight": {},
    r"-:.*mlp.down": {},
    r"-:.*mlp.gate": {},
    r"-:.*mlp.up": {},
    r"-:.*post_attention_layernorm": {},
    r"-:.*self_attn": {},
    r"-:.*norm.weight": {},
    r"-:.*enorm": {},
    r"-:.*hnorm": {},
    r"-:.*eh_proj": {},
    r"-:.*input_layernorm": {},
}
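
As a rough illustration of how negative rules like these can leave a layer with nothing to quantize, here is a minimal sketch. It assumes that a `-:` prefix marks an exclusion and that the rest of the key is applied with `re.match` against the module name; the module names and the helpers `is_excluded` and `remaining` are hypothetical and are not taken from GPTQModel.

```python
import re

# A subset of the negative rules from the config above.
dynamic = {
    r"-:.*shared_experts": {},
    r"-:.*mlp.down": {},
    r"-:.*mlp.gate": {},
    r"-:.*mlp.up": {},
    r"-:.*self_attn": {},
}


def is_excluded(name, rules):
    """Return True if any '-:' rule matches the module name (assumed semantics)."""
    for pattern in rules:
        if pattern.startswith("-:") and re.match(pattern[2:], name):
            return True
    return False


def remaining(names, rules):
    """Modules that would still be quantized after applying the rules."""
    return [n for n in names if not is_excluded(n, rules)]


# Hypothetical module names: layer 0 is dense, layer 1 has MoE experts.
layer0 = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
]
layer1 = [
    "model.layers.1.self_attn.q_proj",
    "model.layers.1.mlp.shared_experts.up_proj",
    "model.layers.1.mlp.experts.0.up_proj",
    "model.layers.1.mlp.experts.0.down_proj",
]

print(remaining(layer0, dynamic))  # the dense layer ends up empty
print(remaining(layer1, dynamic))  # only the routed expert modules survive
```

Under these assumptions every module of the dense layer matches some rule, so its subset is empty, while the routed experts of the later layer pass through.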

The layer with index 0 has no MoE modules, so all of its modules are excluded from quantization and the subset length is 0.
Because the variables are never defined, forward_replay fails with:

  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/models/base.py", line 1016, in quantize
    result = module_looper.loop(
        backend=backend,
        fail_safe=self.quantize_config.fail_safe,
    )
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1156, in loop
    return self._loop_impl(fail_safe=fail_safe, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1665, in _loop_impl
    replay_source = f"{layer_descriptor}:subset{index + 1}/{subset_total}" if subset_total is not None else f"{layer_descriptor}:subset{index + 1}/?"
                                                                              ^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'subset_total' where it is not associated with a value
Exception ignored in: <function ProgressBar.__del__ at 0x5e93d8844c0>
Traceback (most recent call last):
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 876, in __del__
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 916, in close
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 594, in detach
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 495, in _render_lock_context
TypeError: 'NoneType' object is not callable
Exception ignored in: <function ProgressBar.__del__ at 0x5e93d8844c0>
Traceback (most recent call last):
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 876, in __del__
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 916, in close
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 594, in detach
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 495, in _render_lock_context
TypeError: 'NoneType' object is not callable
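
The unbound-variable failure can be reproduced outside GPTQModel. The sketch below is a toy with hypothetical function names, not the real `_loop_impl`: a variable assigned only after a `continue` stays unbound when every iteration takes the `continue`, and one possible mitigation is to give it a default before the loop. The `None` default is an assumption that happens to line up with the `if subset_total is not None` check visible in the traceback.

```python
def replay_source_buggy(subsets):
    """Mimics the pattern from the issue: assignment happens after `continue`."""
    for index, subset in enumerate(subsets):
        if len(subset) == 0:
            continue
        subset_total = len(subsets)
    # Referenced after the loop, even if the assignment never ran.
    return f"subset{index + 1}/{subset_total}"


def replay_source_guarded(subsets):
    """Same loop, but the variable has a default before the loop starts."""
    subset_total = None
    index = -1
    for index, subset in enumerate(subsets):
        if len(subset) == 0:
            continue
        subset_total = len(subsets)
    if subset_total is None:
        return f"subset{index + 1}/?"
    return f"subset{index + 1}/{subset_total}"


try:
    replay_source_buggy([[]])  # every subset is empty
    raised = False
except UnboundLocalError:
    raised = True

print(raised)
print(replay_source_guarded([[]]))
print(replay_source_guarded([["experts.0.up_proj"]]))
```

Defaults only remove the crash at the point of reference. Whether the replay should run at all for such a layer is a separate question.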

I do not understand how to fix this properly for all of the variables.
I tried skipping forward_replay when len(subset) == 0, but hit a different stack trace. It seems the forward pass is not executed because the continue is triggered. Also, forward_device_map is populated only from the quantized MoE modules. Shouldn't it be populated from all MoE modules of a layer, including those excluded from quantization by dynamic? (Maybe I am wrong.) The new stack trace:

  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1156, in loop
    return self._loop_impl(fail_safe=fail_safe, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1501, in _loop_impl
    forward_outputs = self._run_forward_batches(
        module=module,
    ...<17 lines>...
        preserve_module_devices=preserve_devices,
    )
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 526, in _run_forward_batches
    return self._run_forward_batches_single(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        module=module,
        ^^^^^^^^^^^^^^
    ...<16 lines>...
        preserve_module_devices=preserve_module_devices,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 640, in _run_forward_batches_single
    layer_input = [move_to(inp, device=exec_device) for inp in layer_inputs[batch_idx]]
                                                               ~~~~~~~~~~~~^^^^^^^^^^^
IndexError: list index out of range
Exception ignored in: <function ProgressBar.__del__ at 0x363b9d94a60>
Traceback (most recent call last):
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 876, in __del__
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 916, in close
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 594, in detach
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 495, in _render_lock_context
TypeError: 'NoneType' object is not callable
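
If the reading above is right, the IndexError would come from the next layer finding no inputs because the skipped layer never produced outputs. The following is a minimal sketch of that idea only, with toy callables standing in for layers and a hypothetical `run_layers` helper; it does not reflect GPTQModel internals. A layer with nothing to quantize still runs a plain forward so that its outputs become the next layer's inputs.

```python
def run_layers(layers, subsets_per_layer, inputs):
    """Toy layer loop: quantize where possible, always propagate activations."""
    layer_inputs = list(inputs)
    quantized_layers = []
    for layer_index, (layer, subsets) in enumerate(zip(layers, subsets_per_layer)):
        processed_any = False
        for subset in subsets:
            if len(subset) == 0:
                continue
            processed_any = True  # real code would quantize this subset here
        if processed_any:
            quantized_layers.append(layer_index)
        # Forward pass runs regardless, so the next layer always has inputs.
        layer_inputs = [layer(x) for x in layer_inputs]
    return layer_inputs, quantized_layers


# Layer 0 has only an empty subset; layer 1 has one module to quantize.
layers = [lambda x: x + 1, lambda x: x * 2]
subsets_per_layer = [[[]], [["experts.0.up_proj"]]]

outputs, quantized = run_layers(layers, subsets_per_layer, [1, 2])
print(outputs)
print(quantized)
```

The design choice in this sketch is to separate "did anything get quantized" from "did activations advance", so an all-excluded layer is skipped for quantization but not for the forward pass.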
