
Conversation

@avtc (Contributor) commented Oct 25, 2025

@Qubitium please review the fix for #2116

@avtc (Contributor, Author) commented Oct 26, 2025

  1. I have quantized with this fix, but after the model save there is an error at the end:

Finished! Quantized model saved to /home/ubuntu/models/GPTQModel/GLM-4.5-Air-gptqmodel-w8g64-tp8-padmoe1-v2-0-dump0.05-bs1-s
Exception ignored in: <function ProgressBar.__del__ at 0x25c03d93fc0>
Traceback (most recent call last):
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 876, in __del__
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 916, in close
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 594, in detach
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 495, in _render_lock_context
TypeError: 'NoneType' object is not callable
  2. I will check whether balancing can be improved for non-quantized expert modules. Sometimes whole layers are excluded from quantization in the dynamic config.

@Qubitium (Collaborator)

Finished! Quantized model saved to /home/ubuntu/models/GPTQModel/GLM-4.5-Air-gptqmodel-w8g64-tp8-padmoe1-v2-0-dump0.05-bs1-s
Exception ignored in: <function ProgressBar.__del__ at 0x25c03d93fc0>
Traceback (most recent call last):
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 876, in __del__
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 916, in close
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 594, in detach
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/logbar/progress.py", line 495, in _render_lock_context
TypeError: 'NoneType' object is not callable

The error is a state bug in logbar, so I need to fix it there; it is unrelated to GPTQModel. I will check and merge this PR soon!
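For reference, this failure pattern usually means a __del__ hook ran during interpreter shutdown after the object's lock/render state was already torn down. A minimal defensive sketch, assuming a lock-guarded progress bar like the traceback suggests (this is not the actual logbar code):

```python
# Hypothetical sketch, not the real logbar implementation: guard the __del__
# path so it degrades gracefully when state has already been torn down at
# interpreter shutdown (the likely cause of "'NoneType' object is not callable").
import threading

class ProgressBarSketch:
    def __init__(self):
        self._render_lock = threading.Lock()

    def close(self):
        lock = getattr(self, "_render_lock", None)
        if lock is None:      # already torn down, nothing left to detach
            return
        with lock:
            pass              # detach / final redraw work would happen here

    def __del__(self):
        try:
            self.close()
        except Exception:
            pass              # never let cleanup errors escape a destructor
```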

@Qubitium (Collaborator) commented Oct 26, 2025

@avtc There is an issue: the is_moe and subset_forward_serial check is now outside the subset loop, at the full-layer level, so even if the layer contains MoE but the first few subsets are not MoE, we force the slower subset_forward_serial execution for them, no?

For example, if we have a normal MoE layer:

  1. attn: attention modules (normal) qkvo
  2. attn: shared attention (many MoE models have shared attention) shared qkvo
  3. mlp: expert modules (this is the part we want vram.balanced and subset_forward_serial for)

So subsets (1) and optionally (2) are forced to execute serially when they could still run on the faster data-parallel path.

Can you debug and print out the subsets and also the modules list where you now do the is_moe check?
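Something as simple as this would do; a hedged sketch where layer_idx, subsets, and is_moe are placeholders for whatever module_looper actually exposes at that point:

```python
# Hypothetical debug helper; argument names are placeholders, not module_looper API.
def dump_subsets(layer_idx, subsets, is_moe):
    print(f"[layer {layer_idx}] is_moe={is_moe} subsets={len(subsets)}")
    for i, subset in enumerate(subsets):
        names = list(subset.keys()) if isinstance(subset, dict) else list(subset)
        print(f"  subset {i}: {len(names)} modules -> {names}")
```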

So with dynamic we have an issue where (see the sketch after this list):

  1. a subset can be empty
  2. an entire layer can be empty, or in other words, all subsets in that layer are now empty
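Something along these lines would keep the serial path scoped to MoE subsets and tolerate the empty cases above. This is a sketch only, not the actual module_looper code; subset_is_moe, run_serial, and run_parallel are hypothetical callables:

```python
# Sketch only: check "is this subset MoE" per subset instead of once per layer,
# and skip subsets/layers emptied by dynamic exclusion rules.
def process_layer(subsets, subset_is_moe, run_serial, run_parallel):
    if all(len(s) == 0 for s in subsets):
        return  # entire layer excluded by dynamic config: nothing to do

    for subset in subsets:
        if len(subset) == 0:
            continue  # this subset excluded by dynamic config
        if subset_is_moe(subset):
            run_serial(subset)    # expert modules: slower serial + vram.balanced path
        else:
            run_parallel(subset)  # attention/shared modules keep the faster data-parallel path
```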

@Qubitium (Collaborator)

The current module_looper is a headache to unwrap. I am going to split it into a few modules instead of the monolithic blob with all these for loops that makes my head swirl trying to read/debug it.

@avtc (Contributor, Author) commented Oct 26, 2025

I will partially revert and test. I do not see how to convert this to a draft, so please assume it is not finalized.

@avtc (Contributor, Author) commented Oct 26, 2025

I have removed the extra logic from before the loop and only fixed the issue with the undefined variable.
It now gets past the non-MoE layer[0] with all modules excluded, so it should be fine.
There can still be insufficient VRAM when an expert layer is excluded by dynamic rules, as balancing does not happen for it.
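For the fully excluded layer case, the guard is essentially this (a hedged sketch with hypothetical names, not the actual module_looper code):

```python
# Hypothetical sketch of the guard described above: when dynamic rules exclude
# every module of a layer (e.g. the non-MoE layer[0]), skip the layer entirely
# so later code never touches variables that were never initialized.
def quantize_layers(layers, dynamic_filter, run_subsets):
    for idx, layer in enumerate(layers):
        modules = dynamic_filter(idx, layer)  # may return an empty mapping
        if not modules:
            continue  # fully excluded layer: nothing to quantize
        run_subsets(idx, modules)             # normal per-subset processing
```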

P.S. Sometimes I have encountered the following error even with 4 GPUs:

Traceback (most recent call last):
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/utils/threadx.py", line 484, in _run
    result = fn(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/module_looper.py", line 1613, in _process_on_worker
    proc.process(module=nm)
    ~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/looper/gptq_processor.py", line 162, in process
    wq, q_scales, q_zeros, q_g_idx, duration, avg_loss, damp_percent, nsamples = g.quantize()
                                                                                 ~~~~~~~~~~^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/quantization/gptq.py", line 602, in quantize
    self.finalize_hessian(target_device=target_device)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/quantization/gptq.py", line 520, in finalize_hessian
    self._materialize_global_hessian(target_device=target_device)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venvs/gptqmodelt/lib/python3.13t/site-packages/gptqmodel/quantization/gptq.py", line 503, in _materialize_global_hessian
    tmp = partial.to(device=result_accum.device, dtype=torch.float32)
torch.AcceleratorError: CUDA error: invalid argument
Search for `cudaErrorInvalidValue' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.

Maybe it is related to the current fix...
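One way to narrow it down would be to stage the cross-device copy through CPU with an explicit synchronize and see if the error disappears. A diagnostic sketch only, assuming the failure is in the partial.to(...) transfer shown in the traceback; this is not the GPTQModel implementation:

```python
# Diagnostic sketch: stage the partial Hessian through host memory and
# synchronize the source GPU first, to check whether the direct
# device-to-device copy is what raises cudaErrorInvalidValue.
import torch

def safe_accumulate(partial: torch.Tensor, result_accum: torch.Tensor) -> torch.Tensor:
    if partial.device != result_accum.device:
        if partial.is_cuda:
            torch.cuda.synchronize(partial.device)        # flush pending work on the source GPU
        partial = partial.to("cpu", dtype=torch.float32)  # stage through host memory
    tmp = partial.to(device=result_accum.device, dtype=torch.float32)
    return result_accum + tmp
```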

@avtc (Contributor, Author) commented Oct 26, 2025

This PR can be considered final.
