Add T5Gemma support #14940 #15123
base: master
Conversation
convert_hf_to_gguf.py (Outdated)
# Don't call super().__init__() because it tries to find standard layer count parameters
# that don't exist in T5Gemma models (they have encoder.num_hidden_layers instead)
If this is the only reason, maybe instead call super().__init__() with a modified hparams?
# Lift the encoder's layer count to the top level so the base class can find it
hparams = kwargs.get("hparams") or ModelBase.load_hparams(args[0] if args else kwargs["dir_model"])
encoder_config = hparams.get("encoder", {})
hparams["num_hidden_layers"] = encoder_config.get("num_hidden_layers")
kwargs["hparams"] = hparams
super().__init__(*args, **kwargs)
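For context, this suggestion assumes T5Gemma nests its layer counts under per-stack sub-configs rather than at the top level; a minimal illustrative sketch of that shape (field names follow the Hugging Face T5Gemma config, values are made up):

# Illustrative excerpt of a T5Gemma config.json, loaded as a Python dict.
# The layer count lives under "encoder"/"decoder", not at the top level.
hparams = {
    "model_type": "t5gemma",
    "encoder": {"num_hidden_layers": 12, "hidden_size": 768},
    "decoder": {"num_hidden_layers": 12, "hidden_size": 768},
}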
Already changed in the latest commits. This is my first time converting a model to GGUF; I only converted t5gemma-s-s-prefixlm-it, so I'm not sure this will work for all T5Gemma variants.
convert_hf_to_gguf.py (Outdated)
for i in range(self.block_count):
    # Encoder relative attention bias - shape should be (n_rel_attn_bkts, n_head)
    rel_bias_enc = torch.zeros(n_rel_attn_bkts, n_head_enc, dtype=torch.float16)
    yield f"enc.blk.{i}.attn_rel_b.weight", rel_bias_enc
Use self.format_tensor_name if possible.
yield f"enc.blk.{i}.attn_rel_b.weight", rel_bias_enc | |
yield self.format_tensor_name(gguf.MODEL_TENSOR.ENC_ATTN_REL_B, i), rel_bias_enc |
(I did not test this, but should probably work)
This also applies to the other places in this function where the output tensor names are hardcoded.
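For reference, a simplified sketch of what format_tensor_name resolves to here, based on the helper in convert_hf_to_gguf.py and the name table in gguf-py (treat the exact strings as assumptions):

# Simplified sketch: look up the canonical base name for the tensor type,
# substitute the block index, and append the ".weight" suffix.
base = gguf.TENSOR_NAMES[gguf.MODEL_TENSOR.ENC_ATTN_REL_B]  # e.g. "enc.blk.{bid}.attn_rel_b"
yield base.format(bid=i) + ".weight", rel_bias_enc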
convert_hf_to_gguf.py (Outdated)
# Dynamically set encoder's other parameters
for key, value in encoder_config.items():
    if key not in ["max_position_embeddings", "hidden_size", "num_hidden_layers", "intermediate_size",
                   "num_attention_heads", "num_key_value_heads", "head_dim", "rms_norm_eps",
                   "sliding_window", "attn_logit_softcapping", "final_logit_softcapping",
                   "rope_theta", "attention_bias", "attention_dropout", "query_pre_attn_scalar", "vocab_size"]:
Instead of excluding keys, maybe enumerating the included keys could be more robust.
It would also avoid adding unexpected metadata which won't necessarily be used.
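A minimal sketch of that include-list approach, with a hypothetical set of extra keys (the actual list would be derived from what the model graph consumes):

# Forward only explicitly listed extra encoder keys; anything else is
# ignored instead of being written out as unexpected GGUF metadata.
included_keys = ("hidden_activation", "layer_norm_eps")  # illustrative subset
for key in included_keys:
    value = encoder_config.get(key)
    if value is not None:
        ...  # write the matching gguf_writer field for this key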
…ents (ggml-org#14940)

- Add T5Gemma model support with proper encoder-decoder architecture
- Use super().__init__() instead of manual initialization for better inheritance
- Use format_tensor_name() for consistent tensor naming
- Explicitly enumerate included keys instead of excluding keys
- Add proper type annotations for better type safety
- Fix all trailing whitespace issues
- Support relative attention bias tensor generation
- Handle T5Gemma-specific post-layer normalization tensors
- Implement proper handling for the BPE tokenizer
- Add comprehensive tensor mapping for all T5Gemma components
Implement encoder-decoder architecture with relative attention bias, tensor mapping, and model conversion.

Only tested t5gemma-s-s-prefixlm-it due to memory limitations. Please correct us if there are any mistakes.
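For anyone reproducing the test, the conversion script is typically invoked along these lines (paths and output name are placeholders):

python convert_hf_to_gguf.py /path/to/t5gemma-s-s-prefixlm-it --outfile t5gemma-s-s-prefixlm-it.gguf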