Add T5Gemma support #14940 #15123


Open · wants to merge 4 commits into master

Conversation

baonudesifeizhai

Implement encoder-decoder architecture with relative attention bias, tensor mapping, and model conversion.
Only tested t5gemma-s-s-prefixlm-it due to memory limitations. Please correct us if there are any mistakes.
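For reference, a hedged example of the conversion step used for testing (assuming the usual convert_hf_to_gguf.py workflow; the local checkpoint directory and output file name are placeholders):

    # hypothetical invocation; the directory is a local download of the t5gemma-s-s-prefixlm-it checkpoint
    python convert_hf_to_gguf.py ./t5gemma-s-s-prefixlm-it --outfile t5gemma-s-s-prefixlm-it-f16.gguf --outtype f16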

@github-actions github-actions bot added the python python script changes label Aug 6, 2025
Comment on lines 6449 to 6450
# Don't call super().__init__() because it tries to find standard layer count parameters
# that don't exist in T5Gemma models (they have encoder.num_hidden_layers instead)
@compilade (Collaborator) commented on Aug 6, 2025:

If this is the only reason, maybe instead call super().__init__() with a modified hparams?

hparams = kwargs.get("hparams") or ModelBase.load_hparams(args[0] if args else kwargs["dir_model"])
encoder_config = hparams.get("encoder", {})
hparams["num_hidden_layers"] = encoder_config.get["num_hidden_layers"]
kwargs["hparams"] = hparams
super().__init__(*args, **kwargs)

@baonudesifeizhai (Author) replied:

Already changed in the latest commits. This is my first time converting a model to GGUF; I only converted t5gemma-s-s-prefixlm-it ... not sure it will work on all T5Gemma variants.

for i in range(self.block_count):
    # Encoder relative attention bias - shape should be (n_rel_attn_bkts, n_head)
    rel_bias_enc = torch.zeros(n_rel_attn_bkts, n_head_enc, dtype=torch.float16)
    yield f"enc.blk.{i}.attn_rel_b.weight", rel_bias_enc
Collaborator:

Use self.format_tensor_name if possible.

Suggested change:
-    yield f"enc.blk.{i}.attn_rel_b.weight", rel_bias_enc
+    yield self.format_tensor_name(gguf.MODEL_TENSOR.ENC_ATTN_REL_B, i), rel_bias_enc

(I did not test this, but should probably work)

This also applies to the other places in this function where the output tensor names are hardcoded.
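For illustration, a hedged sketch of the same pattern for another hardcoded name; this assumes the function also emits a decoder-side bias and that a gguf.MODEL_TENSOR.DEC_ATTN_REL_B constant exists analogous to the encoder one (n_head_dec is a placeholder for the decoder head count):

    # hypothetical decoder-side counterpart of the loop body above
    rel_bias_dec = torch.zeros(n_rel_attn_bkts, n_head_dec, dtype=torch.float16)
    # instead of: yield f"dec.blk.{i}.attn_rel_b.weight", rel_bias_dec
    yield self.format_tensor_name(gguf.MODEL_TENSOR.DEC_ATTN_REL_B, i), rel_bias_dec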

Comment on lines 6703 to 6708
# Dynamically set encoder's other parameters
for key, value in encoder_config.items():
    if key not in ["max_position_embeddings", "hidden_size", "num_hidden_layers", "intermediate_size",
                   "num_attention_heads", "num_key_value_heads", "head_dim", "rms_norm_eps",
                   "sliding_window", "attn_logit_softcapping", "final_logit_softcapping",
                   "rope_theta", "attention_bias", "attention_dropout", "query_pre_attn_scalar", "vocab_size"]:
Collaborator:

Instead of excluding keys, maybe enumerating the included keys could be more robust.

It would also avoid adding unexpected metadata which won't necessarily be used.
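A hedged sketch of the allow-list variant (the extra key names below are placeholders; the real list depends on which encoder parameters the conversion actually needs):

    # Forward only an explicit allow-list of extra encoder parameters,
    # rather than everything that is not on the exclusion list above.
    extra_encoder_keys = ("hidden_activation", "layer_norm_eps")  # placeholder key names
    for key in extra_encoder_keys:
        if key in encoder_config:
            value = encoder_config[key]
            # ... same handling as the original loop body ...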

…ents (ggml-org#14940)

- Add T5Gemma model support with proper encoder-decoder architecture
- Use super().__init__() instead of manual initialization for better inheritance
- Use format_tensor_name() for consistent tensor naming
- Explicitly enumerate included keys instead of excluding keys
- Add proper type annotations for better type safety
- Fix all trailing whitespace issues
- Support generation of relative attention bias tensors
- Handle T5Gemma-specific post-layer normalization tensors
- Implement proper tokenizer handling for BPE tokenizer
- Add comprehensive tensor mapping for all T5Gemma components