Description
GPU: V100
CUDA version: 12.2
Thanks for your great work. I now want to deploy XLM-RoBERTa with TensorRT-LLM. It differs from BERT only in how position_ids are computed in bert_embeddings, so, following the issue I mentioned here, #363, @byshiue suggested that I pass position_ids as an input array to the bert forward function.
So I simply modified the original unittest file test_bert.py to pass position_ids as an input array and check whether it works. I ran the 3 tests below.
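For context, the XLM-RoBERTa tweak is that position ids are counted only over non-padding tokens and start at padding_idx + 1, whereas BERT simply uses 0..seq_len-1. A minimal pure-Python sketch of that logic (mirroring what transformers' create_position_ids_from_input_ids computes; padding_idx=1 is XLM-RoBERTa's default, and this is an illustration, not the library code):

```python
# Sketch of XLM-RoBERTa-style position ids, mirroring the logic of
# transformers' create_position_ids_from_input_ids
# (mask.cumsum(dim=1) * mask + padding_idx). Illustrative only.

def xlmr_position_ids(input_ids, padding_idx=1):
    positions = []
    for seq in input_ids:
        count, row = 0, []
        for tok in seq:
            if tok != padding_idx:
                count += 1               # cumulative count of real tokens
                row.append(count + padding_idx)
            else:
                row.append(padding_idx)  # pad positions are pinned to padding_idx
        positions.append(row)
    return positions

def bert_position_ids(input_ids):
    # BERT's default: a plain 0..seq_len-1 range, regardless of padding
    return [list(range(len(seq))) for seq in input_ids]

batch = [[0, 84, 95, 2, 1, 1]]           # one right-padded sequence, pad token = 1
print(xlmr_position_ids(batch))          # [[2, 3, 4, 5, 1, 1]]
print(bert_position_ids(batch))          # [[0, 1, 2, 3, 4, 5]]
```

This is why reusing BERT's built-in position ids gives wrong embeddings for XLM-RoBERTa whenever the batch contains padding.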
- the original unittest for test_bert.py: it works well.
- pass real data to the original unittest
In this test, I just replaced the generated fake data with real data, and modified the hf_bert.forward function to pass attention_mask to the HuggingFace Transformers model.
The core modification is here:
```python
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
sentence_pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
device_hf = torch.device("cuda")
inputs_hf = tokenizer(sentence_pairs, padding=True, truncation=True, return_tensors='pt', max_length=512).to(device_hf)
```
and the result is an error (the TensorRT-LLM output no longer matches the HuggingFace output).

and the whole test file is here (it is just a test_bert.py; I cannot upload a single .py file):
test_bert_with_real_data.zip
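For reference, what I mean by "error" is a failed element-wise tolerance check between the two outputs. A hedged pure-Python sketch of that kind of comparison (similar in spirit to numpy.allclose; the tolerances below are illustrative, not the ones in test_bert.py):

```python
import math

# Hedged sketch of the element-wise tolerance check a test_bert.py-style
# comparison performs between TensorRT-LLM and HuggingFace outputs.
# Similar in spirit to numpy.allclose; tolerances are illustrative.

def outputs_match(trt_out, hf_out, rtol=1e-2, atol=1e-2):
    return all(
        math.isclose(x, y, rel_tol=rtol, abs_tol=atol)
        for x, y in zip(trt_out, hf_out)
    )

print(outputs_match([0.501, -1.203], [0.500, -1.200]))  # True
print(outputs_match([0.501, -0.900], [0.500, -1.200]))  # False
```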
- pass position_ids to the bert forward function
The core modification is here:

```python
from transformers.models.xlm_roberta.modeling_xlm_roberta import create_position_ids_from_input_ids
```
and the whole test file is here (it is just a test_bert.py; I cannot upload a single .py file):
test_bert_just_pass_position.zip
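Since the tokenizer call in the second test already returns an attention_mask, the same position ids can also be derived from the mask instead of from the input_ids. A hedged pure-Python sketch (padding_idx=1 assumed, as in XLM-RoBERTa; illustrative, not the library code):

```python
# Sketch: derive XLM-RoBERTa-style position ids from the tokenizer's
# attention_mask instead of from input_ids. Assumes padding_idx=1
# (XLM-RoBERTa's default). Illustrative only.

def position_ids_from_attention_mask(attention_mask, padding_idx=1):
    positions = []
    for mask_row in attention_mask:
        count, row = 0, []
        for m in mask_row:
            if m:
                count += 1
                row.append(count + padding_idx)  # real token: next position
            else:
                row.append(padding_idx)          # pad: pinned to padding_idx
        positions.append(row)
    return positions

# e.g. the short sentence pair in a batch is right-padded relative to the long one
mask = [[1, 1, 1, 1, 0, 0],
        [1, 1, 1, 1, 1, 1]]
print(position_ids_from_attention_mask(mask))
# [[2, 3, 4, 5, 1, 1], [2, 3, 4, 5, 6, 7]]
```

Either way, the resulting array is what would be fed to the bert forward function as the extra position_ids input.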
Can you take a look at my problem and help me? Looking forward to your reply. Thanks!
