In FasterTransformer, T5 supports inputs_embeds as an input and GPT supports soft_prompt; this is necessary when using an LLM as part of a multimodal model, where projected image features are fed to the language model as embeddings rather than token ids.
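
For reference, a minimal sketch of that pattern, shown with Hugging Face Transformers rather than FasterTransformer purely to illustrate the interface; the vision features here are a random placeholder standing in for the output of a projection layer:

```python
# Multimodal pattern: project vision features into the LLM's embedding space
# and pass them via inputs_embeds instead of token ids.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Placeholder for image features already projected to the model's hidden size.
vision_features = torch.randn(1, 16, model.config.d_model)

# Embed the text prompt with the model's own embedding table.
text_ids = tokenizer("Describe the image:", return_tensors="pt").input_ids
text_embeds = model.get_input_embeddings()(text_ids)

# Concatenate the "soft" vision tokens with the text embeddings and generate
# from inputs_embeds directly, bypassing token-id lookup for the vision part.
inputs_embeds = torch.cat([vision_features, text_embeds], dim=1)
output_ids = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Without an equivalent entry point (inputs_embeds or soft_prompt), the prepended vision embeddings cannot be injected, which is why this support matters for multimodal use.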