Assisted decoding, as described in Assisted Generation: a new direction toward low-latency text generation, is implemented in Generate: Add assisted generation by gante · Pull Request #22211 · huggingface/transformers · GitHub.
Q Part 1. What model pairings are known to be supported by the model.generate(..., assistant_model=...) feature?
Q Part 2. Does it work for decoder-only models too? Has anyone tried any pairs of decoder-only models available on the Hugging Face Hub?
The assumptions for assisted decoding are:
- the tokenizer must be the same for the assistant and the main model
- the model is supported by AutoModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "EleutherAI/pythia-1.4b-deduped"
assistant = "EleutherAI/pythia-160m-deduped"

# The two checkpoints share a tokenizer, as assisted generation requires.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant)

tokenized_inputs = tokenizer("Alice and Bob", return_tensors="pt")
outputs = model.generate(**tokenized_inputs, assistant_model=assistant_model)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
I've tried the following pair, and it works:
- EleutherAI/pythia-1.4b-deduped + EleutherAI/pythia-160m-deduped
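As a sanity check, here is a minimal sketch of how one might time the speedup from that pair. The timing harness and max_new_tokens=64 are my own illustrative choices, not from the original test; with greedy decoding, assisted generation is expected to produce the same output as the baseline:

import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "EleutherAI/pythia-1.4b-deduped"
assistant = "EleutherAI/pythia-160m-deduped"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant)

inputs = tokenizer("Alice and Bob", return_tensors="pt")

def timed_generate(**kwargs):
    # Greedy decoding for a deterministic comparison.
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False, **kwargs)
    return out, time.perf_counter() - start

baseline, t_base = timed_generate()
assisted, t_assist = timed_generate(assistant_model=assistant_model)

# With greedy decoding, assisted output should match the baseline exactly.
assert torch.equal(baseline, assisted)
print(f"baseline: {t_base:.2f}s  assisted: {t_assist:.2f}s")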
But these didn't:
- google-bert/bert-large-uncased + google-bert/bert-base-uncased
  (I also had to add bos_token_id=101, eos_token_id=102 to the model and/or tokenizer initialization to avoid a NoneType error when the assistant model scopes down the vocabulary; see the sketch after this list)
- FacebookAI/xlm-roberta-large + FacebookAI/xlm-roberta-base
  (ended up with TypeError: object of type 'NoneType' has no len() when looking for candidate generation)
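For completeness, here is roughly what the failing BERT attempt looked like. This is my reconstruction from the notes above, not working code; BERT defines no bos/eos tokens, hence the [CLS] (101) / [SEP] (102) id overrides:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "google-bert/bert-large-uncased"
assistant = "google-bert/bert-base-uncased"

# BERT has no bos/eos tokens, so [CLS] (101) and [SEP] (102) are passed
# as overrides to avoid NoneType errors inside generate().
tokenizer = AutoTokenizer.from_pretrained(checkpoint, bos_token_id=101, eos_token_id=102)
model = AutoModelForCausalLM.from_pretrained(checkpoint, bos_token_id=101, eos_token_id=102)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant, bos_token_id=101, eos_token_id=102)

tokenized_inputs = tokenizer("Alice and Bob", return_tensors="pt")
# Generation still fails here for this encoder-only pairing.
outputs = model.generate(**tokenized_inputs, assistant_model=assistant_model)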