Extracting token embeddings from pretrained language models

You can pass the tokenizer explicitly via the tokenizer argument and, as the error message suggests, add a padding token to it.

Here is an example based on the documentation.

from transformers import AutoTokenizer, AutoModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

# GPT-2 has no padding token by default, so register one
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
# Resize the embedding matrix so it covers the newly added token
model.resize_token_embeddings(len(tokenizer))

pipe = pipeline('feature-extraction', model=model, tokenizer=tokenizer)
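For context, the feature-extraction pipeline returns one embedding per token, so to get a single fixed-size sentence vector you typically pool over the token axis. A minimal sketch with NumPy on dummy data standing in for the pipeline output (the shape `(1, num_tokens, 768)` assumes gpt2's hidden size of 768):

```python
import numpy as np

# Dummy stand-in for pipe("some text"), which returns per-token
# features with shape (batch, num_tokens, hidden_size).
# For gpt2 the hidden size is 768.
features = np.random.rand(1, 5, 768)

# Mean-pool across the token axis to get one sentence embedding.
sentence_embedding = features[0].mean(axis=0)

print(sentence_embedding.shape)  # (768,)
```

If you pad batches with the new `[PAD]` token, you would normally also mask the pad positions out of the mean rather than pooling over them.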

This works for me. Let me know if you have any other questions 🙂
