I would like to extract features for a piece of text using the “facebook/galactica-6.7b” model, to use as inputs to a downstream prediction model. Following the pipeline example, I’m able to extract embeddings with the “allenai/scibert_scivocab_uncased” model without issue:
from transformers import pipeline
extractor = pipeline(model="allenai/scibert_scivocab_uncased", task="feature-extraction")
input_text = """Here is some text. It has a few sentences."""
result = extractor(input_text, return_tensors=True)
I get back a tensor of shape [1, 13, 768], just as expected.
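(For context, my plan for the downstream model is to mean-pool the token embeddings into a single vector per text. A rough sketch of what I have in mind, assuming the [batch, tokens, hidden] layout above:)

import torch

# result has shape [1, num_tokens, hidden_size]; average over the token dimension
sentence_embedding = result[0].mean(dim=0)   # shape: [768]
features = sentence_embedding.numpy()        # e.g. as input to a scikit-learn model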
However, if I try the same with the “facebook/galactica-6.7b” model, I get an error:
from transformers import pipeline
extractor = pipeline(model="facebook/galactica-6.7b", task="feature-extraction")
input_text = """Here is some text. It has a few sentences."""
result = extractor(input_text, return_tensors=True)
TypeError: forward() got an unexpected keyword argument 'token_type_ids'
Something is clearly different about the Galactica model, but I’m not sure how to troubleshoot it. I’ve looked at the model card and the original GitHub repo, but I can’t find instructions for extracting text embeddings.
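The closest thing I’ve come up with is to skip the pipeline and call the tokenizer and model directly, dropping the token_type_ids that seem to trigger the error. A minimal sketch of what I mean (assuming those ids can simply be discarded):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
model = AutoModel.from_pretrained("facebook/galactica-6.7b")

inputs = tokenizer("Here is some text. It has a few sentences.", return_tensors="pt")
inputs.pop("token_type_ids", None)  # drop the argument that forward() rejects

with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state  # [1, num_tokens, hidden_size]

Is something like that the right way to get text embeddings out of Galactica, or is there a supported way to make the feature-extraction pipeline work with this model?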