Hi,
Apologies if this subthread is the wrong place for this kind of suggestion.
I am trying to get the posted example to work, but it seems there are a few issues.
Example:
from transformers import TapasTokenizer, TapasForMaskedLM
import pandas as pd
tokenizer = TapasTokenizer.from_pretrained('google/tapas-base')
model = TapasForMaskedLM.from_pretrained('google/tapas-base')
data = {
    'Actors': ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    'Age': ["56", "45", "59"],
    'Number of movies': ["87", "53", "69"]
}
table = pd.DataFrame.from_dict(data)
inputs = tokenizer(table=table, queries="How many [MASK] has George [MASK] played in?", return_tensors="pt")
labels = tokenizer(table=table, queries="How many movies has George Clooney played in?", return_tensors="pt")["input_ids"]
outputs = model(**inputs, labels=labels)
last_hidden_states = outputs.last_hidden_state
In particular, there seems to be a mismatch in dimensionality between the inputs and the labels, causing the following ValueError:
ValueError: Expected input batch_size (32) to match target batch_size (34).
I think this arises from the special [CLS]/[SEP] tokens, or possibly because the masked words ("movies", "Clooney") tokenize into a different number of word pieces than the single [MASK] tokens, but this is a quick guess.
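For reference, this is roughly how I made the two encodings the same length, simply by padding both to the model's maximum length; I am not sure this is the intended usage:
# Workaround attempt: pad both encodings to the same fixed length so the
# sequence lengths match (padded label positions would presumably also need
# to be set to -100 so the loss ignores them).
inputs = tokenizer(table=table,
                   queries="How many [MASK] has George [MASK] played in?",
                   padding="max_length", return_tensors="pt")
labels = tokenizer(table=table,
                   queries="How many movies has George Clooney played in?",
                   padding="max_length", return_tensors="pt")["input_ids"]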
Additionally, if I adjust the labels/inputs to have equal dimensions as above, I then get a second error when retrieving the last hidden state:
AttributeError: 'MaskedLMOutput' object has no attribute 'last_hidden_state'
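Based on the MaskedLMOutput documentation, I would have expected something like the following instead (assuming the hidden states are what the example meant by last_hidden_state), but I am not sure this is what was intended:
outputs = model(**inputs, labels=labels, output_hidden_states=True)
logits = outputs.logits                          # prediction scores over the vocabulary
last_hidden_states = outputs.hidden_states[-1]   # last layer's hidden states, only returned when output_hidden_states=True is set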
Could you please advise on what the expected behaviour is?
Thanks