Example for TAPAS MaskedLM fails to run

Hi,

Apologies if this is the wrong place for this kind of question.

I am trying to get the posted example to work, but there seem to be a few issues.

Example:

from transformers import TapasTokenizer, TapasForMaskedLM
import pandas as pd

tokenizer = TapasTokenizer.from_pretrained('google/tapas-base')
model = TapasForMaskedLM.from_pretrained('google/tapas-base')

data = {'Actors': ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
        'Age': ["56", "45", "59"],
        'Number of movies': ["87", "53", "69"]
}
table = pd.DataFrame.from_dict(data)

inputs = tokenizer(table=table, queries="How many [MASK] has George [MASK] played in?", return_tensors="pt")
labels = tokenizer(table=table, queries="How many movies has George Clooney played in?", return_tensors="pt")["input_ids"]

outputs = model(**inputs, labels=labels)  # this call raises the ValueError below
last_hidden_states = outputs.last_hidden_state  # and this attribute access raises the AttributeError below

In particular, there seems to be a dimensionality mismatch between the inputs and the labels, which causes the following ValueError:

ValueError: Expected input batch_size (32) to match target batch_size (34).

I think this arises from special tokens such as [CLS] and [SEP], but that is just a quick guess; it may also be that "Clooney" is split into multiple wordpieces while "[MASK]" is a single token, which would account for the difference of two.
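
For what it's worth, here is a minimal sketch of how I compared the two tokenizations (reusing the tokenizer and table from the example above); decoding the ids should show where the extra tokens come from:

# Sketch: compare the tokenized lengths of the masked query vs. the label query
# (reuses `tokenizer` and `table` from the example above).
masked = tokenizer(table=table, queries="How many [MASK] has George [MASK] played in?", return_tensors="pt")
plain = tokenizer(table=table, queries="How many movies has George Clooney played in?", return_tensors="pt")

print(masked["input_ids"].shape)  # e.g. torch.Size([1, 32])
print(plain["input_ids"].shape)   # e.g. torch.Size([1, 34])

# Decoding makes the difference visible token by token:
print(tokenizer.convert_ids_to_tokens(masked["input_ids"][0].tolist()))
print(tokenizer.convert_ids_to_tokens(plain["input_ids"][0].tolist()))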

Additionally, if I adjust the labels/inputs to have equal dimensions, I get a second error when retrieving the last hidden state:

AttributeError: 'MaskedLMOutput' object has no attribute 'last_hidden_state'
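
On the second error: my understanding (which may be wrong, hence the question) is that TapasForMaskedLM returns a MaskedLMOutput exposing loss and logits rather than last_hidden_state, so once the lengths match I would expect something like this sketch to be the intended usage:

# Sketch, assuming the inputs and labels have matching lengths and that
# MaskedLMOutput exposes `loss`, `logits`, and (when requested) `hidden_states`:
outputs = model(**inputs, labels=labels, output_hidden_states=True)

loss = outputs.loss      # masked LM loss
logits = outputs.logits  # shape: (batch, seq_len, vocab_size)
last_hidden_state = outputs.hidden_states[-1]  # final layer; only present with output_hidden_states=True

Is that the expected pattern, or should the example in the docs be updated?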

Could you please advise on what the expected behaviour is?

Thanks