Other aggregation on TAPAS beyond (SUM/COUNT/AVERAGE/NONE)

With the current fine-tuning setup, is it possible to train TAPAS to learn other aggregations, such as difference, percentage, etc.?

If it is possible, can you please point to some documentation?


Yes, it is possible to train TAPAS on other custom aggregations. You can change the number of aggregation operators in TapasConfig, like so:

from transformers import TapasConfig

config = TapasConfig(num_aggregation_labels=10)

and then initialize a TapasForQuestionAnswering model with a pre-trained base and your custom head on top:

from transformers import TapasForQuestionAnswering

model = TapasForQuestionAnswering.from_pretrained('google/tapas-base', config=config)

For more information, see the fine-tuning guide in the TAPAS documentation.


Thank you. That helps.



I tried changing the num_aggregation_labels and added the aggregation_labels column to the dataset

config = TapasConfig(num_aggregation_labels=3,
                 use_answer_as_supervision = True,
                 cell_selection_preference = 0.207951,
                 aggregation_labels = {0: "NONE", 1: "DIFF",2: "PERCENT"})

I can’t find how to resolve this error:

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

Went through the documentation too. Do I need to add the DIFF and PERCENT aggregation calculation somewhere?

Can you provide some more details as to where this error happens?


I get the error in modeling_tapas.py:

 File "C:\Users\Kinjal\.conda\envs\nlp\lib\site-packages\transformers\models\tapas\modeling_tapas.py", line 2254, in _calculate_expected_result
expected_result = torch.sum(all_results * aggregation_op_only_probs, dim=1)

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension

Hmm yeah, that’s because the _calculate_expected_result function in modeling_tapas.py is written for the 3 aggregation operators on which TAPAS was fine-tuned (SUM, COUNT and AVERAGE). So if you want to fine-tune with weak supervision and different operators, you would need to adapt that function.

Another option could be to use strong supervision for aggregation (i.e. providing the ground truth operator during training). In that case, the calculate_expected_result function is not required.
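To make the weak-supervision mismatch concrete, here is a minimal pure-Python sketch of the idea behind the expected-result computation: one result per operator, weighted by the predicted probability of that operator. The helper name `expected_result`, the toy cell values, and the DIFF and PERCENT definitions are assumptions for illustration only; this is not the library's implementation, which works on batched tensors.

```python
def expected_result(op_results, op_probs):
    """Probability-weighted sum of per-operator results for one example."""
    if len(op_results) != len(op_probs):
        raise ValueError(
            f"{len(op_results)} operator results but "
            f"{len(op_probs)} operator probabilities"
        )
    return sum(r * p for r, p in zip(op_results, op_probs))


# Toy values standing in for the selected table cells.
cells = [10.0, 30.0]

# The built-in path always produces three results: SUM, AVERAGE, COUNT.
builtin_results = [sum(cells), sum(cells) / len(cells), float(len(cells))]

# With num_aggregation_labels=3 there are only two non-NONE operators,
# so only two probabilities: the 3-vs-2 mismatch from the traceback.
try:
    expected_result(builtin_results, [0.5, 0.5])
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print("mismatch:", err)

# An adapted version needs one result per custom operator.
# These DIFF / PERCENT definitions are hypothetical.
custom_results = [
    cells[1] - cells[0],            # DIFF
    100.0 * cells[0] / cells[1],    # PERCENT
]
value = expected_result(custom_results, [0.5, 0.5])
print("expected result:", value)
```

Adapting the real function would follow the same pattern: compute one (differentiable) result per custom operator, in the same order as the non-NONE entries of aggregation_labels.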


Thanks for your reply. I am trying to use strong supervision. I have created a column with the “aggregation_label” indices. Now I am not sure how to pass the aggregation_labels to training. This is how I am trying to run it.

config = TapasConfig(num_aggregation_labels=3,
                  use_answer_as_supervision = True,
                  select_one_column = False,
                  cell_selection_preference = 0.1,
                  aggregation_labels = {0: "DIFF", 1: "PERCENT", 2: "SUM"})
model_training = TapasForQuestionAnswering.from_pretrained("google/tapas-base", config=config)

for epoch in range(3):  # loop over the dataset multiple times
    for idx, batch in enumerate(train_dataloader):
        # get the inputs;
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        token_type_ids = batch["token_type_ids"].to(device)
        labels = batch["labels"].to(device)
        # numeric_values = batch["numeric_values"].to(device)
        # numeric_values_scale = batch["numeric_values_scale"].to(device)
        # float_answer = batch["float_answer"].to(device)
        # aggregation_labels = Should I pass the aggregation_labels here, how then, batch[] won't work because the tokenizer doesn't return aggregation_labels ??
        # zero the parameter gradients
        # forward + backward + optimize
        outputs = model_training(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, labels = labels)
        loss = outputs.loss
        print("Loss:", loss.item())

When I run the above code, I get the following error:

File "C:\Users\Kinjal\.conda\envs\nlp\lib\site-packages\transformers\models\tapas\modeling_tapas.py", line 1312, in forward
raise ValueError(

ValueError: You have to specify aggregation labels in order to calculate the aggregation loss


If you’re using strong supervision, you should set use_answer_as_supervision of TapasConfig to False (because the ground truth aggregation label is given during training).

You should add the aggregation label indices yourself when creating the batches. Do you have a small portion of the data? Then I can create a demo notebook
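As a rough sketch of what "adding the aggregation label indices yourself" could look like, the snippet below attaches an operator index to each tokenized example before batching. The helper names (`attach_aggregation_label`, `collate`), the toy `input_ids`, and the operator names are hypothetical; plain lists stand in for tensors so the example runs without any dependencies.

```python
# Hypothetical operators; the indices must match TapasConfig.aggregation_labels.
AGGREGATION_LABELS = {0: "NONE", 1: "DIFF", 2: "PERCENT"}
NAME_TO_INDEX = {name: idx for idx, name in AGGREGATION_LABELS.items()}


def attach_aggregation_label(encoding, operator_name):
    """Return a copy of one tokenized example with its operator index added."""
    if operator_name not in NAME_TO_INDEX:
        raise KeyError(f"unknown aggregation operator: {operator_name!r}")
    example = dict(encoding)
    example["aggregation_labels"] = NAME_TO_INDEX[operator_name]
    return example


def collate(examples):
    """Group per-example fields into per-batch lists (stand-in for tensor stacking)."""
    return {key: [ex[key] for ex in examples] for key in examples[0]}


# Toy stand-ins for tokenizer output; real encodings carry more fields.
encodings = [{"input_ids": [101, 2054, 102]}, {"input_ids": [101, 2129, 102]}]
operators = ["DIFF", "PERCENT"]  # ground-truth operator per question

batch = collate(
    [attach_aggregation_label(enc, op) for enc, op in zip(encodings, operators)]
)
print(batch["aggregation_labels"])  # [1, 2]
```

In a training loop, the batched indices would be converted to a tensor, moved to the device, and passed to the model through the aggregation_labels argument alongside labels.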

Hi Neils,

Sure I can send a sample. I am not able to upload it here though.

The forum also supports private messaging.

Here’s a Colab notebook illustrating how to fine-tune TAPAS with strong supervision for aggregation and custom aggregation operators: https://colab.research.google.com/drive/16M4Cdh4N6ywD6FqJZkwJhPMpZuK-KNji?usp=sharing
