Other aggregations on TAPAS beyond SUM/COUNT/AVERAGE/NONE

With the current way of fine-tuning the model, is it possible to train TAPAS to learn other aggregations such as difference, percentage, etc.?

If it is possible, can you please point to some documentation?


Yes, it is possible to train TAPAS on other, custom aggregations. You can change the number of aggregation operators in TapasConfig, like so:

from transformers import TapasConfig

config = TapasConfig(num_aggregation_labels=10)

and then initialize a TapasForQuestionAnswering model from a pre-trained base, with your custom aggregation head on top:

from transformers import TapasForQuestionAnswering

model = TapasForQuestionAnswering.from_pretrained('google/tapas-base', config=config)

For more information, see the fine-tuning guide of TAPAS here.
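To build intuition for what num_aggregation_labels controls: the aggregation head is essentially a linear classifier over the model's pooled representation, producing one logit per operator. Here is a minimal numpy sketch of the shapes involved (the sizes and random weights are illustrative, not the actual TAPAS implementation):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

hidden_size = 768            # hidden size of tapas-base
num_aggregation_labels = 10  # e.g. NONE, SUM, COUNT, AVERAGE + 6 custom ops

rng = np.random.default_rng(0)
pooled_output = rng.standard_normal(hidden_size)  # stand-in for the pooled [CLS] vector
aggregation_head = rng.standard_normal((hidden_size, num_aggregation_labels))

# One logit per aggregation operator; changing num_aggregation_labels
# simply changes the width of this classification layer.
logits_aggregation = pooled_output @ aggregation_head
probs = softmax(logits_aggregation)
print(probs.shape)  # (10,)
```

So enlarging num_aggregation_labels only adds output classes; it does not by itself teach the model what those operators compute.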


Thank you. That helps.



I tried changing num_aggregation_labels and added an aggregation_labels column to the dataset:

config = TapasConfig(
    num_aggregation_labels=3,
    use_answer_as_supervision=True,
    cell_selection_preference=0.207951,
    aggregation_labels={0: "NONE", 1: "DIFF", 2: "PERCENT"},
)

I can’t figure out how to resolve this error:

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

Went through the documentation too. Do I need to add the DIFF and PERCENT aggregation calculation somewhere?

Can you provide some more details as to where this error happens?


I get the error in modeling_tapas.py:

 File "C:\Users\Kinjal\.conda\envs\nlp\lib\site-packages\transformers\models\tapas\modeling_tapas.py", line 2254, in _calculate_expected_result
expected_result = torch.sum(all_results * aggregation_op_only_probs, dim=1)

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

Hmm, yeah, that’s because the _calculate_expected_result function is hardcoded for the 3 aggregation operators on which TAPAS was fine-tuned (SUM, COUNT and AVERAGE), as you can see here. So if you want to fine-tune with weak supervision on custom operators, you would actually need to adapt _calculate_expected_result yourself.
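Concretely, the mismatch arises because all_results has one entry per hardcoded operator (SUM, COUNT, AVERAGE → size 3), while aggregation_op_only_probs drops NONE from your 3 configured labels (DIFF, PERCENT → size 2). An adapted version would compute one result per custom operator. A minimal numpy sketch of the structure; the DIFF and PERCENT semantics below are assumptions for illustration, not part of TAPAS:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical numeric cell values selected from the table.
values = np.array([10.0, 40.0])

# Custom operator results -- these definitions are assumptions;
# define them to match whatever DIFF and PERCENT mean in your task.
diff_result = values[1] - values[0]             # 30.0
percent_result = 100.0 * values[0] / values[1]  # 25.0
all_results = np.array([diff_result, percent_result])  # one entry per non-NONE op

# Operator logits from the aggregation head, ordered [NONE, DIFF, PERCENT].
logits_aggregation = np.array([0.1, 2.0, 0.5])

# Drop NONE and renormalize, mirroring aggregation_op_only_probs.
op_only_probs = softmax(logits_aggregation[1:])

# Weighted expected result -- now both tensors have 2 entries, so the
# elementwise product that raised the RuntimeError goes through.
expected_result = np.sum(all_results * op_only_probs, axis=0)
```

The key point is that all_results and op_only_probs must have one entry per non-NONE operator in your config.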

Another option could be to use strong supervision for aggregation (i.e. providing the ground-truth operator during training). In that case, the _calculate_expected_result function is not used.
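With strong supervision, training on the aggregation head reduces to an ordinary classification loss over the operator logits, which is why no expected-result computation is needed. A self-contained numpy sketch of that loss (the label ids match the aggregation_labels mapping from the config above):

```python
import numpy as np

def cross_entropy(logits, label):
    # Standard softmax cross-entropy over the operator logits.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

# Operator logits ordered [NONE, DIFF, PERCENT], as in the config above.
logits_aggregation = np.array([0.1, 2.0, 0.5])

# Ground-truth operator for this example: 1 = DIFF.
gold_label = 1
loss = cross_entropy(logits_aggregation, gold_label)
```

In transformers terms, I believe this corresponds to setting use_answer_as_supervision=False and passing an aggregation_labels tensor to the model's forward pass, so each training example must be annotated with its ground-truth operator.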