Other aggregations on TAPAS beyond SUM/COUNT/AVERAGE/NONE

With the current way of fine-tuning the model, is it possible to train TAPAS to learn other aggregations, such as difference, percentage, etc.?

If it is possible, can you please point to some documentation?

Hi,

Yes, it is possible to train TAPAS on custom aggregations. You can change the number of aggregation operators via the num_aggregation_labels argument of TapasConfig, like so:

from transformers import TapasConfig

config = TapasConfig(num_aggregation_labels=10)

and then initialize a TapasForQuestionAnswering model with a pre-trained base and your custom head on top:

from transformers import TapasForQuestionAnswering

model = TapasForQuestionAnswering.from_pretrained('google/tapas-base', config=config)
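
Note that when you load google/tapas-base with a custom config like this, the question-answering heads (including the aggregation head) are newly initialized, so the new operators only become meaningful after fine-tuning on your own data.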

For more information, see the TAPAS fine-tuning guide here.

Thank you. That helps.

Hi,

I tried changing num_aggregation_labels and added the aggregation_labels column to the dataset:

config = TapasConfig(num_aggregation_labels=3,
                     use_answer_as_supervision=True,
                     cell_selection_preference=0.207951,
                     aggregation_labels={0: "NONE", 1: "DIFF", 2: "PERCENT"})

I can't figure out how to resolve this error:

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

I went through the documentation too. Do I need to add the DIFF and PERCENT aggregation calculations somewhere?

Can you provide some more details as to where this error happens?

Hi,

I get the error in modeling_tapas.py:

 File "C:\Users\Kinjal\.conda\envs\nlp\lib\site-packages\transformers\models\tapas\modeling_tapas.py", line 2254, in _calculate_expected_result
expected_result = torch.sum(all_results * aggregation_op_only_probs, dim=1)

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension

Hmm, yeah, that's because the _calculate_expected_result function is based on the three aggregation operators on which TAPAS was fine-tuned (SUM, COUNT and AVERAGE), as you can see here. So if you want to fine-tune with weak supervision, you would actually need to adapt the _calculate_expected_result function; a rough sketch of what it computes is below.
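
To make the shape error concrete, here is a simplified, self-contained sketch of roughly what that function computes (the function name and tensor names are illustrative, not the actual library code):

import torch

def expected_result_sketch(cell_probs, cell_values, aggregation_op_only_probs):
    # cell_probs:  (batch, num_cells) probability that each table cell is selected
    # cell_values: (batch, num_cells) numeric value parsed from each cell
    # aggregation_op_only_probs: (batch, num_aggregation_labels - 1) probabilities
    #   over the non-NONE operators predicted by the aggregation head

    # Differentiable estimates for the three built-in operators:
    count_result = torch.sum(cell_probs, dim=1)              # COUNT
    sum_result = torch.sum(cell_probs * cell_values, dim=1)  # SUM
    average_result = sum_result / (count_result + 1e-10)     # AVERAGE

    # One column per non-NONE operator. With num_aggregation_labels=3, the operator
    # probabilities have 2 columns (DIFF, PERCENT) while this stack still has
    # 3 columns (COUNT, SUM, AVERAGE): exactly the "tensor a (3) vs tensor b (2)" mismatch.
    all_results = torch.stack([count_result, sum_result, average_result], dim=1)

    # A custom operator such as DIFF or PERCENT would need its own differentiable
    # estimate here, and the stack adjusted to match your set of operators.
    return torch.sum(all_results * aggregation_op_only_probs, dim=1)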

Another option would be to use strong supervision for aggregation, i.e. providing the ground-truth operator during training. In that case, the _calculate_expected_result function is not needed at all.
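
With strong supervision, the setup would look roughly like this (a sketch; the DIFF/PERCENT label mapping is your own, and the commented forward call only indicates which arguments carry the supervision):

from transformers import TapasConfig, TapasForQuestionAnswering

config = TapasConfig(num_aggregation_labels=3,
                     use_answer_as_supervision=False,  # strong supervision: no expected-result loss
                     aggregation_labels={0: "NONE", 1: "DIFF", 2: "PERCENT"})
model = TapasForQuestionAnswering.from_pretrained('google/tapas-base', config=config)

# During training, pass the gold operator index for each example, e.g.:
# outputs = model(**encoding,
#                 labels=cell_selection_labels,          # which cells are correct
#                 aggregation_labels=gold_operator_ids)  # shape (batch_size,)
# outputs.loss then includes an aggregation classification loss instead of the
# weakly supervised expected-result loss.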