Other aggregations on TAPAS beyond SUM/COUNT/AVERAGE/NONE

With the current way of fine-tuning the model, is it possible to train TAPAS to learn other aggregations, such as difference, percentage, etc.?

If it is possible, can you please point to some documentation?

Hi,

Yes, it is possible to train TAPAS on other, custom aggregations. You can change the number of aggregation operators in TapasConfig, like so:

from transformers import TapasConfig

config = TapasConfig(num_aggregation_labels=10)

and then initialize a TapasForQuestionAnswering model with a pre-trained base and your custom head on top:

from transformers import TapasForQuestionAnswering

model = TapasForQuestionAnswering.from_pretrained('google/tapas-base', config=config)

For more information, see the TAPAS fine-tuning guide here.
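For example (a minimal sketch, assuming you want to add DIFF and PERCENT operators; the names themselves are up to you), you can also give the operators readable names via the aggregation_labels mapping in the config:

from transformers import TapasConfig, TapasForQuestionAnswering

# Hypothetical operator set; index 0 is conventionally "no aggregation".
config = TapasConfig(
    num_aggregation_labels=3,
    aggregation_labels={0: "NONE", 1: "DIFF", 2: "PERCENT"},
)
model = TapasForQuestionAnswering.from_pretrained("google/tapas-base", config=config)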


Thank you. That helps.


Hi,

I tried changing num_aggregation_labels and added an aggregation_labels column to the dataset:

config = TapasConfig(
    num_aggregation_labels=3,
    use_answer_as_supervision=True,
    cell_selection_preference=0.207951,
    aggregation_labels={0: "NONE", 1: "DIFF", 2: "PERCENT"},
)

I can't find how to resolve this error:

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

I went through the documentation too. Do I need to add the DIFF and PERCENT aggregation calculation somewhere?

Can you provide some more details as to where this error happens?

Hi,

I get the error in modeling_tapas.py:

 File "C:\Users\Kinjal\.conda\envs\nlp\lib\site-packages\transformers\models\tapas\modeling_tapas.py", line 2254, in _calculate_expected_result
expected_result = torch.sum(all_results * aggregation_op_only_probs, dim=1)

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension

Hmm, yeah, that's because the _calculate_expected_result function is based on the 3 aggregation operators on which TAPAS was fine-tuned (SUM, COUNT and AVERAGE), as you can see here. So if you want to fine-tune with weak supervision, you would actually need to adapt the _calculate_expected_result function.
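To make the mismatch concrete: with your config, the operator probabilities (NONE excluded) have num_aggregation_labels - 1 = 2 columns, while all_results is built from the three hard-coded soft results (SUM, AVERAGE, COUNT), hence the (3) vs (2) error. Below is a standalone sketch (not the actual transformers code) of the structure you would have to extend; the soft "expected result" for a DIFF or PERCENT operator is something you would need to define yourself:

import torch

def expected_result_sketch(scaled_probs_per_cell, numeric_values, logits_aggregation):
    # Illustrative sketch only, mirroring the structure of _calculate_expected_result.
    # scaled_probs_per_cell: (batch, seq_len) soft cell-selection probabilities
    # numeric_values:        (batch, seq_len) numeric value of each table cell
    # logits_aggregation:    (batch, num_aggregation_labels), index 0 = NONE
    numeric_values = torch.nan_to_num(numeric_values, nan=0.0)

    # One soft result per non-NONE operator (SUM, AVERAGE, COUNT in the released code).
    sum_result = torch.sum(scaled_probs_per_cell * numeric_values, dim=1)
    count_result = torch.sum(scaled_probs_per_cell, dim=1)
    avg_result = sum_result / torch.clamp(count_result, min=1e-6)

    # If you add DIFF / PERCENT, define their soft results and stack them here, so that
    # all_results has exactly num_aggregation_labels - 1 columns ...
    all_results = torch.stack([sum_result, avg_result, count_result], dim=1)

    # ... because the operator probabilities (NONE excluded) have that many columns.
    op_only_probs = torch.nn.functional.softmax(logits_aggregation[:, 1:], dim=-1)
    return torch.sum(all_results * op_only_probs, dim=1)  # this is where your error fires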

Another option could be to use strong supervision for aggregation (i.e. providing the ground-truth operator during training). In that case, the _calculate_expected_result function is not required.


Thanks for your reply. I am trying to use strong supervision. I have created a column with the "aggregation_label" indices. Now I am not sure how to pass the aggregation_labels during training. This is how I am trying to run it:

config = TapasConfig(
    num_aggregation_labels=3,
    use_answer_as_supervision=True,
    select_one_column=False,
    cell_selection_preference=0.1,
    aggregation_labels={0: "DIFF", 1: "PERCENT", 2: "SUM"},
)
model_training = TapasForQuestionAnswering.from_pretrained("google/tapas-base", config=config)

for epoch in range(3):  # loop over the dataset multiple times
    print("Epoch:", epoch)
    for idx, batch in enumerate(train_dataloader):
        # get the inputs
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        token_type_ids = batch["token_type_ids"].to(device)
        labels = batch["labels"].to(device)
        # numeric_values = batch["numeric_values"].to(device)
        # numeric_values_scale = batch["numeric_values_scale"].to(device)
        # float_answer = batch["float_answer"].to(device)
        # aggregation_labels = ? Should I pass the aggregation_labels here? How? batch[] won't work
        # because the tokenizer doesn't return aggregation_labels.
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = model_training(input_ids=input_ids, attention_mask=attention_mask,
                                 token_type_ids=token_type_ids, labels=labels)
        loss = outputs.loss
        print("Loss:", loss.item())
        loss.backward()
        optimizer.step()

When I run the above code, I get the following error:

File "C:\Users\Kinjal\.conda\envs\nlp\lib\site-packages\transformers\models\tapas\modeling_tapas.py", line 1312, in forward
raise ValueError(

ValueError: You have to specify aggregation labels in order to calculate the aggregation loss

Hi,

In case you're using strong supervision, you should set use_answer_as_supervision in TapasConfig to False (because the ground-truth aggregation label is given during training).

You should add the aggregation label indices yourself when creating the batches; a sketch of what that could look like is below. Do you have a small portion of the data? Then I can create a demo notebook.
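A minimal sketch, reusing the names from your training loop above (the "aggregation_labels" batch key is hypothetical; you would add it yourself in your Dataset or collate function, since the tokenizer does not return it):

# Strong supervision for aggregation: provide the ground-truth operator index per example.
config = TapasConfig(
    num_aggregation_labels=3,
    use_answer_as_supervision=False,  # operator is given directly, no weak supervision needed
    aggregation_labels={0: "DIFF", 1: "PERCENT", 2: "SUM"},
)
model_training = TapasForQuestionAnswering.from_pretrained("google/tapas-base", config=config)

# Inside the batch loop: the hypothetical "aggregation_labels" key comes from your own
# Dataset / collate_fn, e.g. a torch.LongTensor of shape (batch_size,) with values in {0, 1, 2}.
aggregation_labels = batch["aggregation_labels"].to(device)

outputs = model_training(
    input_ids=input_ids,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=labels,                          # cell selection supervision
    aggregation_labels=aggregation_labels,  # ground-truth operator per example
)
loss = outputs.loss  # cell selection loss + aggregation cross-entropy loss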

Hi Niels,

Sure, I can send a sample. I am not able to upload it here, though.

The forum also supports private messaging 🙂

Here’s a Colab notebook illustrating how to fine-tune TAPAS with strong supervision for aggregation and custom aggregation operators: https://colab.research.google.com/drive/16M4Cdh4N6ywD6FqJZkwJhPMpZuK-KNji?usp=sharing
