With the current fine-tuning setup, is it possible to train TAPAS to learn other aggregations, such as difference, percentage, etc.?
If it is possible, can you please point me to some documentation?
Hi,
Yes, it is possible to train TAPAS on other custom aggregations. You can change the number of aggregation operators in TapasConfig, like so:

from transformers import TapasConfig

config = TapasConfig(num_aggregation_labels=10)

and then initialize a TapasForQuestionAnswering model with a pre-trained base and your custom head on top:

from transformers import TapasForQuestionAnswering

model = TapasForQuestionAnswering.from_pretrained('google/tapas-base', config=config)
For more information, see the fine-tuning guide of TAPAS here.
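To make the mapping from operator indices to names explicit, you can also pass an aggregation_labels dictionary to the config. A minimal sketch, where the operator names are placeholders for whatever custom operations you have in mind:

```python
from transformers import TapasConfig

# Hypothetical custom operators; index 0 is conventionally "NONE"
# (no aggregation), mirroring the original NONE/SUM/COUNT/AVERAGE setup.
config = TapasConfig(
    num_aggregation_labels=4,
    aggregation_labels={0: "NONE", 1: "SUM", 2: "DIFF", 3: "PERCENT"},
)
print(config.aggregation_labels)
```

Note that num_aggregation_labels must match the size of the aggregation_labels dictionary.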
Thank you. That helps.
Hi,
I tried changing num_aggregation_labels and added the aggregation_labels column to the dataset:

config = TapasConfig(
    num_aggregation_labels=3,
    use_answer_as_supervision=True,
    cell_selection_preference=0.207951,
    aggregation_labels={0: "NONE", 1: "DIFF", 2: "PERCENT"},
)

I can't find how to resolve this error:

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

I went through the documentation too. Do I need to add the DIFF and PERCENT aggregation calculations somewhere?
Can you provide some more details as to where this error happens?
Hi,
I get the error in modeling_tapas.py:

File "C:\Users\Kinjal\.conda\envs\nlp\lib\site-packages\transformers\models\tapas\modeling_tapas.py", line 2254, in _calculate_expected_result
    expected_result = torch.sum(all_results * aggregation_op_only_probs, dim=1)
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1
Hmm, yeah, that's because the _calculate_expected_result function is based on the 3 aggregation operators on which TAPAS was fine-tuned (SUM, COUNT and AVERAGE), as you can see here. So if you want to fine-tune with weak supervision, you would actually need to adapt the _calculate_expected_result function.

Another option could be to use strong supervision for aggregation (i.e. providing the ground-truth operator during training). In that case, the _calculate_expected_result function is not required.
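To give an idea of what adapting it would involve, here is a simplified, standalone sketch (not the actual Hugging Face implementation, which also handles cell masking and numeric-value scaling) of how a soft expected result is computed per operator, with a hypothetical PERCENT column added alongside COUNT, SUM and AVERAGE:

```python
import torch

def expected_results_sketch(cell_probs, numeric_values):
    """Simplified sketch of soft expected results per aggregation operator.

    cell_probs:     (batch, num_cells) soft cell-selection probabilities
    numeric_values: (batch, num_cells) numeric value of each table cell

    The real _calculate_expected_result in modeling_tapas hard-codes three
    operators; adding a custom one means stacking one more column here and
    keeping num_aggregation_labels consistent with it.
    """
    count_result = torch.sum(cell_probs, dim=1)
    sum_result = torch.sum(cell_probs * numeric_values, dim=1)
    avg_result = sum_result / torch.clamp(count_result, min=1e-6)
    # Hypothetical PERCENT operator: expected sum as a fraction of the
    # column total (just an illustration, not an official definition).
    percent_result = sum_result / torch.clamp(numeric_values.sum(dim=1), min=1e-6)
    # Shape (batch, 4); the second dimension must match the number of
    # non-NONE aggregation operators, which caused the size-mismatch error.
    return torch.stack([count_result, sum_result, avg_result, percent_result], dim=1)
```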
Thanks for your reply. I am trying to use strong supervision. I have created a column with the "aggregation_label" indices, but I am not sure how to pass the aggregation labels to training. This is how I am trying to run it:
config = TapasConfig(
    num_aggregation_labels=3,
    use_answer_as_supervision=True,
    select_one_column=False,
    cell_selection_preference=0.1,
    aggregation_labels={0: "DIFF", 1: "PERCENT", 2: "SUM"},
)
model_training = TapasForQuestionAnswering.from_pretrained("google/tapas-base", config=config)
for epoch in range(3):  # loop over the dataset multiple times
    print("Epoch:", epoch)
    for idx, batch in enumerate(train_dataloader):
        # get the inputs
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        token_type_ids = batch["token_type_ids"].to(device)
        labels = batch["labels"].to(device)
        # numeric_values = batch["numeric_values"].to(device)
        # numeric_values_scale = batch["numeric_values_scale"].to(device)
        # float_answer = batch["float_answer"].to(device)
        # aggregation_labels = ? Should I pass the aggregation labels here? batch[] won't work, because the tokenizer doesn't return aggregation_labels.
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = model_training(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, labels=labels)
        loss = outputs.loss
        print("Loss:", loss.item())
        loss.backward()
        optimizer.step()
When I run the above code, I get the following error:
File "C:\Users\Kinjal\.conda\envs\nlp\lib\site-packages\transformers\models\tapas\modeling_tapas.py", line 1312, in forward
raise ValueError(
ValueError: You have to specify aggregation labels in order to calculate the aggregation loss
Hi,
In case you're using strong supervision, you should set use_answer_as_supervision of TapasConfig to False (because the ground-truth aggregation label is given during training).

You should add the aggregation label indices yourself when creating the batches. Do you have a small portion of the data? Then I can create a demo notebook.
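As a rough sketch of that batching step (the helper name and the agg_label_per_example list below are made up for illustration), the gold operator indices could be attached to each batch like this and then passed to the model's forward pass:

```python
import torch

# Sketch: with strong supervision, each batch needs an extra
# `aggregation_labels` tensor of shape (batch_size,) holding the gold
# operator index per example (e.g. 0 = DIFF, 1 = PERCENT, 2 = SUM).
# `agg_label_per_example` is a hypothetical list aligned with the dataset;
# `batch_indices` are the dataset indices of the examples in this batch.
def add_aggregation_labels(batch, agg_label_per_example, batch_indices):
    batch["aggregation_labels"] = torch.tensor(
        [agg_label_per_example[i] for i in batch_indices], dtype=torch.long
    )
    return batch

# In the training loop, the extra tensor is then passed to the model:
#   outputs = model_training(
#       input_ids=input_ids,
#       attention_mask=attention_mask,
#       token_type_ids=token_type_ids,
#       labels=labels,
#       aggregation_labels=batch["aggregation_labels"].to(device),
#   )
```

Alternatively, a custom torch Dataset can add the "aggregation_labels" key to each encoding in __getitem__, since the tokenizer itself does not produce it.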
Hi Niels,
Sure, I can send a sample. I am not able to upload it here, though.
The forum also supports private messaging.
Here’s a Colab notebook illustrating how to fine-tune TAPAS with strong supervision for aggregation and custom aggregation operators: https://colab.research.google.com/drive/16M4Cdh4N6ywD6FqJZkwJhPMpZuK-KNji?usp=sharing
@nielsr the Colab no longer exists, can you please re-share it?
@nielsr is it possible to access the notebook you shared, which is no longer available? Many thanks!