In the current way to fine-tune the model, is it possible to train TAPAS to learn other aggregations such difference, percentages etc ?
If it is possible, can you please point to some documentation?
Hi,
Yes, it is possible to train TAPAS on other custom aggregations. You can change the number of aggregation operators via num_aggregation_labels in TapasConfig, like so:
from transformers import TapasConfig

config = TapasConfig(num_aggregation_labels=10)
and then initialize a TapasForQuestionAnswering model with a pre-trained base and your custom head on top:
from transformers import TapasForQuestionAnswering
model = TapasForQuestionAnswering.from_pretrained('google/tapas-base', config=config)
For more information, see the fine-tuning guide of TAPAS in the documentation.
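As a quick sanity check (assuming the aggregation head is exposed as model.aggregation_classifier, which is how current versions of the library name it), you can verify that the new head has one output per label:

from transformers import TapasConfig, TapasForQuestionAnswering

config = TapasConfig(num_aggregation_labels=10)
model = TapasForQuestionAnswering.from_pretrained('google/tapas-base', config=config)
# The aggregation head is newly initialized with one output per label:
print(model.aggregation_classifier.out_features)  # 10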
Thank you. That helps.
Hi,
I tried changing num_aggregation_labels and added an aggregation_labels column to my dataset:
from transformers import TapasConfig

config = TapasConfig(
    num_aggregation_labels=3,
    use_answer_as_supervision=True,
    cell_selection_preference=0.207951,
    aggregation_labels={0: "NONE", 1: "DIFF", 2: "PERCENT"},
)
I can’t figure out how to resolve this error:
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1
I went through the documentation too. Do I need to add the DIFF and PERCENT aggregation calculations somewhere?
Can you provide some more details as to where this error happens?
Hi,
I get the error in modeling_tapas.py:
File "C:\Users\Kinjal\.conda\envs\nlp\lib\site-packages\transformers\models\tapas\modeling_tapas.py", line 2254, in _calculate_expected_result
expected_result = torch.sum(all_results * aggregation_op_only_probs, dim=1)
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1
Hmm yeah, that’s because the _calculate_expected_result function is hard-coded around the 3 aggregation operators on which TAPAS was fine-tuned (SUM, COUNT and AVERAGE), as you can see in the source. With num_aggregation_labels=3 (NONE, DIFF, PERCENT), the operator probabilities only have 2 non-NONE columns while the built-in results tensor has 3, hence the size mismatch. So if you want to fine-tune with weak supervision, you would actually need to adapt the _calculate_expected_result function.
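To make the mismatch concrete, here is a minimal sketch (not the library’s code, shapes simplified) of what _calculate_expected_result effectively does:

import torch

# Minimal sketch with made-up shapes; the real function also handles average
# approximations, scaling and the answer loss cutoff.
batch_size, num_cells = 2, 8
scaled_probability_per_cell = torch.rand(batch_size, num_cells)  # soft cell selection
numeric_values = torch.rand(batch_size, num_cells)               # parsed cell values

sum_result = torch.sum(scaled_probability_per_cell * numeric_values, dim=1)
count_result = torch.sum(scaled_probability_per_cell, dim=1)
avg_result = sum_result / (count_result + 1e-6)                  # crude stand-in for the real approximation

# Hard-coded: one column per built-in operator -> shape (batch_size, 3)
all_results = torch.stack([sum_result, avg_result, count_result], dim=1)

# With num_aggregation_labels=3 (NONE, DIFF, PERCENT) there are only 2 non-NONE
# operators, so the operator probabilities have shape (batch_size, 2)
aggregation_op_only_probs = torch.softmax(torch.rand(batch_size, 2), dim=1)

print(all_results.shape)                # torch.Size([2, 3])
print(aggregation_op_only_probs.shape)  # torch.Size([2, 2])
# torch.sum(all_results * aggregation_op_only_probs, dim=1) is the line that raises
# the error; to support DIFF and PERCENT with weak supervision, you would replace the
# built-in columns with your own differentiable surrogates, one per non-NONE label.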
Another option could be to use strong supervision for aggregation (i.e. providing the ground-truth operator during training). In that case, the _calculate_expected_result function is not required.
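A minimal sketch of that route, assuming a toy table and made-up answer coordinates (in practice you would build the encodings with TapasTokenizer from your own data; the operator index 1 stands for "DIFF" in the label mapping below):

import pandas as pd
import torch
from transformers import TapasConfig, TapasForQuestionAnswering, TapasTokenizer

# Strong supervision: the ground-truth operator index is passed directly, so the
# soft expected-result computation (_calculate_expected_result) is never used.
config = TapasConfig(
    num_aggregation_labels=3,
    use_answer_as_supervision=False,
    aggregation_labels={0: "NONE", 1: "DIFF", 2: "PERCENT"},
)
model = TapasForQuestionAnswering.from_pretrained('google/tapas-base', config=config)
tokenizer = TapasTokenizer.from_pretrained('google/tapas-base')

table = pd.DataFrame({"year": ["2019", "2020"], "revenue": ["100", "150"]})
queries = ["What is the revenue difference between 2020 and 2019?"]
# answer_coordinates/answer_text mark the cells supporting the answer (illustrative values)
inputs = tokenizer(
    table=table,
    queries=queries,
    answer_coordinates=[[(0, 1), (1, 1)]],
    answer_text=[["100", "150"]],
    padding="max_length",
    return_tensors="pt",
)
inputs.pop("numeric_values", None)        # only needed with answer supervision
inputs.pop("numeric_values_scale", None)

# Ground-truth operator for this example: index 1 -> "DIFF"
outputs = model(**inputs, aggregation_labels=torch.tensor([1]))
loss = outputs.loss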