ValueError "too many rows" with Tapas/TableQuestionAnswering pipeline - How to fix it?

Hi guys! :wave:

I wanted to query a dataframe via the "table-question-answering" pipeline. It works well with small dataframes, but as soon as I load larger dataframes (e.g. ~400 rows), I get the following error:

ValueError: Too many rows
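
For reference, this is roughly how I'm calling it (minimal sketch — the table contents here are placeholders, my real dataframe has ~400 rows):

import pandas as pd
from transformers import pipeline

# small placeholder table; my real dataframe is much larger (~400 rows)
table = pd.DataFrame(
    {"city": ["Paris", "London", "Berlin"], "population": ["2148000", "8982000", "3645000"]}
)

tqa = pipeline("table-question-answering")   # defaults to a TAPAS checkpoint
print(tqa(table=table, query="Which city has the largest population?"))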

Any idea what may be happening here?

Thanks in advance :pray:

Charly

pinging @lysandre

He’s on vacation so you might have to wait for two weeks :wink: Looking at the code, you passed more rows than allowed by tokenizer.max_row_id, so you should send a shorter table.
There also seems to be an option drop_rows_to_fit=True that you can pass to avoid this error.
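
Something along these lines might work (untested sketch — depending on your transformers version the option is exposed either at tokenizer init or as truncation="drop_rows_to_fit" when calling the tokenizer):

from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")

# Option 1: check the limit and shorten the table yourself
print(tokenizer.max_row_id)                      # maximum row id the tokenizer accepts
short_table = table.iloc[:tokenizer.max_row_id]  # assumes `table` is your pandas DataFrame; rough cut

# Option 2: let the tokenizer drop rows that don't fit in the model's token budget
inputs = tokenizer(
    table=table,
    queries=["your question here"],
    truncation="drop_rows_to_fit",
    padding="max_length",
    return_tensors="pt",
)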


Thank you Sylvain! I’ll give it a whirl! :pray:

Hello @charly, Hi @sgugger.

Could you help me with how to work with multiple tables as the data dump (the dump from which the answers need to come)?
I have fine-tuned the TAPAS model with the QA CSV sheet and am now trying to ask questions and get answers.

import json
import pandas as pd

# Load all tables from the JSON dump
with open('/content/mydata.json') as f:
    d = json.load(f)

# This flattens everything into one big dataframe
table = pd.DataFrame.from_dict(d, orient='index')
table = table.astype(str)  # TAPAS expects every cell as a string

# tokenizer and queries are defined earlier (my fine-tuned TAPAS tokenizer and my questions)
inputs = tokenizer(table=table, queries=queries, padding='max_length', return_tensors="pt")

I’ve been trying to work with the above code, but it seems to combine every individual table into one dataframe, which leads to the “too many rows” error.

The JSON file data looks like this (example):

[ { "meters": ["<co>"],
    "D type": ["PO"],
    "Des": ["Value that is."],
    "instruc": ["Add accum"] },
  { "meters": ["<co>"],
    "D type": ["PO"],
    "Des": ["register 1."],
    "instruc": ["accumulator"] }
]

Most of the examples everywhere use just a single table to showcase the inference step.

Example: data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}

If the above example is a single table, I have 1000 such tables to get my answer from.
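
What I think I need is something along these lines instead: build one small dataframe per table and run the query against each one separately (just a rough sketch of the idea — the checkpoint path and the question are placeholders, and I still don't know how to pick the best answer across tables):

import json
import pandas as pd
from transformers import pipeline

with open('/content/mydata.json') as f:
    tables = json.load(f)                     # list of dicts, one dict per table

# placeholder path to my fine-tuned TAPAS checkpoint
tqa = pipeline("table-question-answering", model="./my-finetuned-tapas")

query = "Which instruction adds to the accumulator?"   # placeholder question

answers = []
for t in tables:
    df = pd.DataFrame(t).astype(str)          # one small dataframe per table, no "too many rows"
    answers.append(tqa(table=df, query=query))

print(answers)   # one answer per table; choosing the right table is still the open question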

Please Help !!!
Thanks in advance !!!

What is the limit for this model? I keep getting the same ValueError even with drop_rows_to_fit=True on the tokenizer.

I see in the original comment above that someone got the “Too many rows” error with a… 400-row table? Surely that can’t be correct… is that true? If so, how does one make this model usable for tables with tens of thousands of rows?

This is indeed true, @zadamg !
I have been trying this for a couple of days and I am only able to use a CSV with 100 rows (4 columns).
I have also created a fresh topic on this one. Hopefully someone will help out.
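
For now, the only workaround I've got running is to split the table into row chunks the tokenizer can handle and query each chunk separately (a rough sketch — the chunk size, model and CSV name are just what I've been experimenting with, and note that aggregation questions obviously can't be answered correctly across chunks this way):

import pandas as pd
from transformers import pipeline

tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")

def query_in_chunks(df, query, chunk_size=100):
    # run the query against each chunk of rows and collect every per-chunk answer
    answers = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size].reset_index(drop=True).astype(str)
        answers.append(tqa(table=chunk, query=query))
    return answers

df = pd.read_csv("my_big_table.csv")   # placeholder CSV with thousands of rows
print(query_in_chunks(df, "what is the total revenue?"))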