ValueError "too many rows" with Tapas/TableQuestionAnswering pipeline - How to fix it?

Hi guys! :wave:

I wanted to query a dataframe via the "table-question-answering" pipeline. It works well with small dataframes, but as soon as I load larger dataframes (e.g. ~400 rows), I get the following error:

ValueError: Too many rows
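
For reference, this is roughly how I'm calling it (minimal sketch — the table contents here are placeholders, my real dataframe has ~400 rows):

import pandas as pd
from transformers import pipeline

# small placeholder table; my real dataframe is much larger (~400 rows)
table = pd.DataFrame(
    {"city": ["Paris", "London", "Berlin"], "population": ["2148000", "8982000", "3645000"]}
)

tqa = pipeline("table-question-answering")   # defaults to a TAPAS checkpoint
print(tqa(table=table, query="Which city has the largest population?"))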

Any idea what may be happening here?

Thanks in advance :pray:

Charly

pinging @lysandre

He’s on vacation so you might have to wait for two weeks :wink: Looking at the code, you passed more rows than allowed by tokenizer.max_row_id, so you should send a shorter table.
There also seems to be an option drop_rows_to_fit=True that you can pass to avoid this error.
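
Something along these lines might work (untested sketch — depending on your transformers version the option is exposed either at tokenizer init or as truncation="drop_rows_to_fit" when calling the tokenizer):

from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")

# Option 1: check the limit and shorten the table yourself
print(tokenizer.max_row_id)                      # maximum row id the tokenizer accepts
short_table = table.iloc[:tokenizer.max_row_id]  # assumes `table` is your pandas DataFrame; rough cut

# Option 2: let the tokenizer drop rows that don't fit in the model's token budget
inputs = tokenizer(
    table=table,
    queries=["your question here"],
    truncation="drop_rows_to_fit",
    padding="max_length",
    return_tensors="pt",
)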


Thank you Sylvain! I’ll give it a whirl! :pray:

Hello @charly, Hi @sgugger.

Could you help me with how to work with multiple tables as the data dump (the dump from which the answers need to come)?
I have fine-tuned the TAPAS model with the QA CSV sheet and am now trying to ask questions and get answers.

import json
import pandas as pd

# Load all tables from the JSON dump
with open('/content/mydata.json') as f:
    d = json.load(f)

# This flattens everything into one big dataframe
table = pd.DataFrame.from_dict(d, orient='index')
table = table.astype(str)  # TAPAS expects every cell as a string

# tokenizer and queries are defined earlier (my fine-tuned TAPAS tokenizer and my questions)
inputs = tokenizer(table=table, queries=queries, padding='max_length', return_tensors="pt")

I’ve been trying to work with the above code, but it seems to combine every individual table into one dataframe, which leads to the “too many rows” error.

The JSON file data looks like this (example):

[ { "meters": ["<co>"],
    "D type": ["PO"],
    "Des": ["Value that is."],
    "instruc": ["Add accum"] },
  { "meters": ["<co>"],
    "D type": ["PO"],
    "Des": ["register 1."],
    "instruc": ["accumulator"] }
]

Most of the examples everywhere use just a single table to showcase the inference step.

Example: data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}

If the above example is a single table, I have 1000 such tables to get my answer from.
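
What I think I need is something along these lines instead: build one small dataframe per table and run the query against each one separately (just a rough sketch of the idea — the checkpoint path and the question are placeholders, and I still don't know how to pick the best answer across tables):

import json
import pandas as pd
from transformers import pipeline

with open('/content/mydata.json') as f:
    tables = json.load(f)                     # list of dicts, one dict per table

# placeholder path to my fine-tuned TAPAS checkpoint
tqa = pipeline("table-question-answering", model="./my-finetuned-tapas")

query = "Which instruction adds to the accumulator?"   # placeholder question

answers = []
for t in tables:
    df = pd.DataFrame(t).astype(str)          # one small dataframe per table, no "too many rows"
    answers.append(tqa(table=df, query=query))

print(answers)   # one answer per table; choosing the right table is still the open question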

Please Help !!!
Thanks in advance !!!

What is the limit for this model? I keep getting the same ValueError even with drop_rows_to_fit=True on the tokenizer.

I see in the original comment above that someone got the “Too many rows” error with a… 400-row table? Surely that can’t be correct… is that true? If so, how does one make this model usable for tables with tens of thousands of rows?

This is indeed true, @zadamg !
I have been trying this for a couple of days and I am only able to use a CSV with 100 rows (4 columns).
I have also created a fresh topic on this one. Hopefully someone will help out.
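
For now, the only workaround I've got running is to split the table into row chunks the tokenizer can handle and query each chunk separately (a rough sketch — the chunk size, model and CSV name are just what I've been experimenting with, and note that aggregation questions obviously can't be answered correctly across chunks this way):

import pandas as pd
from transformers import pipeline

tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")

def query_in_chunks(df, query, chunk_size=100):
    # run the query against each chunk of rows and collect every per-chunk answer
    answers = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size].reset_index(drop=True).astype(str)
        answers.append(tqa(table=chunk, query=query))
    return answers

df = pd.read_csv("my_big_table.csv")   # placeholder CSV with thousands of rows
print(query_in_chunks(df, "what is the total revenue?"))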