Applying Tapas/TableQuestionAnswering pipelines on a csv via Pandas?

charly · December 18, 2020, 10:19am

Hi guys!

Great work on Tapas and the v4.1.1 release!

Is there any guidance on how to apply this pipeline to dataframes loaded with pandas.read_csv?

Thanks,
Charly

lysandre · December 18, 2020, 2:31pm

Hello! Here’s how I would set up a pipeline with a pd.DataFrame:

from transformers import pipeline
import pandas as pd

tqa_pipeline = pipeline("table-question-answering")

data = {
    "Repository": ["Transformers", "Datasets", "Tokenizers"],
    "Stars": ["36542", "4512", "3934"],
    "Contributors": ["651", "77", "34"],
    "Programming language": ["Python", "Python", "Rust, Python and NodeJS"],
}

queries = "What repository has the largest number of stars?"
table = pd.DataFrame.from_dict(data)

output = tqa_pipeline(table, queries)
# {'answer': 'Transformers', 'coordinates': [(0, 0)], 'cells': ['Transformers']}

If you want to use a CSV file, you can do that too. Here’s the previous example converted to CSV and saved as ~/pipeline.csv:

Repository,Stars,Contributors,Programming language
Transformers,36542,651,Python
Datasets,4512,77,Python
Tokenizers,3934,34,"Rust, Python and NodeJS"
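Reading this file with pandas' default settings shows why a type conversion is needed: the quoted "Rust, Python and NodeJS" field correctly stays a single cell, but the numeric-looking columns come back as integers rather than strings. A small sketch to check this, where `io.StringIO` stands in for ~/pipeline.csv (an assumption made only so the snippet runs without the file):

```python
import io

import pandas as pd

# Same contents as ~/pipeline.csv, held in memory for illustration
CSV_TEXT = """Repository,Stars,Contributors,Programming language
Transformers,36542,651,Python
Datasets,4512,77,Python
Tokenizers,3934,34,"Rust, Python and NodeJS"
"""

table = pd.read_csv(io.StringIO(CSV_TEXT))

# The quoted field is parsed as one cell, so the table keeps 4 columns
print(table.shape)  # (3, 4)
print(table.loc[2, "Programming language"])  # Rust, Python and NodeJS

# Numeric-looking columns are inferred as integers, not strings
print(table.dtypes["Stars"])  # int64
```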

Here’s how I would do it (note the type conversion):

from transformers import pipeline
import pandas as pd

tqa_pipeline = pipeline("table-question-answering")

queries = "What repository has the largest number of stars?"
# Convert everything to a string, as the tokenizer can only handle strings
table = pd.read_csv("~/pipeline.csv").astype(str)

output = tqa_pipeline(table, queries)
# {'answer': 'Transformers', 'coordinates': [(0, 0)], 'cells': ['Transformers']}
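A hedged alternative to `.astype(str)` is to have pandas read every column as text directly with `dtype=str`. This matters when a column has gaps: pandas then infers a float column, so converting afterwards turns 651 into "651.0". With `dtype=str` the original spelling is kept, but missing cells are still NaN, so a `fillna("")` is needed before the table reaches the tokenizer. A minimal sketch with a made-up CSV containing one empty cell:

```python
import io

import pandas as pd

# Hypothetical CSV where the second row has an empty "Contributors" cell
CSV_TEXT = """Repository,Stars,Contributors
Transformers,36542,651
Datasets,4512,
"""

# Default parsing infers a float column for "Contributors" (because of the gap),
# so converting afterwards turns 651 into "651.0"
via_astype = pd.read_csv(io.StringIO(CSV_TEXT)).astype(str)
print(via_astype.loc[0, "Contributors"])  # 651.0

# Reading everything as text keeps the original spelling; missing cells are NaN,
# so fill them with an empty string before handing the table to the pipeline
via_dtype = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str).fillna("")
print(via_dtype.loc[0, "Contributors"])  # 651
print(repr(via_dtype.loc[1, "Contributors"]))  # ''
```

The resulting `via_dtype` can then be passed to `tqa_pipeline(via_dtype, queries)` exactly like `table` above.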

Hope that helps!

charly · December 18, 2020, 11:13pm

That’s incredibly useful, thanks @lysandre!

I’m playing with several datasets and I have to say that getting the right answers is sometimes challenging.

Is there any guidance anywhere on the query syntax, or on how to do basic operations (e.g. mean, median, etc.)?

Thanks again!
Charly
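On the aggregation question: as far as I understand TAPAS, checkpoints fine-tuned with an aggregation head (such as the WTQ ones) can only predict NONE, SUM, AVERAGE or COUNT, so there is no median operator. When an operator is predicted, the pipeline output should report it (an "aggregator" entry, with the answer prefixed like "AVERAGE > ...") together with the selected "cells", but the arithmetic itself is left to you. A hedged sketch of that post-processing step; `apply_aggregator` is a hypothetical helper, and the example dictionary is hand-written, not a real model prediction:

```python
from statistics import mean
from typing import Union


def apply_aggregator(output: dict) -> Union[float, str]:
    """Compute the value implied by a TAPAS-style pipeline output.

    Assumes `output` has a "cells" list of strings and, optionally, an
    "aggregator" entry in {"NONE", "SUM", "AVERAGE", "COUNT"}.
    """
    aggregator = output.get("aggregator", "NONE")
    cells = output["cells"]
    if aggregator == "NONE":
        # No operation predicted: the selected cell(s) are the answer
        return ", ".join(cells)
    if aggregator == "COUNT":
        return float(len(cells))
    # SUM and AVERAGE need numbers; strip thousands separators first
    values = [float(c.replace(",", "")) for c in cells]
    if aggregator == "SUM":
        return sum(values)
    if aggregator == "AVERAGE":
        return mean(values)
    raise ValueError(f"Unsupported aggregator: {aggregator}")


# Hand-written example output, not an actual model prediction
example = {
    "answer": "AVERAGE > 36542, 4512, 3934",
    "coordinates": [(0, 1), (1, 1), (2, 1)],
    "cells": ["36542", "4512", "3934"],
    "aggregator": "AVERAGE",
}
print(apply_aggregator(example))  # 14996.0
```

Keeping the arithmetic outside the model means a wrong operator prediction is easy to spot by inspecting "aggregator" and "cells" before trusting the number.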
