Hi guys!
Great work on Tapas and the v4.1.1 release!
Is there any guidance on how to apply this pipeline to dataframes loaded via pandas.read_csv?
Thanks,
Charly
Hello! Here’s how I would set up a pipeline with a pd.DataFrame:
from transformers import pipeline
import pandas as pd
tqa_pipeline = pipeline("table-question-answering")
data = {
"Repository": ["Transformers", "Datasets", "Tokenizers"],
"Stars": ["36542", "4512", "3934"],
"Contributors": ["651", "77", "34"],
"Programming language": ["Python", "Python", "Rust, Python and NodeJS"],
}
queries = "What repository has the largest number of stars?"
table = pd.DataFrame.from_dict(data)
output = tqa_pipeline(table, queries)
# {'answer': 'Transformers', 'coordinates': [(0, 0)], 'cells': ['Transformers']}
If you want to use a CSV file, you can do that too; here’s the previous example converted to CSV and saved in ~/pipeline.csv:
Repository,Stars,Contributors,Programming language
Transformers,36542,651,Python
Datasets,4512,77,Python
Tokenizers,3934,34,"Rust, Python and NodeJS"
Here’s how I would do it (note the type conversion):
from transformers import pipeline
import pandas as pd
tqa_pipeline = pipeline("table-question-answering")
queries = "What repository has the largest number of stars?"
# Convert everything to a string, as the tokenizer can only handle strings
table = pd.read_csv("~/pipeline.csv").astype(str)
output = tqa_pipeline(table, queries)
# {'answer': 'Transformers', 'coordinates': [(0, 0)], 'cells': ['Transformers']}
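As an aside on why the cast matters: read_csv infers numeric dtypes for columns like Stars, while the TAPAS tokenizer expects every cell to be a string. A minimal pandas-only illustration (no model involved):

```python
import io

import pandas as pd

csv = """Repository,Stars,Contributors
Transformers,36542,651
Datasets,4512,77
Tokenizers,3934,34"""

df = pd.read_csv(io.StringIO(csv))
print(df["Stars"].dtype)    # int64 -- read_csv inferred a numeric column

# Cast every cell to str before handing the table to the pipeline
df = df.astype(str)
print(df["Stars"].dtype)    # object -- every cell is now a Python str
print(df.iloc[0]["Stars"])  # 36542 (as a string)
```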
Hope that helps!
That’s incredibly useful, thanks @lysandre!
I’m playing with several datasets and I have to say that getting the right answers is sometimes challenging.
Is there any guidance anywhere on the query syntax, or on how to do basic operations (e.g. mean, median, etc.)?
Thanks again!
Charly
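A note on aggregations: the default table-question-answering model (a TAPAS checkpoint fine-tuned on WikiTableQuestions) can select cells and name an operator such as AVERAGE, SUM, or COUNT in an aggregator field of its output, but it does not compute the number itself. A minimal post-processing sketch — the output dict below is hand-written to mirror the pipeline's documented shape, not produced by a real run:

```python
import pandas as pd

# Hand-written stand-in for a pipeline result on a query such as
# "What is the average number of stars?" (shape mirrors the TQA pipeline output)
output = {
    "answer": "AVERAGE > 36542, 4512, 3934",
    "aggregator": "AVERAGE",
    "cells": ["36542", "4512", "3934"],
    "coordinates": [(0, 1), (1, 1), (2, 1)],
}

# The model only names the operator and the cells; do the arithmetic yourself.
values = pd.Series(output["cells"]).astype(float)
aggregations = {"AVERAGE": values.mean, "SUM": values.sum, "COUNT": values.count}
result = aggregations[output["aggregator"]]()
print(result)  # 14996.0
```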