I am trying to use the roberta-base-openai-detector model from Hugging Face.
I get a KeyError ('csv') when I try to use a CSV file. When I use a text file instead and change load_dataset and KeyDataset to text, it works.
Do I need to do some preprocessing first, or what am I doing wrong?
pipe = pipeline("text-classification", model="roberta-base-openai-detector")
dataset = load_dataset('csv', data_files=["/content/training testing split with college address.xlsx - Sheet1.csv"], split='train')
for out in tqdm(pipe(KeyDataset(dataset, "csv"))):
    print(out)
The error stack doesn’t give me much information.
KeyError Traceback (most recent call last)
<ipython-input-5-98a64dbb56de> in <cell line: 10>()
---> 10 for out in tqdm(pipe(KeyDataset(dataset, "csv"))):
11 print (out)
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/pt_utils.py in __getitem__(self, i)
304 def __getitem__(self, i):
--> 305 return self.dataset[i][self.key]
load_dataset accepts either the name of a dataset uploaded to the Hugging Face Hub or a path to a local directory, in which case it processes the files it finds there according to their format (or with a processing script contained in that directory).
See the load_dataset documentation for more details.
What is the name of the text column in your CSV file? You can fix this error by replacing “csv” with that column name when initializing the KeyDataset.
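To see why the key matters, here is a minimal sketch that mimics the lookup shown in the traceback (self.dataset[i][self.key]). KeyDatasetSketch is a simplified stand-in written for illustration, not the actual transformers class, and the rows and column names ("text", "label") are made up:

```python
class KeyDatasetSketch:
    """Simplified stand-in for KeyDataset, for illustration only."""

    def __init__(self, dataset, key):
        self.dataset = dataset
        self.key = key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        # Same lookup as the line shown in the traceback:
        # each row is a dict keyed by column name.
        return self.dataset[i][self.key]


# Hypothetical rows, shaped like a CSV with "text" and "label" columns.
rows = [
    {"text": "first example", "label": 0},
    {"text": "second example", "label": 1},
]


def try_key(key):
    """Return the first item for `key`, or a message if the key is missing."""
    try:
        return KeyDatasetSketch(rows, key)[0]
    except KeyError as err:
        return f"KeyError: {err}"


print(try_key("csv"))   # "csv" is a builder name, not a column, so the lookup fails
print(try_key("text"))  # an actual column name works
```

With the real dataset, printing dataset.column_names should list the keys that are valid to pass to KeyDataset.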
What if I’m passing several columns and not just one column?
You need to combine these columns into a single text column before using the pipeline.
If it has to be a single column before using the pipeline, is there a way to process each row of the file and get a result for that row?
If I have a file that is:
I want to pass just line1 first, get result then pass line2, get result and so on. Thanks for the help so far!
You can use the built-in text builder to read the file line by line:
from datasets import load_dataset
ds = load_dataset("text", data_files="path/to/csv", split="train")
ds = ds.select(range(1, len(ds))) # remove the csv header
Another option is to use map to merge the columns.
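A rough sketch of what such a merge function could look like. The column names ("col_a", "col_b"), the separator, and the example values are placeholders, since the actual columns in the CSV aren't shown in this thread:

```python
def merge_columns(row, columns=("col_a", "col_b"), sep=" "):
    """Build a single 'text' field from several columns of one row.

    `columns` and `sep` are hypothetical defaults; replace them with the
    real column names from the CSV file.
    """
    parts = [str(row[c]) for c in columns if row[c] is not None]
    return {"text": sep.join(parts)}


# One made-up row, shaped like what the csv builder yields per line.
example = {"col_a": "Jane Doe", "col_b": "12 College Road", "label": 1}
print(merge_columns(example))

# With a loaded dataset, the same function could be applied to every row:
#   dataset = dataset.map(merge_columns)
#   for out in pipe(KeyDataset(dataset, "text")):
#       print(out)
```

Since map adds the returned "text" field to each row, the pipeline still produces one result per row of the original file, in order.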