Num_samples = 0, dataset not being read

smahorker · December 7, 2023, 5:34am

I’m not really sure what the issue is with my dataset, or if its the argument im passing to autotrain. I’ve looked at other datasets(alpaca gpt4) and it looks correctly organized.

mikehemberger · December 7, 2023, 11:59am

Hi there @smahorker,
I think the problem is your dataset formatting. Looking into the .csv file indicates to me that not all of your strings were parsed correctly, ie missing the “ “

But I can be wrong as I haven’t worked with that interface before. Hope it helps
Best,
Mike

smahorker · December 7, 2023, 1:12pm

Hi,

Thank you for the response.

I’ve updated my dataset so that human input was also in " ", which seems to have uniformed my csv. However I’m still facing the same issue. I’m not sure why my datasets formatting is off and the autotrainer runs fine on other data.

mikehemberger · December 7, 2023, 3:03pm

I would recommend to first read-in your dataset, ie in a Jupyter notebook and try to get some data from your dataset.
Looking at your csv file again, it still doesnt look right to me.

I would imagine that:

ds=load_dataset(“smahorker/discllm”)

Will throw an error due to the formatting issues.

mikehemberger · December 7, 2023, 10:05pm

Maybe it would help to understand the exact format that is required by the Llama2 model here.

Then I would take a step back and check

data is loaded in correctly (printing, visualize)
search for the Llama2 format, update your dataset (if necessary). Then send an example of your data to the model via transformers

IMG_49982210×906 189 KB
check that the models output tensor shape aligns with what is expected
Good luck!

Topic		Replies	Views
Train huggingface Beginners	2	391	November 10, 2023
Error in Autotrain Training Beginners	3	104	May 8, 2025
Training stops while fine-tuning Llama2-7B with AutoTrain Advancedvanced Beginners	0	420	August 16, 2023
Training data is not working Beginners	4	169	November 18, 2024
Autotrain Trainings Tab: Error Loading Dataset 🤗AutoTrain	1	551	October 26, 2023

Num_samples = 0, dataset not being read

Related topics