Training data is not working

Hey so I’ve been trying to train an AI using autotrain, and I’ve been stuck on this for a while now I’ve read the documentation I’m using sft I am making a jsonl file with this inside of it

[{“content”: “hello”, “role”: “user”}, {“content”: “hi nice to meet you”, “role”: “assistant”}]
[{“content”: “how are you”, “role”: “user”}, {“content”: “I am fine”, “role”: “assistant”}]
[{“content”: “What is your name?”, “role”: “user”}, {“content”: “My name is Mary”, “role”: “assistant”}]
[{“content”: “Which is the best programming language?”, “role”: “user”}, {“content”: “Python”, “role”: “assistant”}]

straight from the documentation just to see if it’s working correctly before I format lots of data myself I’ve also tried all other examples in the documentation and every time I click the train button it will start a spinning circle but nothing happens if I check the logs nothing is there, and I don’t know what to do I’m using all default settings so what I just tried is I went onto hugging faces website, and I’ve tried multiple different AI to train off of and multiple different datasets but when I try the datasets other people have made I always get errors saying that it expected x but got x instead, and I just want to know what I’m doing wrong I can not create my own dataset or copy one and try it because I always run into an issue the main goal here is to make my own to train the AI, but even the examples from the documentation do not work :frowning:

1 Like

Hello. We can’t give you any specific advice unless we know which model and dataset is causing the error…
If you don’t put the data set in the HF data set repository, autotrain won’t work properly.
Basically, that error occurs when the program is not passed the number of types of data it expects.
As a general rule, I think it’s easier to get it to work if you first try running a very simple sample that seems to work without a problem, and then replace it with your own model and data.

It seems that this can also be caused by bugs or insufficient performance in the library, and in such cases it is difficult to solve the problem on your own.

Thanks for responding it’s not one specific model or dataset it seems to be everyone I use but what do you mean I need to add it to the repository I was uploading it from my computer

1 Like

Of course, you can train models using only your local environment with the HF library or other companies’ libraries, but HF also provides a way to train models to some extent automatically.
Specifically, it is as follows.
And since these are designed to use data uploaded to the Hub, it is often easier to upload them once.
If you set them to private, they will not be visible to other users, so there is no need to publish the uploaded data to the whole world.
Even if you do everything locally, it is still useful to organize your data in this format for HF.
The libraries created by HF generally use fixed file names and file types, so if you organize your data accordingly, it will generally work.

Alright ill check it out thanks

1 Like