Help with autotrain/LLM finetuning please

I am a writer and a complete beginner with Hugging Face and machine learning. I would like to finetune an LLM on my own published books so that I can ask it to generate text in my own style from a simple prompt. I have partially managed this with the gpt_2_simple Python framework installed on my own computer, training it on one large plain-text file containing three of my novels concatenated together. The output is in my style, but it’s nonsense, and I can only give it an initial text prompt which it then completes. I would like to do this through interactive chat instead, with prompts like “write a paragraph about a cat”. I’ve been trying to follow the LLM Finetuning tutorial here:

I don’t think this tutorial is really aimed at what I’m after, though. I did manage to set up an AutoTrain session with my writing as the ‘generic mode’ data input, by pasting the text into one giant CSV file with a single column labeled ‘text’. But after running for about 30 seconds it stopped, and it just says ‘error’ at the bottom of the page.
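For anyone building the same kind of file, here is a minimal sketch of producing a single-column ‘text’ CSV from a plain-text file with Python’s standard csv module, rather than pasting by hand. The function names, the one-chunk-per-row layout and the 2000-character limit are my own assumptions for illustration. Nothing in this thread confirms that AutoTrain prefers this exact chunking.

```python
import csv
import io


def text_to_rows(raw_text, max_chars=2000):
    """Group blank-line-separated paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole as its own row.
    The max_chars default is an arbitrary assumption, not an AutoTrain setting.
    """
    paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
    rows, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow the chunk: start a new row.
            rows.append(current)
            current = para
        else:
            current = candidate
    if current:
        rows.append(current)
    return rows


def write_text_csv(rows, handle):
    """Write one row per chunk under a single 'text' header column."""
    # QUOTE_ALL keeps commas, quotes and newlines inside a chunk in one cell.
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
    writer.writerow(["text"])
    for row in rows:
        writer.writerow([row])


# Usage with hypothetical file names:
#   with open("novels.txt", encoding="utf-8") as src:
#       rows = text_to_rows(src.read())
#   with open("train.csv", "w", newline="", encoding="utf-8") as dst:
#       write_text_csv(rows, dst)
```

Writing through csv.writer rather than pasting means quotation marks and commas in the prose are escaped properly, which is one less thing that could break the upload.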

So, my questions:

  1. Is this even possible with Hugging Face AutoTrain?
  2. If so, which hub model should I use?
  3. And how should I format the data?

This is really just a kind of experiment for me. I’m playing around with machine learning out of curiosity and because I think it’s cool. Any help much appreciated!

I am also a beginner and I find myself in kind of the same situation… let’s help one another :slight_smile:

I would like to finetune an LLM, and I get an error 9-10 seconds after I click on ‘Create Project’.
I also managed to set up a personal space and loaded a dataset with only one big column, in accordance with ‘generic mode’.

Some remarks :

  • You can access the details of the error by clicking on the logs button in the top left corner of your personal space.

  • Could you please copy/paste the logs you get?

  • Have you already set up a payment method or not?

Thanks for your reply. I checked the logs and the error is:

    File "/app/env/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api.autotrain.huggingface.co/projects/80921/start_training

So, that doesn’t seem to be an issue with the input data or the configuration of the job. I have not set up a payment method but the job cost estimate was $0.00.

Did you get yours working?


My model isn’t training properly yet, but I’ve progressed beyond this step.

You must input a payment card to enable your model’s training.

Only one AutoTrain project is free (and only if your dataset contains fewer than 3000 rows and you include just one model).
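If you want to check your file against that row limit before uploading, a rough sketch with the standard csv module could look like this. The 3000 figure is simply the number quoted above, not something I have verified, and the function and constant names are hypothetical.

```python
import csv
import io

# Row limit as quoted in this thread; treat it as an assumption.
FREE_TIER_MAX_ROWS = 3000


def count_rows(handle):
    """Count data rows (excluding the header) in a CSV with a 'text' column."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or "text" not in reader.fieldnames:
        raise ValueError("expected a 'text' column in the header row")
    # The csv module treats a quoted multi-line cell as a single row.
    return sum(1 for _ in reader)


def within_free_limit(handle, limit=FREE_TIER_MAX_ROWS):
    """Return True if the file has fewer data rows than the limit."""
    return count_rows(handle) < limit


# Usage with a hypothetical file name:
#   with open("train.csv", newline="", encoding="utf-8") as f:
#       print(within_free_limit(f))
```

Counting through the csv reader rather than counting lines matters here, because a cell containing several paragraphs spans many lines but is still one row.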

I subscribed using the “Pro” offer, which resolved that specific error.

Please refer to:

Cheers,