I am a writer and a complete beginner with Hugging Face / machine learning. I am interested in fine-tuning an LLM on my own published books so that I can ask it to generate text that appears to be written in my own style from a simple prompt. I managed to partially do this using the gpt_2_simple Python package installed on my own computer, trained on a massive plain-text file of three of my novels concatenated together. It produces text in my style, but it’s nonsense, and I can only give it an initial text prompt which it then completes. I would like to be able to interact with it via chat instead, with prompts like “write a paragraph about a cat” and so on.
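For reference, the gpt_2_simple step was roughly this (model size, file name, and step count are from memory, so treat it as a sketch):

```python
import gpt_2_simple as gpt2

# download the small 124M GPT-2 checkpoint (only needed once)
gpt2.download_gpt2(model_name="124M")

sess = gpt2.start_tf_sess()

# "my_novels.txt" is the three novels concatenated into one plain-text file
gpt2.finetune(sess,
              dataset="my_novels.txt",
              model_name="124M",
              steps=1000)

# generation is completion-only: it just continues whatever prefix I give it
gpt2.generate(sess, prefix="Write a paragraph about a cat")
```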
I’ve been trying to follow the LLM Finetuning tutorial here:
I think this tutorial isn’t really aimed at achieving exactly what I’m after, though. I did manage to set up an AutoTrain session with my writing as the ‘generic mode’ data input, by pasting the text into a giant CSV file with just one column labeled ‘text’ (roughly what the script below produces). But after running for about 30 seconds it stopped and just said ‘error’ at the bottom of the page.
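In case the data format is the problem, what I did is roughly equivalent to this (file names are placeholders, and splitting on blank lines was my own guess at how to make rows):

```python
import pandas as pd

# split the concatenated novels into paragraph-sized rows;
# AutoTrain seemed to want a single column named "text"
with open("my_novels.txt", encoding="utf-8") as f:
    paragraphs = [p.strip() for p in f.read().split("\n\n") if p.strip()]

pd.DataFrame({"text": paragraphs}).to_csv("train.csv", index=False)
```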
So, my questions:
- Is this even possible with Hugging Face AutoTrain?
- If so, which hub model should I use?
- And how should I format the data? (My rough guess is below.)
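On that last question: I’ve seen instruction-style datasets that pack everything into a single text column, something like the rows below. The `### Instruction:` / `### Response:` template is just my guess from examples I’ve come across, and I gather different models expect different templates; is this the kind of thing AutoTrain wants?

```csv
text
"### Instruction: Write a paragraph about a cat. ### Response: The cat crossed the kitchen in three unhurried steps..."
"### Instruction: Describe a rainy street at night. ### Response: Rain fell in long silver lines under the streetlamp..."
```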
This is really just a kind of experiment for me. I’m playing around with machine learning out of curiosity and because I think it’s cool. Any help much appreciated!