How to fine-tune an LLM with AutoTrain?

Hi,
This document indicates that one can fine-tune an LLM with AutoTrain using CLM.
I have a dataset that is formatted as:
{"instruction": "xxx", "input": "yyy", "output": "zzz"} tuples.

When attempting to create a new AutoTrain project, I’m not sure which option to choose to be able to train the model with these tuples using CLM.

Any suggestions would be helpful. Thank you!

You seem to have a JSONL file. You can convert it to CSV using Python:

import pandas as pd

# read the JSONL file into a pandas DataFrame
df = pd.read_json('input.jsonl', lines=True)

# write the DataFrame to a CSV file
df.to_csv('output.csv', index=False)
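Since CLM trains on plain text, it may also help to merge the three fields into a single column before uploading. The following is only a rough sketch: the Alpaca-style prompt template, the `build_text` helper, and the `text` column name are all assumptions for illustration, not something AutoTrain is confirmed to require, so check the column mapping your project actually expects.

```python
import pandas as pd


def build_text(row):
    """Join one instruction/input/output record into a single training string.

    The section headers below are a hypothetical template, not an AutoTrain requirement.
    """
    parts = [f"### Instruction:\n{row['instruction']}"]
    # Only include the input section when the record actually has one.
    if isinstance(row.get("input"), str) and row["input"].strip():
        parts.append(f"### Input:\n{row['input']}")
    parts.append(f"### Response:\n{row['output']}")
    return "\n\n".join(parts)


# Small in-memory stand-in for the DataFrame read from input.jsonl.
df = pd.DataFrame(
    [
        {"instruction": "xxx", "input": "yyy", "output": "zzz"},
        {"instruction": "aaa", "input": "", "output": "bbb"},
    ]
)

# "text" is an assumed column name; rename it to match your column mapping.
df["text"] = df.apply(build_text, axis=1)
train_df = df[["text"]]

# Uncomment to write the single-column CSV for upload.
# train_df.to_csv("train.csv", index=False)
```

Records with an empty input simply skip the input section, so the same template covers both kinds of rows.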

Regarding column mapping, the documentation provides an almost identical example.

Sorry, I should have pasted this picture.
Which of the model types maps to training the model with CLM?
image

The LLM task is only available in AutoTrain Advanced: AutoTrain

Are there specific resource requirements for the Hugging Face Space environment in order to proceed with LLM fine-tuning?
