[NEWBIE] Creating custom datasets to fine-tune an existing model

Lucapro · November 4, 2022, 10:40am

Hi everyone,

I am trying to create my own dataset, starting from my raw JSONL dataset (Lucapro/tx-data · Datasets at Hugging Face). This is my first step toward fine-tuning a Helsinki-NLP model and getting a working PoC.
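A loading error can come from the raw file itself, so it may help to check it first. Below is a minimal sketch (standard library only) that parses JSON Lines text line by line and reports malformed lines instead of stopping at the first one. The column names `source`/`target` in the sample rows are hypothetical placeholders, not the real columns of Lucapro/tx-data.

```python
import json
from typing import Iterable, List, Tuple


def read_jsonl(lines: Iterable[str]) -> Tuple[List[dict], List[int]]:
    """Parse JSON Lines text, returning (records, bad_line_numbers).

    Blank lines are skipped. Lines that are not valid JSON, or that are
    valid JSON but not a single object, are reported by 1-based line number.
    """
    records: List[dict] = []
    bad: List[int] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate empty lines, e.g. a trailing newline
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)  # not valid JSON at all
            continue
        if isinstance(obj, dict):
            records.append(obj)
        else:
            bad.append(lineno)  # valid JSON, but not one object per line


    return records, bad


# Hypothetical sample rows; "source"/"target" are placeholder column names.
sample = [
    '{"source": "hello", "target": "ciao"}',
    '',
    '{"source": "bye", "target": "addio"',  # missing closing brace
    '["not", "an", "object"]',              # a list, not an object
]
records, bad = read_jsonl(sample)
print(len(records), bad)  # 1 [3, 4]
```

With a real file, the same function can take the open file handle directly, e.g. `with open("data.jsonl", encoding="utf-8") as f: records, bad = read_jsonl(f)` (the file name is hypothetical).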

What I would like to accomplish is to train the model to translate the first column of my dataset into the second column.
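For a two-column file, my understanding is that the example `run_translation.py` script in transformers expects each JSON Lines record to nest both sides under a `translation` key, keyed by language code (for example `{"translation": {"en": "...", "ro": "..."}}`). Below is a minimal sketch of that reshaping, assuming hypothetical column names `source`/`target` and hypothetical language codes `en`/`it`.

```python
import json
from typing import Dict

# Hypothetical names: replace with the real column names and language codes.
SRC_COL, TGT_COL = "source", "target"
SRC_LANG, TGT_LANG = "en", "it"


def to_translation_example(row: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Wrap a flat two-column row in the nested 'translation' layout."""
    return {"translation": {SRC_LANG: row[SRC_COL], TGT_LANG: row[TGT_COL]}}


rows = [
    {"source": "Good morning", "target": "Buongiorno"},
    {"source": "Thank you", "target": "Grazie"},
]
converted = [to_translation_example(r) for r in rows]
print(json.dumps(converted[0], ensure_ascii=False))
# {"translation": {"en": "Good morning", "it": "Buongiorno"}}
```

Keeping the language codes in one place makes it easier to keep them consistent with whatever source and target language arguments the training script is given.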

I am struggling to create my own dataset and to reference it in the run_translation script. I am getting an error related to how my dataset gets loaded (you can see my loaded dataset here: Lucapro/tx-data-to-decode · Datasets at Hugging Face).
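One way to hand the converted data to the script is to write separate training and validation JSON Lines files, then check that every line has both language keys before training. Below is a minimal sketch; the file names, the 20% validation split used in the example, the seed, and the `en`/`it` codes are all assumptions.

```python
import json
import os
import random
import tempfile
from typing import Dict, List, Tuple


def write_jsonl(path: str, examples: List[Dict]) -> None:
    """Write one JSON object per line (JSON Lines), keeping non-ASCII text."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")


def split_and_write(examples: List[Dict], out_dir: str,
                    val_fraction: float = 0.1, seed: int = 42) -> Tuple[str, str]:
    """Shuffle a copy, hold out a validation slice, write both files."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    val, train = shuffled[:n_val], shuffled[n_val:]
    train_path = os.path.join(out_dir, "train.json")
    val_path = os.path.join(out_dir, "validation.json")
    write_jsonl(train_path, train)
    write_jsonl(val_path, val)
    return train_path, val_path


def check_file(path: str, src_lang: str, tgt_lang: str) -> List[int]:
    """Return 1-based line numbers whose 'translation' lacks a language key."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            tr = json.loads(line).get("translation")
            if not isinstance(tr, dict) or src_lang not in tr or tgt_lang not in tr:
                bad.append(lineno)
    return bad


# Hypothetical data already in the nested "translation" layout.
examples = [{"translation": {"en": f"sentence {i}", "it": f"frase {i}"}}
            for i in range(10)]
out_dir = tempfile.mkdtemp()
train_path, val_path = split_and_write(examples, out_dir, val_fraction=0.2)
print(check_file(train_path, "en", "it"), check_file(val_path, "en", "it"))  # [] []
```

The resulting paths would then be passed to `--train_file` and `--validation_file`, with `--source_lang`/`--target_lang` matching the keys inside `translation`. The `.json` extension is used because the script checks the file extension, and as far as I know it accepts `.json` for JSON Lines files.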

I am surely missing something and I am a bit stuck. Can anyone point me in the right direction so I can move forward with my PoC?
