Data format when fine-tuning Llama 2 for JSON extraction?

seinfeld · November 6, 2023, 2:18pm

I have a task in mind where I want Llama 2 to be fine-tuned to generate JSON in a particular format from sentences.
I have looked at several fine-tuning examples and datasets, such as https://www.philschmid.de/sagemaker-llama2-qlora, which use instruction, response, and context fields to train the model.
What should I put in the context? I tried putting the sentence in the instruction field and the JSON format the model should use in the context field. But in that case the context is the same for every example. Is this correct? Is there an alternative way to do this?
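For reference, the layout described above can be sketched roughly like this. It is only an illustration: the template fields (`name`, `city`), the example sentence, the helper names, and the `### Instruction` / `### Context` / `### Answer` headers are all assumptions made for the sketch, not something prescribed by the linked post.

```python
import json

# Hypothetical JSON template; the field names are placeholders for illustration.
TEMPLATE = {"name": "", "city": ""}

def build_record(sentence, target):
    """Return one training example with instruction/context/response fields."""
    return {
        "instruction": sentence,           # the sentence to extract from
        "context": json.dumps(TEMPLATE),   # same JSON template for every example
        "response": json.dumps(target),    # the filled-in JSON the model should emit
    }

def format_prompt(record):
    """Join the fields into a single training string; an empty context is skipped."""
    parts = [f"### Instruction\n{record['instruction']}"]
    if record["context"]:
        parts.append(f"### Context\n{record['context']}")
    parts.append(f"### Answer\n{record['response']}")
    return "\n\n".join(parts)

# Made-up example data.
record = build_record("Alice lives in Paris.", {"name": "Alice", "city": "Paris"})
print(format_prompt(record))
```

With this layout the context string is identical across records, and only the instruction and response change.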

developer0313testing · November 10, 2023, 5:30pm

Hello, did you get the result you wanted? I was trying to do something similar but failing.
Basically, I wanted the model to extract specific item names from text and then return their unique codes, which it was supposed to learn through fine-tuning. It does the extraction more or less correctly, but fetching the unique code is where it fails.

seinfeld · November 10, 2023, 9:02pm

I did try putting the JSON it should fill in the context and the actual sentence in the instruction, which worked for me (for most examples, not all).
You could put the unique codes in the context, and the instruction would be the sentence plus a request to extract the items.
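A rough sketch of that suggestion, assuming a small item-to-code table that fits in the prompt. The item names, the codes, the instruction wording, and the `build_code_record` helper are all made up for illustration:

```python
import json

# Hypothetical item-to-code table; real names and codes would come from your own data.
ITEM_CODES = {"blue widget": "BW-001", "red gear": "RG-042"}

def build_code_record(sentence, mentioned_items):
    """Build one example whose context lists every item with its code."""
    context = "\n".join(f"{name}: {code}" for name, code in ITEM_CODES.items())
    response = [{"item": name, "code": ITEM_CODES[name]} for name in mentioned_items]
    return {
        "instruction": (
            "Extract the items mentioned in this sentence and "
            f"return their codes as JSON: {sentence}"
        ),
        "context": context,
        "response": json.dumps(response),
    }

# Made-up example data.
example = build_code_record("Please ship two red gear units.", ["red gear"])
print(example["context"])
print(example["response"])
```

The idea is that the model can copy the code from the context instead of having to memorize it during fine-tuning.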
