Data format when finetuning Llama 2 for JSON extraction?

I have a task in mind where I want Llama 2 to be finetuned to generate JSON in a particular format from sentences.
I have looked at several finetuning examples and datasets, such as https://www.philschmid.de/sagemaker-llama2-qlora, which use instruction, response, and context fields to train the model.
What should I put in my context? I tried putting the sentence in the instruction field and the JSON format it should use in the context field, but in that case the context is the same for all examples. Is this correct? Is there an alternative way to do this?
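To make the setup concrete, here is a minimal sketch of how a training sample could be assembled in that instruction/context/response style, with the sentence in the instruction and the (fixed) JSON schema in the context. The template string, field names, and schema below are illustrative assumptions, not the exact format from the linked example:

```python
import json

# Illustrative prompt template in the instruction/context/response style
# (section headers are an assumption, not the exact ones from the tutorial).
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Context:\n{context}\n\n"
    "### Response:\n{response}"
)

# Hypothetical target schema; since it is the same for every example,
# it effectively acts as a fixed system-style prefix in the context.
schema = {"name": "", "date": "", "amount": ""}

def format_example(sentence: str, filled: dict) -> str:
    """Build one training sample: sentence in instruction, schema in context."""
    return TEMPLATE.format(
        instruction=f"Extract the fields below from this sentence: {sentence}",
        context=json.dumps(schema),
        response=json.dumps(filled),
    )

sample = format_example(
    "Alice paid 40 dollars on March 3rd.",
    {"name": "Alice", "date": "March 3rd", "amount": "40 dollars"},
)
print(sample)
```

A repeated, identical context is not wrong per se; it just means the model sees the schema as a constant prefix, so an alternative is to fold the schema into the instruction text and drop the context field entirely.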


Hello, did you get the result you wanted? I was trying to do something similar but failing.
Basically, I wanted it to extract specific item names from text and then return their unique codes, taught to the model through finetuning. It somewhat does the extraction correctly, but fetching the unique code is where I fail.

I did try adding the JSON it should fill in the context and the actual sentence in the instruction, which worked for me (for most examples, not all).
You could add the unique codes in the context, and the instruction would be the sentence plus telling it to extract.
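The suggestion above could look something like this: put the known item-to-code mapping in the context so the model only has to extract names and look codes up, instead of memorizing codes from finetuning alone. The mapping, item names, and codes here are all hypothetical placeholders:

```python
import json

# Hypothetical item-name -> unique-code mapping; in practice this would be
# your real catalog, either the full table or the subset relevant per example.
code_map = {"widget-A": "WA-001", "gizmo-B": "GB-002"}

def build_prompt(sentence: str) -> str:
    """Sketch of a prompt with the code mapping in the context field."""
    return (
        "### Instruction:\n"
        f"Extract the item names from this sentence and return their unique codes: {sentence}\n\n"
        "### Context:\n"
        f"{json.dumps(code_map)}\n\n"
        "### Response:\n"
    )

print(build_prompt("We shipped two widget-A units yesterday."))
```

Keeping the codes in the context (rather than only in the training targets) tends to turn the task into lookup rather than recall, which may explain why extraction works but code retrieval fails when the codes are absent from the prompt.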