SFTTrainer for Llama-2

I am a bit confused about the way SFTTrainer is used for fine-tuning an LLM. Take Llama-2 as an example.

Approach-1 Link

model_id = "NousResearch/Llama-2-7b-hf"
dataset = load_dataset("mlabonne/mini-platypus", split="train")

The dataset has two fields, [‘instruction’, ‘output’]. The instruction field is pre-formatted text of the form:

    Instruction:
    <text of the instruction>

    Response:

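For reference, a quick way to inspect what each column actually holds (this is just an inspection sketch, not part of the linked notebook):

from datasets import load_dataset

# Same dataset as above: print the column names and one sample row
dataset = load_dataset("mlabonne/mini-platypus", split="train")
print(dataset.column_names)        # ['instruction', 'output']
print(dataset[0]["instruction"])   # already wrapped in the Instruction/Response template
print(dataset[0]["output"])        # the plain answer text
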
The output field is just the required textual response. When the SFTTrainer is instantiated, only the instruction field is referenced:

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    eval_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="instruction",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments,
)

There is no train/eval split in the dataset, yet the author feeds the same dataset to both the train_dataset and eval_dataset arguments.
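For contrast, a genuine held-out set would normally be carved out explicitly, e.g. something like the sketch below (the linked notebook does not do this):

from datasets import load_dataset

# Hypothetical alternative: split the data instead of reusing it for evaluation
dataset = load_dataset("mlabonne/mini-platypus", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset, eval_dataset = splits["train"], splits["test"]
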

Approach-2 Link

model_id = "NousResearch/llama-2-7b-chat-hf"
dataset_name = "mlabonne/guanaco-llama2-1k"

In this case the dataset has only one field, text, and each row already contains the full formatted prompt-plus-response string:

[screenshot of a sample row showing the formatting]
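Again, just an inspection sketch to see what that single column contains:

from datasets import load_dataset

# Peek at the single-column dataset used in Approach-2
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")
print(dataset.column_names)   # ['text']
print(dataset[0]["text"])     # prompt and response already combined into one string
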

This time, when the SFTTrainer is instantiated, only the text field is referenced:

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

So the question is: how does the trainer know to look for the “output” field in the first case, and how does it know that the entire data is in the “text” field in the second?