Hello, I was wondering whether passing labels with my dataset would lead to better results when fine-tuning a causal language model. I have seen several code examples that pass labels and others that don't.
I looked at the source code of the GPTNeoForCausalLM forward function:

    if labels is not None:
        # Compute loss in fp32 to match with mesh-tf version
        # https://github.com/EleutherAI/gpt-neo/blob/89ce74164da2fb16179106f54e2269b5da8db333/models/gpt2/gpt2.py#L179
        lm_logits = lm_logits.to(torch.float32)

        # Shift so that tokens < n predict n
        shift_logits = lm_logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        # Flatten the tokens
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

        lm_logits = lm_logits.to(hidden_states.dtype)
        loss = loss.to(hidden_states.dtype)

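As far as I can tell, the shift in that snippet means the logits at position i are scored against the token at position i + 1. Here is a minimal pure-Python sketch of that alignment, using plain lists instead of tensors. The helper name `shift_pairs` and the token ids are made up for illustration and are not part of transformers.

```python
def shift_pairs(input_ids, labels):
    """Pair each input position with the label it is scored against.

    Mirrors shift_logits = lm_logits[..., :-1, :] and
    shift_labels = labels[..., 1:] from the forward pass, with lists.
    """
    positions = input_ids[:-1]  # the last position has no next token to predict
    targets = labels[1:]        # the first label is never used as a target
    return list(zip(positions, targets))


ids = [50256, 15496, 995, 0]    # hypothetical token ids
pairs = shift_pairs(ids, labels=list(ids))
print(pairs)  # [(50256, 15496), (15496, 995), (995, 0)]
```

If this reading is right, the labels can be an unshifted copy of the inputs, because the one-step offset is applied inside the model.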
So if I want to use labels, can I simply copy the input_ids as the labels? Or do I need to worry about the BOS token and similar special tokens? Thank you.
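For concreteness, this is roughly what I mean by copying. It is only a sketch, and it assumes that padded positions should be excluded from the loss by setting them to -100, which is the default `ignore_index` of `CrossEntropyLoss`. The helper `make_labels` and the example ids are hypothetical; with tensors I assume this would be `input_ids.clone()` followed by a mask.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def make_labels(input_ids, attention_mask):
    """Copy input_ids, replacing padded positions with IGNORE_INDEX."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


input_ids = [15496, 995, 0, 50256, 50256]  # hypothetical ids, last two are padding
attention_mask = [1, 1, 1, 0, 0]           # 0 marks padding
labels = make_labels(input_ids, attention_mask)
print(labels)  # [15496, 995, 0, -100, -100]
```

No manual shifting is done here, on the assumption that the forward function handles it.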