How to label a dataset for Causal Language Modeling

Hello, I was wondering whether adding labels to my dataset would lead to better results when fine-tuning a causal language model. I have seen several code examples that set labels and others that don't.
I looked at the source code of the forward function of GPTNeoForCausalLM:

if labels is not None:
    # Compute loss in fp32 to match with mesh-tf version
    lm_logits =

    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss_fct = CrossEntropyLoss()
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

    lm_logits =
    loss =

So if I want to use labels, can I simply copy input_ids as the labels? Or do I need to worry about the BOS token and similar details? Thank you.
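
To make the question concrete, here is a minimal pure-Python sketch of what "copy input_ids as labels" would mean, given the shift in the snippet above. The token ids, the padding id of 0, and the helper names make_labels and shifted_pairs are made up for illustration. The value -100 is the default ignore_index of PyTorch's CrossEntropyLoss, so positions labelled -100 would not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def make_labels(input_ids, attention_mask):
    """Copy input_ids as labels, masking padded positions (hypothetical helper)."""
    return [tok if keep == 1 else IGNORE_INDEX
            for tok, keep in zip(input_ids, attention_mask)]


def shifted_pairs(labels):
    """Mimic the shift in the forward pass: the logits at position i are
    scored against labels[i + 1]; ignored targets are dropped."""
    return [(i, target)
            for i, target in enumerate(labels[1:])
            if target != IGNORE_INDEX]


# Made-up token ids; the last two positions are padding.
input_ids = [50256, 15496, 995, 0, 0]
attention_mask = [1, 1, 1, 0, 0]

labels = make_labels(input_ids, attention_mask)
pairs = shifted_pairs(labels)

print(labels)  # [50256, 15496, 995, -100, -100]
print(pairs)   # [(0, 15496), (1, 995)]
```

Because the model slices labels[..., 1:] itself, the first token (for example a BOS token) never appears as a target in this sketch; it only serves as context for predicting the second token. The labels would therefore not need to be shifted by hand before being passed in.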