GPT-2 text generation, structure of evaluation set for compute_metrics

Hello everyone,

I'm currently reproducing the second task (generating articles from a headline) of this tutorial: Text generation with GPT-2 - Model Differently.
I understand that the ‘input_ids’ of the training data must be prepared in the format ‘bos_token <title> sep_token <content> eos_token’. Now I want to add a compute_metrics function that the trainer calls to evaluate a separate set, where the model has to predict the ‘content’ given only the ‘title’. How do I prepare the data for the evaluation set?
Is it just ‘bos_token <title> sep_token’? Or does one have to manipulate the ‘attention_mask’, as indicated here:
GPT2 for QA Pair Generation - #9 by valhalla?
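
To make the question concrete, here is a minimal sketch of the two layouts I am comparing. The special-token strings and the helper names (`build_training_text`, `build_eval_prompt`) are placeholders I made up for illustration, not necessarily what the tutorial uses, and whether the prompt-only layout is the right input for evaluation is exactly what I am unsure about.

```python
# Placeholder special tokens (assumed; the tutorial's actual strings may differ).
BOS = "<|startoftext|>"
SEP = "<|sep|>"
EOS = "<|endoftext|>"


def build_training_text(title: str, content: str) -> str:
    """Full sequence for training: the model sees both title and content."""
    return f"{BOS}{title}{SEP}{content}{EOS}"


def build_eval_prompt(title: str) -> str:
    """Prompt-only sequence for evaluation: the content is left for the model to generate."""
    return f"{BOS}{title}{SEP}"


# Hypothetical example record with the two fields mentioned in the question.
example = {"title": "Some headline", "content": "Some article body."}

train_text = build_training_text(example["title"], example["content"])
eval_prompt = build_eval_prompt(example["title"])

# The evaluation prompt is a strict prefix of the training text;
# the reference for the metric would be the held-back content.
reference = example["content"]
print(train_text)
print(eval_prompt)
```

In this sketch the reference text for compute_metrics would be the held-back ‘content’, and the generated continuation after sep_token would be compared against it.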