Error when Finetuning a Pegasus Student

Hi everyone, I’m trying to run finetune.py to distill a Pegasus student (16-2) on XSum.
I suspect the problem is in the encoding of the source data:

File "D:\Repos\transformers\examples\seq2seq\utils.py", line 147, in get_char_lens
return [len(x) for x in Path(data_file).open().readlines()]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4245: character maps to <undefined>

This is the command I launch:

python finetune.py --learning_rate=1e-4 --do_train --do_predict --n_val 1000 --val_check_interval 0.25 --max_source_length 512 --max_target_length 56 --freeze_embeds --label_smoothing 0.1 --adafactor --task summarization_xsum --data_dir xsum --train_batch_size=1 --eval_batch_size=1 --output_dir distilpeg_xsum_sft_16_2 --num_train_epochs 6 --model_name_or_path distilpeg_xsum_16_2 --gpus 1

Maybe there are unsupported characters in the data?
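
Here is a minimal sketch of what I suspect is happening. It assumes the XSum files are UTF-8 and that Python on my Windows machine falls back to cp1252 when open() is called without an encoding; I have not confirmed either. The helper name get_char_lens_utf8 and the sample file are made up for illustration and are not part of the transformers examples.

```python
import tempfile
from pathlib import Path


def get_char_lens_utf8(data_file):
    """Hypothetical variant of get_char_lens that decodes the file as UTF-8 explicitly."""
    with Path(data_file).open(encoding="utf-8") as f:
        return [len(line) for line in f.readlines()]


# Build a small UTF-8 sample file (made-up content). The right double
# quotation mark U+201D is encoded in UTF-8 as the bytes E2 80 9D.
tmp_dir = tempfile.mkdtemp()
sample = Path(tmp_dir) / "train.source"
sample.write_bytes("He said \u201cno\u201d.\n".encode("utf-8"))

# Decoding the same bytes as cp1252 (assumed here to be the Windows default)
# fails, because byte 0x9d has no mapping in that code page.
cp1252_error = None
try:
    with sample.open(encoding="cp1252") as f:
        f.readlines()
except UnicodeDecodeError as err:
    cp1252_error = err

print("cp1252:", cp1252_error)
print("utf-8 :", get_char_lens_utf8(sample))
```

If this is really the cause, passing encoding="utf-8" to the open() call in utils.py should avoid the error. Running Python in UTF-8 mode (python -X utf8, or setting PYTHONUTF8=1, on Python 3.7+) might also work without touching the code.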
Thanks in advance!