Say I have a batch of examples with fields input_ids of size m*n and bos_token_id of size n. Is there a way to specify the bos_token_id for each example during the evaluation step when using generate?
I’m also curious about this. @mralexis, did you ever work this out? A similar question was asked in "M2M model finetuning on multiple language pairs", and it also got no reply.
I think I managed to do this, but my approach is really hacky and fragile, so I wouldn’t recommend it. I’ve filed a feature request with the Hugging Face transformers team to improve this: https://github.com/huggingface/transformers/issues/15500
The feature request links to a Colab notebook with the code for how I did it.
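For anyone who just wants to see the idea in isolation, here is a minimal toy sketch of decoding where every example starts from its own BOS token. It is not the code from the Colab notebook and it does not call transformers at all: greedy_generate, toy_next_token, and the token ids are all made up for illustration. With encoder-decoder models in transformers, the analogous move may be to pass a decoder_input_ids tensor with one start token per row to generate, but whether that behaves as expected depends on the model and library version, so treat that as an assumption to verify.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a decoder step: given the tokens generated so far
# for one example, return the next token id. A real model would return logits.
NextTokenFn = Callable[[Sequence[int]], int]


def greedy_generate(
    bos_token_ids: Sequence[int],
    next_token: NextTokenFn,
    eos_token_id: int,
    max_new_tokens: int = 5,
) -> List[List[int]]:
    """Decode one sequence per example, each seeded with its own BOS token."""
    # One row per example; row i starts with bos_token_ids[i] rather than a
    # single BOS id shared by the whole batch.
    sequences = [[bos] for bos in bos_token_ids]
    finished = [False] * len(sequences)

    for _ in range(max_new_tokens):
        if all(finished):
            break
        for i, seq in enumerate(sequences):
            if finished[i]:
                continue
            token = next_token(seq)
            seq.append(token)
            # Stop extending this row once it has produced EOS.
            finished[i] = token == eos_token_id
    return sequences


def toy_next_token(seq: Sequence[int]) -> int:
    """Toy rule: emit first token + current length, then EOS (0) at length 3."""
    return 0 if len(seq) >= 3 else seq[0] + len(seq)


# Three examples, each with a different (made-up) BOS token id.
outputs = greedy_generate([10, 20, 30], toy_next_token, eos_token_id=0)
```

The only point of the sketch is the first line of the function body: the start token is a per-row value instead of a single batch-wide constant, and everything after that is ordinary greedy decoding.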