I am working on an image caption generation task. There are various implementations from different communities, each with their own improvements, but I want to write code that uses the standard Transformer structure, so I referenced the code in The Annotated Transformer.
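For context, here is a minimal sketch of the kind of model I have in mind (all names, dimensions, and the assumption of pre-extracted CNN image features are just placeholders, not my exact code):

```python
import torch
import torch.nn as nn

class CaptionTransformer(nn.Module):
    """Standard encoder-decoder Transformer that decodes captions
    from pre-extracted image region/grid features."""

    def __init__(self, vocab_size, d_model=512, nhead=8,
                 num_layers=6, feat_dim=2048, max_len=50):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, d_model)        # project CNN features to d_model
        self.tok_embed = nn.Embedding(vocab_size, d_model)   # caption token embeddings
        self.pos_embed = nn.Embedding(max_len, d_model)      # learned positional embeddings
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)        # logits over the vocabulary

    def forward(self, image_feats, caption_ids):
        # image_feats: (batch, num_regions, feat_dim); caption_ids: (batch, seq_len)
        src = self.feat_proj(image_feats)
        pos = torch.arange(caption_ids.size(1), device=caption_ids.device)
        tgt = self.tok_embed(caption_ids) + self.pos_embed(pos)
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            caption_ids.size(1)).to(caption_ids.device)
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.lm_head(out)                              # (batch, seq_len, vocab_size)
```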
Now I need to implement beam search for sequence generation. I noticed the transformers library has an implementation in `generation_utils.generate`, and there is a lot of documentation about text generation, but it all uses pre-trained models and simply calls `model.generate` to produce a sequence. How should I use `generation_utils.generate` to generate captions on my own dataset with a model built on the standard Transformer structure? Are there any examples or tutorials I can refer to?
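For reference, this is roughly the beam search I would otherwise write by hand on top of the model sketched above (a rough, untested sketch; `beam_search`, `bos_id`, `eos_id`, etc. are my own placeholders). I would much rather reuse the library's implementation than maintain something like this:

```python
import torch

@torch.no_grad()
def beam_search(model, image_feats, bos_id, eos_id, beam_size=3, max_len=20):
    """Plain beam search for a single image.
    image_feats: (1, num_regions, feat_dim); model follows the
    CaptionTransformer interface sketched above."""
    device = image_feats.device
    beams = [([bos_id], 0.0)]          # each hypothesis: (token list, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos_id:   # hypothesis already ended
                finished.append((tokens, score))
                continue
            caption = torch.tensor([tokens], device=device)
            logits = model(image_feats, caption)              # (1, len, vocab)
            log_probs = torch.log_softmax(logits[0, -1], dim=-1)
            top_lp, top_ids = log_probs.topk(beam_size)
            for lp, tok in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((tokens + [tok], score + lp))
        if not candidates:             # every beam has finished
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size] # keep the best partial hypotheses
    finished.extend(beams)
    best = max(finished, key=lambda c: c[1] / len(c[0]))      # length-normalised score
    return best[0]
```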
Thanks a lot.