I have fine-tuned a summarization model following the Hugging Face seq2seq guide (starting from sshleifer/distilbart-xsum-12-6).
Our team is interested in deploying with AWS Elastic Inference to reduce cost (e.g. similar to this: https://aws.amazon.com/blogs/machine-learning/fine-tuning-a-pytorch-bert-model-and-deploying-it-with-amazon-elastic-inference-on-amazon-sagemaker/).
I was wondering whether there are any examples, or a suggested way, to use the beam search logic in BartForConditionalGeneration together with inference from a TorchScript model. Most of the TorchScript examples I've found are for classification tasks, where this isn't necessary.
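For reference, here is roughly what I can get working so far: tracing the seq2seq forward pass itself (with caching disabled, since tracing fixes tensor shapes). This is just a minimal sketch with a tiny random BartConfig so it runs without downloading weights; in practice you would load the fine-tuned checkpoint instead. The open question is that `generate()` and its beam search live on the Python model class, so the decoding loop would still have to drive this traced forward step by step:

```python
import torch
from transformers import BartConfig, BartForConditionalGeneration

# Tiny random config so the sketch runs standalone; swap in
# BartForConditionalGeneration.from_pretrained(<your checkpoint>) in practice.
config = BartConfig(
    vocab_size=128, d_model=16,
    encoder_layers=1, decoder_layers=1,
    encoder_attention_heads=2, decoder_attention_heads=2,
    encoder_ffn_dim=32, decoder_ffn_dim=32,
    max_position_embeddings=64,
)
model = BartForConditionalGeneration(config).eval()

class TraceableBart(torch.nn.Module):
    """Wrapper exposing a trace-friendly forward that returns only logits.

    use_cache=False and return_dict=False keep the traced graph to plain
    tensor outputs; past_key_values caching is not captured by the trace.
    """
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, decoder_input_ids):
        outputs = self.model(
            input_ids=input_ids,
            decoder_input_ids=decoder_input_ids,
            use_cache=False,
            return_dict=False,
        )
        return outputs[0]  # logits: (batch, decoder_len, vocab_size)

input_ids = torch.randint(0, config.vocab_size, (1, 10))
decoder_input_ids = torch.full((1, 1), config.decoder_start_token_id)

traced = torch.jit.trace(TraceableBart(model), (input_ids, decoder_input_ids))
logits = traced(input_ids, decoder_input_ids)
print(tuple(logits.shape))
```

A beam search loop would then have to call this traced forward once per decoding step from Python, which loses the past-key-value caching that `generate()` normally uses; I'd be glad to hear if there's a better-supported path.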