Language generation with torchscript model?

I have fine-tuned a summarization model following the Hugging Face seq2seq guide (starting from sshleifer/distilbart-xsum-12-6).

Our team is interested in deploying with AWS Elastic Inference to reduce costs (e.g., similar to this: https://aws.amazon.com/blogs/machine-learning/fine-tuning-a-pytorch-bert-model-and-deploying-it-with-amazon-elastic-inference-on-amazon-sagemaker/)

I was wondering whether there are any examples, or a suggested approach, for using the beam search logic in BartForConditionalGeneration together with inference from a TorchScript model. Most of the TorchScript examples I've found are for classification tasks, where this isn't necessary.

I’ve had success deploying a BartForConditionalGeneration model on SageMaker with Elastic Inference (EI).

Try:

from transformers import BartForConditionalGeneration

# torchscript=True prepares the model for TorchScript export (e.g. tuple outputs).
model = BartForConditionalGeneration.from_pretrained(model_dir, torchscript=True)
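For context, a minimal end-to-end sketch of that suggestion (the `model_dir` path and generation parameters below are placeholders, not from the thread):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

model_dir = "path/to/fine-tuned-checkpoint"  # placeholder: your fine-tuned model
tokenizer = BartTokenizer.from_pretrained(model_dir)
model = BartForConditionalGeneration.from_pretrained(model_dir, torchscript=True)
model.eval()

# Beam search via generate() still works on the eagerly loaded model.
inputs = tokenizer("Some long article text...", return_tensors="pt", truncation=True)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```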

Thanks a lot for that reply @setu4993!

It looks really promising; we'll give it a try.

@laphangho Good luck!

To add a little more context: SageMaker wants a ScriptModule, not a traced module. Tracing is not possible with .generate(), but scripting works fine. And to use script mode, you don't need to save the model any differently than with the default .save_pretrained() method, since torchscript=True can simply be passed as an additional argument when creating the model object.
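A rough sketch of what that script-mode round trip could look like, assuming the model scripts cleanly under your torch/transformers versions (the paths here are placeholders):

```python
import torch
from transformers import BartForConditionalGeneration

model_dir = "path/to/fine-tuned-checkpoint"  # placeholder path
model = BartForConditionalGeneration.from_pretrained(model_dir, torchscript=True)
model.eval()

# Script (don't trace) the module, then save the ScriptModule for deployment.
scripted = torch.jit.script(model)
torch.jit.save(scripted, "model.pt")

# In the SageMaker inference container, load the ScriptModule back.
loaded = torch.jit.load("model.pt")
```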