Language generation with torchscript model?

laphangho · August 11, 2020, 2:32am

I have fine-tuned a summarization model following the Hugging Face seq2seq guide (starting from sshleifer/distilbart-xsum-12-6).

Our team is interested in deploying with AWS Elastic Inference to reduce cost (e.g. similar to this: https://aws.amazon.com/blogs/machine-learning/fine-tuning-a-pytorch-bert-model-and-deploying-it-with-amazon-elastic-inference-on-amazon-sagemaker/).

I was wondering whether there are any examples, or a suggested way, to use the beam search logic in BartForConditionalGeneration when running inference with a TorchScript model. Most of the TorchScript examples I’ve found are for classification tasks, where this isn’t necessary.

setu4993 · September 15, 2020, 1:14am

I’ve had success with deploying a BartForConditionalGeneration model using SageMaker with EI.

Try:

from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained(model_dir, torchscript=True)

laphangho · September 15, 2020, 1:52am

Thanks a lot for that reply @setu4993!

It looks really promising; we’ll give it a try.

setu4993 · September 15, 2020, 1:56am

@laphangho Good luck!

To add a little more context: SageMaker wants a ScriptModule, not a trace. Tracing is not possible with .generate(), but scripting works fine. And to use script mode, the model does not need to be saved in a different way (than with the default .save_pretrained() method), since torchscript=True can simply be passed as an additional argument when creating the model object.

setu4993 · November 17, 2020, 6:54am

Unfortunately, soon after I wrote that, I realized beam search (.generate()) was actually not working with EI (.forward() does work, but using it means losing beam search). I ran into the same situation, where model inference was taking place on CPU only, with no EI, so I eventually had to switch to a GPU instance.

That was ~2 months ago now, so I don’t know if something has changed, since I haven’t looked back.
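
To make the trade-off above concrete: if only .forward() runs on the compiled model, the beam search loop has to live in ordinary Python around it. Below is a minimal sketch of that idea, not the Transformers implementation. The names beam_search, step_fn and toy_step are hypothetical; step_fn stands in for a call to the compiled model's forward pass that returns next-token log-probabilities, and the scoring is plain summed log-probability with no length penalty.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, List[int]]  # (summed log-probability, token ids)


def beam_search(
    step_fn: Callable[[List[int]], Dict[int, float]],
    bos: int,
    eos: int,
    num_beams: int = 2,
    max_length: int = 5,
) -> List[int]:
    """Simplified beam search over a step function returning {token: log_prob}."""
    beams: List[Hypothesis] = [(0.0, [bos])]
    finished: List[Hypothesis] = []
    for _ in range(max_length):
        # Expand every live hypothesis by one token.
        candidates: List[Hypothesis] = []
        for score, tokens in beams:
            for token, log_prob in step_fn(tokens).items():
                candidates.append((score + log_prob, tokens + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        # Keep the best num_beams; hypotheses ending in EOS stop growing.
        beams = []
        for score, tokens in candidates[:num_beams]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_length
    return max(finished, key=lambda c: c[0])[1]


BOS, EOS = 0, 1

# Toy next-token distribution keyed by the last token (assumption: a real
# step_fn would run the model's forward pass on the whole prefix instead).
_TABLE = {
    0: {2: math.log(0.55), 3: math.log(0.45)},
    2: {1: math.log(0.3), 3: math.log(0.7)},
    3: {1: math.log(0.9), 2: math.log(0.1)},
}


def toy_step(prefix: List[int]) -> Dict[int, float]:
    return _TABLE[prefix[-1]]


greedy = beam_search(toy_step, BOS, EOS, num_beams=1)  # [0, 2, 3, 1]
beamed = beam_search(toy_step, BOS, EOS, num_beams=2)  # [0, 3, 1]
print(greedy, beamed)
```

With one beam the loop degenerates to greedy decoding and picks the locally best first token; with two beams it recovers the sequence with the higher overall probability, which is what is lost when only a single forward pass per step is used without a search around it.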

A couple other things:

As of writing (and this has been the same for a few months now), EI on PyTorch is only supported on PyTorch 1.3.1. So, if using any other version of PyTorch, it will fall back to CPU.
Since PyTorch 1.6, the way models are saved has changed. To make a model backward compatible, it might have to be saved with _use_new_zipfile_serialization=False.
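
As a small illustration of the second point, here is a sketch of saving with the legacy (pre-1.6) format through torch.save. The state dict and file names are made up for the example, and it only covers torch.save; whether this alone is enough for an Elastic Inference deployment is not confirmed in this thread.

```python
import os
import tempfile
import zipfile

import torch

# Toy state dict standing in for a fine-tuned model's weights (hypothetical).
state_dict = {"weight": torch.zeros(2, 3)}

tmp_dir = tempfile.mkdtemp()
new_path = os.path.join(tmp_dir, "model_new.pt")
legacy_path = os.path.join(tmp_dir, "model_legacy.pt")

# Default since PyTorch 1.6: a zipfile-based container.
torch.save(state_dict, new_path)

# Legacy format, intended to stay readable by older PyTorch versions.
torch.save(state_dict, legacy_path, _use_new_zipfile_serialization=False)

# The two files differ in container format: only the default one is a zip.
new_is_zip = zipfile.is_zipfile(new_path)
legacy_is_zip = zipfile.is_zipfile(legacy_path)
print(new_is_zip, legacy_is_zip)
```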

Sorry I couldn’t help more. Please do share if you find other workarounds here. Thanks!

Ankita · April 25, 2021, 12:39pm

Has anyone figured out a way to run inference (same as the .generate() method) for seq-to-seq models on Elastic Inference?
I am trying to run inference for Seq-to-Seq models (like BART, Pegasus) on Elastic Inference with EC2.
So far, I have been able to follow the TorchScript example (1) and store the model, but I have not been able to figure out how to run inference on it.

(1) Export to ONNX

derekzen · November 20, 2021, 1:08pm

How did you script the generate method?
I used torch.jit.script(model.generate) and got an error:
NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults: ...
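
For anyone hitting the same message, here is a rough way to see which parameters of a function fall under the restriction it quotes. This is a sketch using only the standard library, not TorchScript's own check; unscriptable_parameters, generate_like and fixed_signature are hypothetical names, and generate_like only assumes that a generate-style method accepts extra **kwargs.

```python
import inspect
from typing import Callable, List


def unscriptable_parameters(fn: Callable) -> List[str]:
    """List parameters that are *args, **kwargs, or keyword-only with a default."""
    flagged = []
    for name, param in inspect.signature(fn).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            flagged.append(name)
        elif param.kind is param.KEYWORD_ONLY and param.default is not param.empty:
            flagged.append(name)
    return flagged


# Hypothetical stand-in for a generate-style signature that takes **kwargs.
def generate_like(input_ids, max_length=20, num_beams=4, **model_kwargs):
    return input_ids


# Same idea with a fixed argument list and no variadic parameters.
def fixed_signature(input_ids, max_length: int = 20, num_beams: int = 4):
    return input_ids


print(unscriptable_parameters(generate_like))    # ['model_kwargs']
print(unscriptable_parameters(fixed_signature))  # []
```

A fixed signature only removes this particular complaint; the body of the function still has to be something TorchScript can compile.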
