Language generation with a TorchScript model?

I have fine-tuned a summarization model following the Hugging Face seq2seq guide (starting from sshleifer/distilbart-xsum-12-6).

Our team is interested in deploying with AWS Elastic Inference (EI) to reduce costs, e.g. similar to this: https://aws.amazon.com/blogs/machine-learning/fine-tuning-a-pytorch-bert-model-and-deploying-it-with-amazon-elastic-inference-on-amazon-sagemaker/

I was wondering whether there are any examples, or a suggested way, to use the beam search logic in BartForConditionalGeneration with inference from a TorchScript model. Most of the TorchScript examples I’ve found are for classification tasks, where this isn’t necessary.

I’ve had success deploying a BartForConditionalGeneration model using SageMaker with EI.

Try:

from transformers import BartForConditionalGeneration
model = BartForConditionalGeneration.from_pretrained(model_dir, torchscript=True)

Thanks a lot for that reply @setu4993!

It looks really promising; we’ll give it a try.

@laphangho Good luck!

To add a little more context: SageMaker wants a ScriptModule, not a traced module. Tracing isn’t possible with .generate(), but scripting works fine. And to use script mode, you don’t need to save the model differently from the default .save_pretrained() method, since torchscript=True can simply be passed as an additional argument when creating the model object.

@setu4993 Thank you for the insights! Can you please elaborate on how you managed to make .generate() use Elastic Inference? Are you simply calling it after loading the model with torchscript=True? We tried that, but EI isn’t used that way; all inference takes place on the CPU.

Unfortunately, soon after I wrote that, I realized beam search (.generate()) was actually not working with EI (.forward() does work, but using it means losing beam search). I ran into the same situation, with inference running only on the CPU and not on EI, so I eventually had to switch to a GPU instance.

That was about 2 months ago, and I haven’t looked back since, so I don’t know whether anything has changed.
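The thread doesn’t settle how to keep beam search when only .forward() runs on EI. One option, which is an assumption and not something anyone here confirmed working, is to keep the decoding loop in plain Python and call the model once per step. Below is a minimal, simplified beam search over a hypothetical step function standing in for that per-step forward call. The names beam_search and toy_step and the token ids are made up for illustration, and it omits length penalty, batching, and the encoder output / key-value caching that .generate() handles.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for one decoder forward pass: takes the tokens generated
# so far and returns a log-probability for every token id in the vocabulary.
StepFn = Callable[[Sequence[int]], List[float]]


def beam_search(step: StepFn, bos: int, eos: int, beam_width: int, max_len: int) -> List[int]:
    """Simplified beam search: no length penalty, no batching, no caching."""
    beams: List[Tuple[float, List[int]]] = [(0.0, [bos])]
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        candidates: List[Tuple[float, List[int]]] = []
        for score, tokens in beams:
            log_probs = step(tokens)  # one forward call per live beam
            for token_id, lp in enumerate(log_probs):
                candidates.append((score + lp, tokens + [token_id]))
        # Keep the best `beam_width` expansions; those ending in EOS are done.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # beams still open at max_len are also candidates
    return max(finished, key=lambda c: c[0])[1]


def toy_step(tokens: Sequence[int]) -> List[float]:
    """Hypothetical model over ids 0=BOS, 1=EOS, 2="a", 3="b"."""
    if len(tokens) == 1:
        probs = [0.0, 0.0, 0.6, 0.4]
    elif len(tokens) == 2 and tokens[-1] == 2:
        probs = [0.0, 0.3, 0.4, 0.3]
    elif len(tokens) == 2:
        probs = [0.0, 0.9, 0.05, 0.05]
    else:
        probs = [0.0, 0.5, 0.25, 0.25]
    return [math.log(p) if p > 0 else float("-inf") for p in probs]


print(beam_search(toy_step, bos=0, eos=1, beam_width=1, max_len=5))  # [0, 2, 2, 1]
print(beam_search(toy_step, bos=0, eos=1, beam_width=2, max_len=5))  # [0, 3, 1]
```

With beam_width=1 this reduces to greedy decoding, which commits to "a" first and ends with a sequence of probability 0.12, while a width of 2 finds "b" then EOS with probability 0.36. In a real deployment, step would wrap the EI-hosted forward call. That part is not shown here and is untested.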

A couple other things:

  1. As of writing (and this has been the case for a few months now), EI for PyTorch is only supported on PyTorch 1.3.1, so any other PyTorch version will fall back to CPU.
  2. Since PyTorch 1.6, torch.save writes a new zipfile-based format by default. To keep a model loadable by older versions such as 1.3.1, it may need to be saved with _use_new_zipfile_serialization=False.
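The two version constraints above can be captured in a small helper. This is a sketch, assuming version strings like "1.6.0" or "1.3.1+cpu". The helpers parse_version, ei_expected, and save_kwargs are hypothetical names, not part of PyTorch or SageMaker. The only real API detail is the _use_new_zipfile_serialization keyword of torch.save.

```python
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse '1.6.0' or '1.3.1+cpu' into (1, 6, 0); suffixes are ignored."""
    core = version.split("+")[0]  # drop local tags such as '+cpu'
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at pre-release markers such as 'a0'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def ei_expected(torch_version: str) -> bool:
    # Per the post above: EI for PyTorch only works on 1.3.1; others fall back to CPU.
    return parse_version(torch_version) == (1, 3, 1)


def save_kwargs(saving_version: str, loading_version: str) -> Dict[str, bool]:
    # torch.save defaults to the zipfile format from 1.6 on; if the file must be
    # read by an older PyTorch, ask for the legacy format explicitly.
    if parse_version(saving_version) >= (1, 6, 0) and parse_version(loading_version) < (1, 6, 0):
        return {"_use_new_zipfile_serialization": False}
    return {}  # saver already writes the legacy format, or the loader can read zipfiles
```

The result would be splatted into the save call, e.g. torch.save(model.state_dict(), path, **save_kwargs(torch.__version__, "1.3.1")), so the flag is only passed when it actually matters.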

Sorry I couldn’t help more. Please do share if you find other workarounds here. Thanks!

Has anyone figured out a way to run inference (same as the .generate() method) for seq-to-seq models on Elastic Inference?
I am trying to run inference for Seq-to-Seq models (like BART, Pegasus) on Elastic Inference with EC2.
So far, I have been able to use the TorchScript example (1) and store the model, but I haven’t been able to figure out how to run inference with it.

(1) Exporting transformers models — transformers 4.5.0.dev0 documentation

How did you script the .generate() method?
I used torch.jit.script(model.generate) and got an error:
NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults: ...
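The error names two signature features that TorchScript refuses to compile: a variable number of arguments (*args / **kwargs) and keyword-only arguments with defaults. In the transformers 4.x versions I’m aware of, .generate() collects extra model inputs through **model_kwargs (worth checking against your installed version), which alone would trigger this. The sketch below uses inspect to flag those features on any Python function. It approximates the error message and is not TorchScript’s actual check. script_blockers, generate_like, and fixed_step are hypothetical names for illustration.

```python
import inspect
from typing import Callable, List


def script_blockers(fn: Callable) -> List[str]:
    """List signature features matching the NotSupportedError message.

    An approximation for illustration, not TorchScript's own validation.
    """
    problems = []
    for name, param in inspect.signature(fn).parameters.items():
        if param.kind is inspect.Parameter.VAR_POSITIONAL:
            problems.append(f"*{name}")
        elif param.kind is inspect.Parameter.VAR_KEYWORD:
            problems.append(f"**{name}")
        elif (param.kind is inspect.Parameter.KEYWORD_ONLY
              and param.default is not inspect.Parameter.empty):
            problems.append(f"keyword-only {name}")
    return problems


# Hypothetical stand-ins, not the real transformers signatures.
def generate_like(input_ids=None, max_length=None, num_beams=None, **model_kwargs):
    return input_ids


def fixed_step(input_ids: List[int], num_beams: int) -> List[int]:
    return input_ids


print(script_blockers(generate_like))  # ['**model_kwargs']
print(script_blockers(fixed_step))     # []
```

This suggests why scripting a smaller function with a fixed, explicitly typed signature (for example a single decoder step) stands a better chance than scripting .generate() directly, though that is untested here.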