Multilingual translation on SageMaker

Hi guys,

I’m trying to do multilingual translation with the facebook/m2m100_1.2B model from Hugging Face using the SageMaker SDK.

I got it to work for a single language direction (e.g. source_lang=german, target_lang=english), but what I need is to quickly switch between about 10 different source languages while using the same model for inference.

Any tips?

Hey @Simpan,

sounds like a cool use-case!

What you could do is create a custom inference.py [REF] and override the predict_fn to switch between the languages.
You could then add additional parameters to your request body, like srcLang or trgtLang, which you would extract in the predict_fn.
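
Something along these lines (just an untested sketch on my side; the srcLang/trgtLang field names and the default values are placeholders you can change):

```python
# inference.py - rough sketch of overriding predict_fn for m2m100,
# assuming the request body looks like:
# {"inputs": "Hallo Welt", "srcLang": "de", "trgtLang": "en"}
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"


def model_fn(model_dir):
    # load model and tokenizer once when the container starts
    model = M2M100ForConditionalGeneration.from_pretrained(model_dir).to(device)
    tokenizer = M2M100Tokenizer.from_pretrained(model_dir)
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer

    # pull the text and the translation direction out of the request body
    text = data.pop("inputs")
    src_lang = data.pop("srcLang", "de")   # placeholder default
    tgt_lang = data.pop("trgtLang", "en")  # placeholder default

    # tell the tokenizer which source language to encode
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(device)

    # force the first generated token to be the target language id
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
    )
    return {"translation": tokenizer.batch_decode(generated, skip_special_tokens=True)}
```

On the client side you then send e.g. {"inputs": "...", "srcLang": "fr", "trgtLang": "en"} and the same model handles all of your language directions.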

Thanks @philschmid !

Could this also be combined with batch transform?

Yes, of course! Providing a custom inference.py works with every deployment option, including batch transform, async inference and serverless.
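
For example (rough sketch, not tested end to end; the S3 paths, IAM role and DLC versions are placeholders you would replace), you point a HuggingFaceModel at the custom script and then either deploy an endpoint or create a batch transform job from the same model object:

```python
from sagemaker.huggingface import HuggingFaceModel

# model.tar.gz with the m2m100 weights; path and role are placeholders
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/m2m100/model.tar.gz",
    role="arn:aws:iam::111122223333:role/my-sagemaker-role",
    entry_point="inference.py",   # the custom script with the overridden predict_fn
    source_dir="code",            # local folder that can also hold a requirements.txt
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

# real-time endpoint ...
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)

# ... or reuse the same model object for batch transform, async or serverless
```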

Cool thanks!

Btw @philschmid, how easy is it to control and achieve optimal GPU utilization when using SageMaker batch transform? I’m a complete SageMaker noob 🙂

In my current setup, I run a cluster of 24 T4 GPUs across 3 EC2 instances when I do my batch translations.

The upside of my current approach is that it’s very easy to monitor and control GPU utilization. The drawback is the lack of scalability, which is what I’m hoping to solve with SageMaker.

Thanks!

Btw @philschmid, how easy is it to control and achieve optimal GPU utilization when using SageMaker batch transform? I’m a complete SageMaker noob 🙂

You should be able to find GPU utilization inside the AWS console under SageMaker → Inference → Batch transform jobs.
You can control the workload sent to the batch transform job with the BatchStrategy and MaxPayloadInMB parameters; you can read more about them here. That way we can make use of batching and fully utilize the GPU.
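
In the Python SDK those map to the strategy and max_payload arguments, roughly like this (instance type/count and S3 paths are just examples):

```python
# sketch: create the transform job from the HuggingFaceModel shown earlier
batch_job = huggingface_model.transformer(
    instance_count=2,                 # scale out across instances
    instance_type="ml.g4dn.xlarge",   # one T4 GPU per instance
    strategy="MultiRecord",           # BatchStrategy: pack several records per request
    max_payload=6,                    # MaxPayloadInMB: upper bound per request
    output_path="s3://my-bucket/m2m100/output",
)

batch_job.transform(
    data="s3://my-bucket/m2m100/input/data.jsonl",
    content_type="application/json",
    split_type="Line",                # one JSON record per line
)
```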

The upside of my current approach is that it’s very easy to monitor and control GPU utilization. The drawback is the lack of scalability, which is what I’m hoping to solve with SageMaker.

Scaling can be adjusted simply via the instance count and instance type when creating the job. Or do you need more than that?

Thanks for your reply @philschmid, very helpful!

Hi again @philschmid, I’ve just started implementing your solution. I have a question about switching between translation directions when using batch transform.

So predict_fn will swap between languages as you described in your first reply, but to achieve parallelization on the GPU, texts in the same translation direction need to be batched together. For example, let’s say I have 2 different source languages in my dataset, French and German, and I want to translate them all to English. Then the French texts need to be separated out and batched together with the other French texts, and likewise for the German ones. This is so the model can create a minibatch of texts in the same translation direction at inference time.

How can I achieve this behavior? Should I implement custom data ordering in input_fn, or is this something that needs to be customized in my batch transform configuration?
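
To make the grouping concrete, this is roughly what I have in mind (just a sketch; the field names match the srcLang/trgtLang parameters discussed above):

```python
import json

# toy example: records with mixed source languages, all going to English
records = [
    {"inputs": "Bonjour le monde", "srcLang": "fr", "trgtLang": "en"},
    {"inputs": "Hallo Welt", "srcLang": "de", "trgtLang": "en"},
    {"inputs": "Guten Morgen", "srcLang": "de", "trgtLang": "en"},
]

# sort by source language so that each minibatch the transform job builds
# only contains texts of a single translation direction
records.sort(key=lambda r: r["srcLang"])

with open("data.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```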

After some further research, it seems the best option for me might be to create my own container since my preprocessing code is quite involved and has a lot of dependencies.

Do you have a recommendation for a Hugging Face + GPU base image that is compatible with SageMaker, @philschmid? I had a look at these: deep-learning-containers/Dockerfile.gpu at master · aws/deep-learning-containers · GitHub

Hey @Simpan,

not sure maintaining your own container makes things easier than having an inference.py + requirements.txt, since you would need to provide access to ECR and work with it to use your custom container, and take care of security updates, framework updates, etc. yourself.

If you still think it is easier and better for your use case, you can use any of the available containers as a base, e.g. 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.6.0-transformers4.4.2-gpu-py36-cu110-ubuntu18.04
