I’m trying to figure out whether I can deploy sentence-transformers/distiluse-base-multilingual-cased-v2 as a real-time inference endpoint on AWS SageMaker to retrieve embeddings and run a model.
Following this guide from @philschmid on AWS, I can see how to use the Transformers library to load a model stored in an S3 bucket and create and return embeddings. That code is provided on most model pages for loading the model with Transformers, but not on the distiluse-base-multilingual-cased-v2 page.
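For context, the kind of snippet I mean is the usual mean-pooling pattern over Transformers token embeddings (a sketch from memory, not the guide’s exact code). It averages the DistilBERT hidden states, which are 768-dimensional, so nothing in it can produce a 512-dimensional vector:

```python
import torch

# Mean pooling over token embeddings, weighted by the attention mask --
# the pattern typically shown in Transformers-based embedding guides
# (a sketch of what I assume the tutorial's snippet does).
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element: last hidden state
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * mask, dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)  # avoid division by zero
    return summed / counts  # shape: (batch, hidden_size) -- 768 for DistilBERT
```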
I’m able to upload the model as in the tutorial, but the vectors returned from SageMaker have length 768, whereas loading the model with the sentence_transformers library gives length 512. I know this model has a custom dense layer that prevents it from being fine-tuned with Transformers alone, and I’m assuming that when I load it with plain Transformers that final dense projection is missing. Are there any suggestions for loading the complete model into SageMaker so the endpoint returns embeddings of length 512?
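One approach I’m considering (a sketch, assuming the SageMaker Hugging Face Inference Toolkit, which picks up `model_fn`/`predict_fn` overrides from a `code/inference.py` inside `model.tar.gz`, plus a `code/requirements.txt` listing `sentence-transformers`): load the model through `SentenceTransformer` instead of `AutoModel`, so the full pipeline, including the dense 768→512 projection, is restored. The `"inputs"` request key is my assumption, not a toolkit requirement.

```python
# code/inference.py -- deployment fragment; runs inside the SageMaker
# Hugging Face inference container, not standalone.
from sentence_transformers import SentenceTransformer

def model_fn(model_dir):
    # Loads the complete sentence-transformers pipeline from the unpacked
    # model.tar.gz, including the Dense (768 -> 512) layer that a plain
    # Transformers AutoModel load drops.
    return SentenceTransformer(model_dir)

def predict_fn(data, model):
    # Assumes requests shaped like {"inputs": ["sentence one", ...]}.
    sentences = data.pop("inputs", data)
    embeddings = model.encode(sentences)  # expected shape: (n, 512)
    return {"vectors": embeddings.tolist()}
```

The key point is that `SentenceTransformer(model_dir)` reads the model’s `modules.json` and rebuilds every module, so the endpoint should return the same 512-dimensional vectors as a local sentence_transformers load.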