Is it possible to embed the tokenizer into the model to have it running on GCP using TensorFlow Serving?


Thanks in advance for your help!

Initially I created an issue on GitHub with this question:
Question: Is that possible to embed a tokenizer into the model for tensorflow serving? · Issue #13843 · huggingface/transformers · GitHub, and I got a suggestion to tag @Rocketknight1, who is an expert in TensorFlow for questions like this.

I am already using the TF BERT model (uncased version) with TensorFlow Serving. I found that I need to modify the inputs, ending up with something like this:

    callable = tf.function(self.model.call)
    concrete_function = callable.get_concrete_function([
        tf.TensorSpec([None, self.max_input_length], tf.int32, name="input_ids"),
        tf.TensorSpec([None, self.max_input_length], tf.int32, name="attention_mask")
    ])
    tf.saved_model.save(self.model, export_dir, signatures=concrete_function)

I also found the following example (blog/ at master · huggingface/blog · GitHub) that allows me to change the input signature of a model specifically for serving:

from transformers import TFBertForSequenceClassification
import tensorflow as tf

# Creation of a subclass in order to define a new serving signature
class MyOwnModel(TFBertForSequenceClassification):
    # Decorate the serving method with the new input_signature
    # an input_signature represents the name, the data type and the shape of an expected input
    @tf.function(input_signature=[{
        "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
        "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"),
        "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"),
    }])
    def serving(self, inputs):
        # call the model to process the inputs
        output = self.call(inputs)
        test_out = self.serving_output(output)
        # return the formatted output
        return test_out

# Instantiate the model with the new serving method
model = MyOwnModel.from_pretrained("bert-base-cased")
# save it with saved_model=True in order to have a SavedModel version along with the h5 weights.
model.save_pretrained("/tmp/my_model6", saved_model=True)

In my current workflow I need Python because I have to prepare the model input with the tokenizer. This means that for now I need a REST service that receives the text request, tokenizes it, and then sends the result to the serving instance. After switching to GCP AI Platform, I think it is reasonable and worth trying to embed the tokenizer inside the model and let GCP AI Platform serve it.
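To make the current setup concrete, here is a minimal sketch of the payload that intermediate Python service has to build for TF Serving's REST predict API. The token ids below are hypothetical values a BERT tokenizer might produce; the host, port, and model name in the comment are placeholders:

```python
import json

# Hypothetical token ids for a short input, e.g. what a BERT tokenizer
# might produce for "[CLS] ... [SEP]"; real values come from the tokenizer.
input_ids = [101, 7592, 2088, 102]
attention_mask = [1, 1, 1, 1]

# TF Serving's REST predict API expects a JSON body with an "instances" list,
# one entry per example, keyed by the signature's input names.
payload = json.dumps({
    "instances": [{
        "input_ids": input_ids,
        "attention_mask": attention_mask,
    }]
})
# POST this to http://HOST:8501/v1/models/MODEL:predict
```

If the tokenizer were embedded in the SavedModel, the other services could send raw text in this payload instead of token ids.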

I made some attempts, and it looks like this is more difficult than it seems.

The goal is to have the model with the tokenizer on GCP AI Platform and get rid of the Python REST API service, because all the other infrastructure is written in Erlang/Rust. I need to supply the text to the model serving instance (not an object with input_ids, attention_mask, etc.) and get back logits, or softmaxed logits.

So could someone please tell me whether this is possible and, if it is, provide some guidance on how to achieve it?

Thanks a lot for your help!



I have been looking for something like this but couldn't find it either. Based on the blog you found (blog/ at master · huggingface/blog · GitHub), the author mentioned that it's a possible next step improvement, so I'm not sure if this is already possible.

Has anybody solved the problem?

I've been looking for a solution for some days and have found nothing about embedding a tokenizer into a model so I could serve it with TensorFlow Serving.

tensorflow_text.WordpieceTokenizer may be a solution.

See this project; it looks like it was created to solve exactly this issue: GitHub - Hugging-Face-Supporter/tftokenizers: Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels