`serving` signature in TensorFlow Serving blogpost

Hi everyone!

I am currently working through @jplu 's blogpost on serving a HuggingFace model with TF-Serving, in which he overrides the model's `serving` method to change the signature of the traced graph so that the input accepts embeddings.
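For context, the pattern in question looks roughly like this (a minimal pure-TF sketch, not the actual transformers code; the toy model, the hidden size of 16, and all shapes are made up for illustration): you redecorate `serving` with a `tf.function` whose `input_signature` describes `inputs_embeds` instead of `input_ids`.

```python
import tensorflow as tf

class EmbeddingServedModel(tf.keras.Model):
    """Toy stand-in for a transformer: serves on embeddings, not token ids."""

    def __init__(self):
        super().__init__()
        self.classifier = tf.keras.layers.Dense(2)

    # hidden size 16 is an arbitrary illustrative value
    @tf.function(input_signature=[
        tf.TensorSpec([None, None, 16], tf.float32, name="inputs_embeds")
    ])
    def serving(self, inputs_embeds):
        # pool over the sequence dimension, then classify
        pooled = tf.reduce_mean(inputs_embeds, axis=1)
        return {"logits": self.classifier(pooled)}

model = EmbeddingServedModel()
out = model.serving(tf.zeros([1, 5, 16]))
print(out["logits"].shape)  # → (1, 2)
```

The tracing happens against that `input_signature`, which is why the exported graph ends up keyed on `inputs_embeds` rather than `input_ids`.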

That, in turn, led me to discover that this serving signature is part of all TF models (Models — transformers 4.7.0 documentation).

Can someone explain how exactly this serving method is used by the model server? I can't find it referenced in the rest of the tutorial, and I wasn't successful in finding my way around the codebase.

Is that redefined signature used at all in the tutorial? I might be mistaken, but it seems to me that the requests (both REST and gRPC) to the TF server use the output of the tokenizer (token ids), not that of an embedding layer.

cc @Rocketknight1


I missed the fact that the serving method is explicitly exported as a MetaGraph in the SavedModel for all TF models (see here).
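To convince myself of this, I found it helpful to reproduce the mechanism standalone (a toy `tf.Module`, not a transformers model; the module, shapes, and export location are made up): a decorated `serving` function passed via `signatures=` ends up as the `serving_default` signature of the exported SavedModel, which is exactly what TF-Serving then exposes.

```python
import tempfile

import tensorflow as tf

class TinyModule(tf.Module):
    # input_signature plays the same role here as on the transformers TF models
    @tf.function(input_signature=[
        tf.TensorSpec([None, 3], tf.int32, name="input_ids")
    ])
    def serving(self, input_ids):
        # trivial "model": just echo the token ids back
        return {"output": tf.identity(input_ids)}

export_dir = tempfile.mkdtemp()  # throwaway export location
module = TinyModule()
# passing the function via `signatures=` is what exports it as a signature
tf.saved_model.save(module, export_dir, signatures=module.serving)

loaded = tf.saved_model.load(export_dir)
print(list(loaded.signatures.keys()))  # → ['serving_default']
```

You can inspect the same thing on a real export with `saved_model_cli show --dir <path> --all`.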

One thing that's still not completely clear to me is the redefined inputs. The tutorial changes the serving method to accept token embeddings, yet the requests to the model use the output of the tokenizer, i.e. token ids. What am I not seeing?