I am currently working through @jplu's blog post on serving a HuggingFace model with TF-Serving, in which he overrides the model's serving method to change the input signature of the traced graph so that it accepts embeddings.
That, in turn, led me to discover that this serving signature is part of all TF models (Models — transformers 4.7.0 documentation).
Can someone explain to me how exactly this serving method is used by the model server? I can't find it referenced in the rest of the tutorial, and I wasn't successful in finding my way around the codebase.
Is that redefined signature used at all in the tutorial? I might be mistaken, but it seems to me that the requests (both REST and gRPC) to the TF-Serving server use the output of the tokenizer, not that of an embedding layer.