I have a TF GPT-2 LMHead model running on TF Serving, and I want to do beam search (multiple-token output) with the model's output logits.
payload = {"inputs": input_padded}
requests.post("http://localhost:8501/v1/models/gpt2-farmacos5:predict", data=json.dumps(payload))
That request returns a tensor of logits for the next token. But how do I convert these logits into a multi-token output, like Hugging Face's gpt2model.generate() method (but against TF Serving)?
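Not an answer from the thread, but here is a minimal sketch of the simplest case (greedy decoding): call the model once per step, take the argmax of the last position's logits, append it, and repeat. The `tf_serving_predict` wrapper, the payload shape, and the assumption that the response is a `[batch, seq_len, vocab]` logits tensor are all guesses about the exported signature and would need adjusting to your model.

```python
import json

import requests


def greedy_generate(predict_fn, input_ids, max_new_tokens, eos_token_id=None):
    """Greedy decoding loop.

    predict_fn(token_ids) must return the next-token logits as a flat
    list of vocab_size floats (e.g. by POSTing to TF Serving).
    """
    token_ids = list(input_ids)
    for _ in range(max_new_tokens):
        logits = predict_fn(token_ids)
        # Greedy decoding = argmax over the vocabulary at each step.
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        token_ids.append(next_id)
        if eos_token_id is not None and next_id == eos_token_id:
            break
    return token_ids


def tf_serving_predict(token_ids):
    """Hypothetical wrapper around the TF Serving REST endpoint from the
    question; the exact request/response shape depends on how the model
    was exported."""
    payload = {"inputs": [token_ids]}
    resp = requests.post(
        "http://localhost:8501/v1/models/gpt2-farmacos5:predict",
        data=json.dumps(payload),
    )
    outputs = resp.json()["outputs"]
    # Assuming logits of shape [batch, seq_len, vocab]: keep only the
    # logits for the last position of the first batch element.
    return outputs[0][-1]
```

Usage would then be something like `greedy_generate(tf_serving_predict, input_ids, max_new_tokens=20)`. This re-sends the whole prefix every step (no KV cache), so it is slow but simple.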
PS: I know there is transformers/generation_tf_utils.py at master · huggingface/transformers · GitHub, but I need a simpler implementation.
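For reference, a bare-bones beam search over the same kind of `predict_fn` can be much shorter than generation_tf_utils.py. This is a sketch, not Hugging Face's implementation: it assumes `predict_fn(token_ids)` returns next-token log-probabilities (so you would apply a log-softmax to the TF Serving logits first), and it omits length penalty, EOS handling, and batching.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - z for x in logits]


def beam_search(predict_fn, input_ids, max_new_tokens, num_beams=3):
    """Minimal beam search: keep the num_beams highest-scoring
    hypotheses, where a hypothesis score is its cumulative log-prob."""
    beams = [(list(input_ids), 0.0)]  # (token_ids, cumulative log-prob)
    for _ in range(max_new_tokens):
        candidates = []
        for tokens, score in beams:
            log_probs = predict_fn(tokens)
            # Expand each beam with only its top num_beams continuations.
            top = sorted(range(len(log_probs)),
                         key=lambda i: log_probs[i], reverse=True)[:num_beams]
            for tok in top:
                candidates.append((tokens + [tok], score + log_probs[tok]))
        # Prune back down to the best num_beams hypotheses overall.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]  # tokens of the best hypothesis
```

With `num_beams=1` this degenerates to greedy decoding, which is a quick sanity check for the loop.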
PS 2: I know this post is similar to Patrick's post "Generation Probabilities: How to compute probabilities of output scores for GPT2", but he's working in PyTorch and I'm in TensorFlow. Maybe I should migrate to PyTorch?
Thanks