Hi everyone, I stumbled into this issue while integrating NVIDIA Triton with a NER model I trained.
For practical purposes, the Triton Inference Server receives the tokenized text and returns the logits.
So on the client side I want to use a token classification pipeline; in particular:
1 - I want to use pipeline.preprocess() to encode the text.
2 - I don’t need pipeline.forward(), because the Triton server runs the inference and returns the logits to the client.
3 - I want to use pipeline.postprocess() to retrieve the entities (a rough sketch of this flow is right after this list).
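To make the question concrete, this is roughly the client-side flow I’m after. It is only a sketch: the Triton URL, the model name "ner_model" and the tensor names "input_ids"/"attention_mask"/"logits" are placeholders from my own Triton config, and the exact preprocess()/postprocess() behaviour varies between transformers versions, so this assumes a version where preprocess returns a single dict of tensors and postprocess accepts a dict containing the logits:

import numpy as np
import torch
import tritonclient.http as httpclient
from transformers import pipeline

model_checkpoint = "path/to/my-ner-checkpoint"  # placeholder path

# Today I still have to load the full model locally just to get the pipeline object.
token_classifier = pipeline("token-classification", model=model_checkpoint)

client = httpclient.InferenceServerClient(url="localhost:8000")

text = "My name is Clara and I live in Berkeley."

# 1 - encode the text locally
model_inputs = token_classifier.preprocess(text)

# 2 - run the forward pass remotely on Triton instead of pipeline.forward()
infer_inputs = []
for name in ("input_ids", "attention_mask"):
    arr = model_inputs[name].numpy().astype(np.int64)
    infer_input = httpclient.InferInput(name, list(arr.shape), "INT64")
    infer_input.set_data_from_numpy(arr)
    infer_inputs.append(infer_input)

response = client.infer("ner_model", infer_inputs)
logits = torch.from_numpy(response.as_numpy("logits"))

# 3 - decode the logits back into entities locally
model_outputs = {**model_inputs, "logits": logits}
entities = token_classifier.postprocess(model_outputs)
print(entities)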
The issue with this configuration is that I must load the full model locally even though I’m not using it.
I would like to load a pipeline composed only of preprocess and postprocess.
Calling:
token_classifier = pipeline(
    "token-classification", model=model_checkpoint
)
without giving it the model (i.e. with only the tokenizer config available) results in an error.
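In other words, I would like something along these lines to work, but as far as I can tell it errors out because a model is required:

from transformers import AutoTokenizer, pipeline

model_checkpoint = "path/to/my-ner-checkpoint"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Ideally this would build a pipeline that only does preprocess/postprocess,
# without loading the model weights, but passing just the tokenizer fails.
token_classifier = pipeline("token-classification", tokenizer=tokenizer)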