I want to deploy a Hugging Face question answering model with ONNX in JavaScript

  1. I know the doc talks about exporting BERT to ONNX. But I still want to ask about deploying question answering with ONNX in JavaScript, using the JavaScript version of ONNX (see the first sketch after this list). If there is another way to deploy the model and make it work in JavaScript besides TensorFlow.js and ONNX, please tell me.

  2. If I deploy it on iOS and Android, will the CPU be able to run it? As far as I know, the JavaScript version of ONNX only supports CPU as a backend (I may be wrong about that); see the second sketch after this list.

  3. I did a project deploying ResNeXt with ONNX in JavaScript on my github.io, and it works really well.
    This time I have another JavaScript project: I want to deploy a Hugging Face question-answering model on React Native for both iOS and Android. Is there anything I need to be careful of, or any tutorials I should look at? (The second sketch after this list also covers the React Native side.)
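
For question 1, this is roughly the call I have in mind, using onnxruntime-web (the successor to ONNX.js). A minimal sketch, assuming the model was exported with the transformers ONNX exporter and keeps the usual input/output names (`input_ids`, `attention_mask`, `start_logits`, `end_logits`); check your own export, since the names can differ:

```js
import * as ort from 'onnxruntime-web';

// inputIds / attentionMask: plain number arrays produced by a JS tokenizer
// for the model's vocabulary.
async function qaLogits(modelUrl, inputIds, attentionMask) {
  const session = await ort.InferenceSession.create(modelUrl);

  // transformers exports typically expect int64 tensors of shape [batch, seqLen].
  const shape = [1, inputIds.length];
  const results = await session.run({
    input_ids: new ort.Tensor('int64', BigInt64Array.from(inputIds.map(BigInt)), shape),
    attention_mask: new ort.Tensor('int64', BigInt64Array.from(attentionMask.map(BigInt)), shape),
  });

  // Float32Array of per-token scores for the answer span boundaries.
  return { startLogits: results.start_logits.data, endLogits: results.end_logits.data };
}
```

BERT-style models may also expect a `token_type_ids` input; DistilBERT variants usually do not.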
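For questions 2 and 3: in the browser, onnxruntime-web runs on a WebAssembly (CPU) backend by default and can also try WebGL, which you can request per session; for React Native there is a separate `onnxruntime-react-native` package that runs the native ONNX Runtime on the device CPU on both iOS and Android. Two sketches under those assumptions:

```js
// Browser: list backends in preference order; unavailable ones are skipped.
import * as ort from 'onnxruntime-web';

const webSession = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgl', 'wasm'],
});
```

```js
// React Native: same API shape, but the model is loaded from a file path
// bundled with (or downloaded by) the app, and inference runs on the device CPU.
import { InferenceSession } from 'onnxruntime-react-native';

const nativeSession = await InferenceSession.create('/path/to/model.onnx');
```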

More.
I saw this tutorial before: How to Run PyTorch Models in the Browser With ONNX.js (YouTube).
The tutorial says that if we use ONNX, we need to be careful, because the export may fail in certain cases and we need to test it.
If you have used ONNX or another way to get question answering working on React Native or elsewhere, please feel free to share it.
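
One way to test that is to save the logits the Python model produces for a fixed input, then compare them with what the JS runtime returns for the same token ids. A minimal sketch, assuming a hypothetical `reference.json` dumped from Python (tokenized input plus expected logits) and the output names from the transformers export:

```js
import * as ort from 'onnxruntime-web';

// reference.json (hypothetical file): { inputIds, attentionMask, startLogits }
// saved from the Python model on one fixed question/context pair.
const reference = await (await fetch('reference.json')).json();

const session = await ort.InferenceSession.create('model.onnx');
const shape = [1, reference.inputIds.length];
const results = await session.run({
  input_ids: new ort.Tensor('int64', BigInt64Array.from(reference.inputIds.map(BigInt)), shape),
  attention_mask: new ort.Tensor('int64', BigInt64Array.from(reference.attentionMask.map(BigInt)), shape),
});

// The two runtimes will not match bit-for-bit, but a small tolerance still
// catches a broken operator or a wrong export.
const maxDiff = Math.max(
  ...results.start_logits.data.map((v, i) => Math.abs(v - reference.startLogits[i]))
);
console.log(maxDiff < 1e-3 ? 'export looks OK' : `mismatch: max diff ${maxDiff}`);
```
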
Thank you for answering.

Reference
ONNX doc for Hugging Face: Exporting transformers models

Do you have any updates on this? Any help would be greatly appreciated.

Interested in this too!

The first thing I can think of is that you may run into issues with post-processing, which is a bit involved: transformers/utils_qa.py at main · huggingface/transformers · GitHub. Intuitively, the ONNX model will only give you start and end logits as outputs, and you will need to process them afterwards. I am not sure how this is done in React.
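
To make that concrete, here is a minimal sketch of that post-processing in plain JavaScript (so the same code works in React or React Native). It is a much-simplified version of what `utils_qa.py` does: score every (start, end) pair by the sum of its logits, take the best one, and map it back to characters. It assumes you kept the tokenizer's offset mapping into the context string; the real script additionally handles n-best lists, impossible answers, and long contexts split across windows:

```js
// startLogits / endLogits: per-token scores from the ONNX model.
// offsets[i] = [charStart, charEnd] of token i inside the context string
// (the "offset mapping" a fast tokenizer can return).
function extractAnswer(startLogits, endLogits, offsets, context, maxAnswerLen = 30) {
  let best = { score: -Infinity, start: 0, end: 0 };
  for (let i = 0; i < startLogits.length; i++) {
    // Only consider spans starting at i that are not unreasonably long.
    for (let j = i; j < Math.min(i + maxAnswerLen, endLogits.length); j++) {
      const score = startLogits[i] + endLogits[j];
      if (score > best.score) best = { score, start: i, end: j };
    }
  }
  const [charStart] = offsets[best.start];
  const [, charEnd] = offsets[best.end];
  return context.slice(charStart, charEnd);
}
```

In practice you would also mask out question and padding positions before picking the best pair, exactly as `utils_qa.py` does on the Python side.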