Hi community,
Here is my image-to-text pipeline:
("customized" here means not registered in the official Transformers library)
A customized image processor,
A VisionEncoderDecoder, with a customized vision encoder that inherits from PreTrainedModel and an MBart decoder,
A WordLevel tokenizer (yes, I haven't used an MBartTokenizer; I trained my own for a specific corpus).
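For context on the tokenizer side: since it is WordLevel, decoding is just an id-to-token lookup with no merge rules, so it is easy to reproduce in plain JS if needed. A minimal sketch (the vocab below is a made-up example, not my actual tokenizer.json):

```javascript
// Hypothetical WordLevel vocab for illustration only.
const vocab = { "<s>": 0, "</s>": 1, "<pad>": 2, "x": 3, "^": 4, "2": 5 };

// Invert token -> id into id -> token for decoding.
const idToToken = Object.fromEntries(
  Object.entries(vocab).map(([tok, id]) => [id, tok])
);
const special = new Set(["<s>", "</s>", "<pad>"]);

// Decode a sequence of ids: look up each token, drop specials, join.
function decode(ids) {
  return ids
    .map((id) => idToToken[id])
    .filter((tok) => tok !== undefined && !special.has(tok))
    .join(" ");
}
```

So even if the stock MBartTokenizer path doesn't apply, this part of the pipeline is trivial to port.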
I want to consume this pipeline in Transformers.js. However, all the examples in the Transformers.js documentation seem to pull from a ready-made Transformers pipeline with official components and configurations. Is it possible to make my customized pipeline consumable by Transformers.js, and if not fully, to what extent could it be partially converted?
My guess is that I should implement my own image preprocessing step and feed the resulting input tensor to the model. If so, which JS libraries would you recommend? (It won't be very intensive: just resize and normalize, plus a crop-white-margin function that doesn't exist in Transformers' image processors.)
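To make the preprocessing concrete, here is a rough sketch of the logic I have in mind, written as pure functions over a 2D grayscale pixel array (helper names are my own; in the browser the pixels would come from a canvas, and the resize here is nearest-neighbour just to keep the sketch dependency-free):

```javascript
// Crop rows/columns whose pixels are all (near-)white.
function cropWhiteMargin(pixels, threshold = 250) {
  const h = pixels.length, w = pixels[0].length;
  let top = 0, bottom = h - 1, left = 0, right = w - 1;
  const rowIsWhite = (y) => pixels[y].every((v) => v >= threshold);
  const colIsWhite = (x) => pixels.every((row) => row[x] >= threshold);
  while (top < bottom && rowIsWhite(top)) top++;
  while (bottom > top && rowIsWhite(bottom)) bottom--;
  while (left < right && colIsWhite(left)) left++;
  while (right > left && colIsWhite(right)) right--;
  return pixels.slice(top, bottom + 1).map((row) => row.slice(left, right + 1));
}

// Nearest-neighbour resize to the model's expected input size.
function resize(pixels, outH, outW) {
  const h = pixels.length, w = pixels[0].length;
  return Array.from({ length: outH }, (_, y) =>
    Array.from({ length: outW }, (_, x) =>
      pixels[Math.floor((y * h) / outH)][Math.floor((x * w) / outW)]
    )
  );
}

// Scale to [0, 1] then standardize (example mean/std, not my real config).
function normalize(pixels, mean = 0.5, std = 0.5) {
  return pixels.map((row) => row.map((v) => (v / 255 - mean) / std));
}
```

The output would then be flattened into a Float32Array and handed to the model as the pixel-values tensor.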
Also, just to be sure: can my VisionEncoderDecoder be exported to ONNX format so that it is consumable by Transformers.js?
My model should certainly be able to run in the browser (that's the whole point of doing this), since it has only 20M parameters (far fewer than the showcase models in Transformers.js).
Thanks for your help in advance!