How to make my customized pipeline consumable for Transformers.js

Hi community,

Here is my image-to-text pipeline:

(customized means not registered in official Transformers)

A customized image processor,

A VisionEncoderDecoder, with a customized vision encoder that inherits from PreTrainedModel, and an MBartDecoder,

A WordLevel tokenizer (yes, I haven’t used an MBartTokenizer; I have distilled my own for a specific corpus).

I want to consume this pipeline in Transformers.js. However, I notice that all the examples in the Transformers.js documentation seem to pull from a ready-made Transformers pipeline with official components and configurations. Is it possible to make my customized pipeline consumable for Transformers.js, or to what extent could it be partially converted?

My guess is that I should implement my own image preprocessing step and send the image input tensor to the model. In that case, which JS libraries would you recommend? (It won’t be very intensive: just resize and normalize, plus a crop-white-margin function which doesn’t exist in Transformers’ image processors.)
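For concreteness, the preprocessing I have in mind is roughly the following (an illustrative Python sketch; the threshold and normalization values here are placeholders, not the ones my processor actually uses):

import numpy as np
from PIL import Image

def preprocess(image: Image.Image, size=(384, 384), white_threshold=245):
    arr = np.array(image.convert("RGB"))
    # crop white margins: keep only the rows/columns that contain a non-white pixel
    mask = (arr < white_threshold).any(axis=2)
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    if rows.size and cols.size:
        arr = arr[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # resize and normalize
    img = Image.fromarray(arr).resize(size)
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - 0.5) / 0.5                      # map to [-1, 1]
    return x.transpose(2, 0, 1)[None]        # NCHW tensor for the encoder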

Also, just to be sure: can my VisionEncoderDecoder be exported to an ONNX format that is consumable by Transformers.js?

Of course, my model should be able to run in the browser (that’s the whole point of doing this), as it has only 20M parameters (far fewer than the showcases in Transformers.js).

Thanks for your help in advance!


It seems possible. For Transformers.js, there’s a dedicated channel on the HF Discord, so asking there would be the most reliable option.


Thanks, let me check!


Hi John,
I tried to follow your export script and managed to export a single ONNX file with the following:

from pathlib import Path

from transformers import VisionEncoderDecoderModel
from optimum.exporters.onnx import export
from optimum.exporters.onnx.model_configs import ViTOnnxConfig
from optimum.exporters.tasks import TasksManager

# register a custom ONNX config for the customized encoder's model_type
register_tasks_manager_onnx = TasksManager.create_register("onnx")

@register_tasks_manager_onnx("my_hgnetv2", "feature-extraction")
class HGNetv2OnnxConfig(ViTOnnxConfig):
    @property
    def inputs(self):
        return {"pixel_values": {0: "batch"}}  # only dynamic axes need to be listed here
    @property
    def outputs(self):
        return {"last_hidden_state": {0: "batch"}}

def export_onnx():
    path = './model'
    model = VisionEncoderDecoderModel.from_pretrained(path)
    onnx_config_constructor = TasksManager.get_exporter_config_constructor(
        exporter="onnx",
        model=model,
        task="image-to-text",
        library_name="transformers",
        exporter_config_kwargs={"use_past": True},
    )
    onnx_config = onnx_config_constructor(model.config)
    out = Path("./model/onnx")
    out.mkdir(exist_ok=True)

    inputs, outputs = export(model,
                             onnx_config,
                             out / "model.onnx",
                             onnx_config.DEFAULT_ONNX_OPSET,
                             input_shapes={"pixel_values": [1, 3, 384, 384]},
                             )
    print(inputs)
    print(outputs)

However, I don’t know how to export the trio of .onnx files with the CLI: within a Python script I can register the customized config, but I don’t know how to register it when using the CLI…


Oh I see, it’s covered here: Export a model to ONNX with optimum.exporters.onnx. We need to use main_export instead of export.


Finally I used the following:

from pathlib import Path
from optimum.exporters.onnx import main_export

def export_onnx():
    path = './model'
    out = Path("./model/trio_onnx")
    out.mkdir(exist_ok=True)

    main_export(
        path,
        task="image-to-text",
        output=out,
    )

However, this only exports encoder_model.onnx and decoder_model.onnx. Since I had no idea how use_past=True could be injected through main_export’s arguments (the example in the above link didn’t work out), I monkey-patched the source code to make it export the trio of ONNX files.


For Transformers.js:

Use main_export() with custom_onnx_configs and with_behavior(..., use_past=True) to get the trio. Do not monkey-patch.

Background and context

  • Why a “trio”: seq2seq generation needs a one-off decoder for the first token and a decoder_with_past for subsequent tokens so the KV cache is reused. This is the supported pattern. (Hugging Face Forums)
  • Where to set it: Optimum’s exporter lets you pass custom_onnx_configs to main_export() and choose behaviors per subgraph ("encoder", "decoder"), requesting the with-past decoder via use_past=True. You can also disable post-processing so the files are kept separate. (Hugging Face)
  • Transformers.js expects this layout. Public web-ready repos ship onnx/{encoder_model.onnx, decoder_model.onnx, decoder_with_past_model.onnx} or a merged decoder. (Hugging Face)

Minimal, correct export (no patches)

# refs:
# - Export guide (custom_onnx_configs + with_behavior + no_post_process):
#   https://huggingface.co/docs/optimum-onnx/onnx/usage_guides/export_a_model
# - main_export reference:
#   https://huggingface.co/docs/optimum-onnx/en/onnx/package_reference/export

from pathlib import Path
from transformers import AutoConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.tasks import TasksManager

model_dir = "./model"                       # your VisionEncoderDecoder checkpoint
out = Path("./model/trio_onnx"); out.mkdir(parents=True, exist_ok=True)

# Build an ONNX config for your model+task
cfg = AutoConfig.from_pretrained(model_dir)
ctor = TasksManager.get_exporter_config_constructor(
    exporter="onnx", model_type=cfg.model_type, task="image-to-text",  # vision→text task
    library_name="transformers",
)
onnx_cfg = ctor(config=cfg, task="image-to-text")

# Ask explicitly for the three subgraphs
custom_onnx_configs = {
    "encoder_model": onnx_cfg.with_behavior("encoder"),
    "decoder_model": onnx_cfg.with_behavior("decoder", use_past=False),
    "decoder_with_past_model": onnx_cfg.with_behavior("decoder", use_past=True),
}

# Export. Keep trio separate (avoid automatic merge).
main_export(
    model_dir,
    task="image-to-text",
    output=str(out),
    custom_onnx_configs=custom_onnx_configs,
    no_post_process=True,
)

Why this works: Optimum documents custom_onnx_configs and with_behavior("decoder", use_past=True) to emit decoder_with_past_model.onnx; no_post_process=True prevents the exporter from merging decoders. (Hugging Face)

Verify and align with Transformers.js

  • Check that the output folder contains exactly: encoder_model.onnx, decoder_model.onnx, decoder_with_past_model.onnx. This mirrors working web repos; a quick check is sketched after this list. (Hugging Face)
  • Use that folder structure in your web model repo. Xenova’s captioner card recommends this layout for browser use. (Hugging Face)
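A quick way to verify (a minimal sketch that only checks the expected file names are present):

import os

expected = {"encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"}
produced = set(os.listdir("./model/trio_onnx"))  # adjust to your output folder
missing = expected - produced
print("missing:", missing or "none")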

Common failure modes and fixes

  • Only two files produced: you didn’t request the with-past behavior. Add the custom_onnx_configs dict as above. (Hugging Face)
  • Decoder files merged: remove the merge by setting no_post_process=True. The doc names this exact flag. (Hugging Face)
  • Unsure which tasks your model supports: query TasksManager.get_supported_tasks_for_model_type(model_type, "onnx") and pick the vision→text task. The export guide shows this workflow; a snippet is given after this list. (Hugging Face)
  • Why two decoders at all: first token vs subsequent tokens. The author of Transformers.js explains the duplication and the runtime need. (Hugging Face Forums)
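For example (a sketch; the exact task names returned depend on your installed Optimum version):

from optimum.exporters.tasks import TasksManager

# "vision-encoder-decoder" is the model_type of a VisionEncoderDecoder checkpoint
tasks = TasksManager.get_supported_tasks_for_model_type("vision-encoder-decoder", "onnx")
print(list(tasks))  # look for the vision-to-text tasks, e.g. "image-to-text" / "image-to-text-with-past"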

Optional: merged decoder

Some exporters can produce a single decoder_model_merged.onnx that handles both first and subsequent tokens. If you prefer that, omit no_post_process=True. The public ViT-GPT2 repo shows merged and split variants side by side. (Hugging Face)
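A minimal sketch of that variant, assuming the same variables as the export script above (whether the decoders actually get merged into decoder_model_merged.onnx depends on the exporter's post-processing in your Optimum version):

main_export(
    model_dir,
    task="image-to-text",
    output=str(out),
    custom_onnx_configs=custom_onnx_configs,
    # no_post_process omitted: post-processing is allowed to merge the two decoders
)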

Well, I still cannot make this work. While debugging, I found that main_export() takes me into optimum.exporters.utils._get_submodels_and_export_configs(), where an error is raised here:

        # When specifying custom export configs for supported transformers architectures, we do
        # not force to specify a custom export config for each submodel.
        for key, custom_export_config in custom_export_configs.items():
            models_and_export_configs[key] = (models_and_export_configs[key][0], custom_export_config)

where custom_export_configs is the dict we passed in with use_past injected, while models_and_export_configs, generated here,

            # TODO: this succession of if/else strongly suggests a refactor is needed.
            if (
                task.startswith(TasksManager._ENCODER_DECODER_TASKS)
                and model.config.is_encoder_decoder
                and not monolith
            ):
                models_and_export_configs = get_encoder_decoder_models_for_export(model, export_config)

does not contain the key “decoder_with_past”, because the default export_config, generated here,

           export_config_constructor = TasksManager.get_exporter_config_constructor(
                model=model, exporter=exporter, task=task, library_name=library_name
            )
           export_config = export_config_constructor(
                model.config,
                int_dtype=int_dtype,
                float_dtype=float_dtype,
                preprocessors=preprocessors,
            )

is built with the default use_past=False and therefore does not produce a config for “decoder_with_past”.
This is actually where I monkey-patched during the debugging.

I think there is a tight coupling between the export config and the model config in the optimum library: although I use a customized encoder, the outermost config is still the VisionEncoderDecoder config, which sends me into the not custom_architecture processing logic here and leads to the error above. This scenario may not have been considered a normal one in the design.

    if not custom_architecture:
        if library_name == "diffusers":
            export_config = None
            models_and_export_configs = get_diffusion_models_for_export(
                model, int_dtype=int_dtype, float_dtype=float_dtype, exporter=exporter
            )
        else:
            export_config_constructor = TasksManager.get_exporter_config_constructor(
                model=model, exporter=exporter, task=task, library_name=library_name
            )
            export_config = export_config_constructor(
                model.config,
                int_dtype=int_dtype,
                float_dtype=float_dtype,
                preprocessors=preprocessors,
            )

            export_config.variant = _variant
            all_variants = "\n".join(
                [f"    - {name}: {description}" for name, description in export_config.VARIANTS.items()]
            )
            logger.info(f"Using the export variant {export_config.variant}. Available variants are:\n{all_variants}")

            # TODO: this succession of if/else strongly suggests a refactor is needed.
            if (
                task.startswith(TasksManager._ENCODER_DECODER_TASKS)
                and model.config.is_encoder_decoder
                and not monolith
            ):
                models_and_export_configs = get_encoder_decoder_models_for_export(model, export_config)
            elif task.startswith("text-generation") and not monolith:
                models_and_export_configs = get_decoder_models_for_export(model, export_config)
            elif model.config.model_type == "sam":
                models_and_export_configs = get_sam_models_for_export(model, export_config)
            elif model.config.model_type == "speecht5":
                models_and_export_configs = get_speecht5_models_for_export(model, export_config, model_kwargs)
            elif model.config.model_type == "musicgen":
                models_and_export_configs = get_musicgen_models_for_export(model, export_config)
            else:
                models_and_export_configs = {"model": (model, export_config)}

        # When specifying custom export configs for supported transformers architectures, we do
        # not force to specify a custom export config for each submodel.
        for key, custom_export_config in custom_export_configs.items():
            models_and_export_configs[key] = (models_and_export_configs[key][0], custom_export_config)

Alright, actually we don’t need those verbose configs; just changing the task from “image-to-text” to “image-to-text-with-past” solves the issue (no monkey-patch):

from pathlib import Path
from optimum.exporters.onnx import main_export

def export_onnx():
    path = './model'
    out = Path("./model/trio_onnx")
    out.mkdir(exist_ok=True)
    main_export(
        path,
        task="image-to-text-with-past",  # "-with-past" yields the trio of ONNX files; use "image-to-text" otherwise
        output=out,
    )
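To double-check that the exported trio loads and generates, something like this should work (an untested sketch using Optimum's ONNX Runtime classes; a fully custom encoder may need extra config handling):

from optimum.onnxruntime import ORTModelForVision2Seq

model = ORTModelForVision2Seq.from_pretrained("./model/trio_onnx", use_cache=True)
# pixel_values: a preprocessed NCHW float tensor, e.g. shape (1, 3, 384, 384)
# generated_ids = model.generate(pixel_values=pixel_values)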

Great. About _with_past

Hi John,

I’ve finally succeeded in implementing the above things. Thanks for your help!
Yet I still have some other questions and I think I’d better create a new discussion.

