How to make my customized pipeline consumable for Transformers.js

For Transformers.js:

Use main_export() with custom_onnx_configs and with_behavior(..., use_past=True) to get the trio. Do not monkey-patch.

Background and context

  • Why a “trio”: seq2seq generation needs a plain decoder for the first token and a decoder_with_past for subsequent tokens so the KV cache is reused. This is the supported pattern. (Hugging Face Forums)
  • Where to set it: Optimum’s exporter lets you pass custom_onnx_configs to main_export() and choose a behavior per subgraph: "encoder", "decoder", and "decoder" with use_past=True for the with-past variant. You can also disable post-processing so the files are kept separate. (Hugging Face)
  • Transformers.js expects this layout. Public web-ready repos ship onnx/{encoder_model.onnx, decoder_model.onnx, decoder_with_past_model.onnx} or a merged decoder. (Hugging Face)
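
For a concrete picture of that layout, you can list the files of a public web-ready repo with huggingface_hub. A minimal sketch; the repo id Xenova/vit-gpt2-image-captioning is an example (the captioner referenced below), not something your export depends on.

# Sketch: see how a public Transformers.js-ready repo lays out its ONNX files.
from huggingface_hub import list_repo_files

for f in sorted(list_repo_files("Xenova/vit-gpt2-image-captioning")):
    if f.startswith("onnx/"):
        print(f)
# Expect entries like onnx/encoder_model.onnx, onnx/decoder_model.onnx and
# onnx/decoder_with_past_model.onnx (plus quantized/merged variants).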

Minimal, correct export (no patches)

# refs:
# - Export guide (custom_onnx_configs + with_behavior + no_post_process):
#   https://huggingface.co/docs/optimum-onnx/onnx/usage_guides/export_a_model
# - main_export reference:
#   https://huggingface.co/docs/optimum-onnx/en/onnx/package_reference/export

from pathlib import Path
from transformers import AutoConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.tasks import TasksManager

model_dir = "./model"                       # your VisionEncoderDecoder checkpoint
out = Path("./model/trio_onnx"); out.mkdir(parents=True, exist_ok=True)

# Build an ONNX config for your model+task
cfg = AutoConfig.from_pretrained(model_dir)
ctor = TasksManager.get_exporter_config_constructor(
    exporter="onnx", model_type=cfg.model_type, task="image-to-text"  # vision→text task
)
onnx_cfg = ctor(config=cfg, task="image-to-text")

# Ask explicitly for the three subgraphs
custom_onnx_configs = {
    "encoder_model": onnx_cfg.with_behavior("encoder"),
    "decoder_model": onnx_cfg.with_behavior("decoder", use_past=False),
    "decoder_with_past_model": onnx_cfg.with_behavior("decoder", use_past=True),
}

# Export. Keep trio separate (avoid automatic merge).
main_export(
    model_dir,                              # model name or path is the first positional argument
    task="image-to-text",
    output=str(out),
    custom_onnx_configs=custom_onnx_configs,
    no_post_process=True,
)

Why this works: Optimum documents custom_onnx_configs and with_behavior("decoder", use_past=True) to emit decoder_with_past_model.onnx; no_post_process=True prevents the exporter from merging decoders. (Hugging Face)
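
To sanity-check this before exporting, you can inspect the input spec each behavior produces. A quick sketch reusing onnx_cfg from the script above; exact input names vary by architecture.

# Sketch: compare the declared ONNX inputs of the two decoder behaviors.
plain_decoder = onnx_cfg.with_behavior("decoder", use_past=False)
past_decoder = onnx_cfg.with_behavior("decoder", use_past=True)
print(list(plain_decoder.inputs))  # e.g. input_ids, encoder_hidden_states
print(list(past_decoder.inputs))   # should additionally list past_key_values.* entries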

Verify and align with Transformers.js

  • Check that the output folder contains the three files encoder_model.onnx, decoder_model.onnx, and decoder_with_past_model.onnx (the exporter also copies config and preprocessor files). This mirrors working web repos; a quick check follows this list. (Hugging Face)
  • Use that folder structure in your web model repo. Xenova’s captioner card recommends this layout for browser use. (Hugging Face)
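
A quick way to check both points, assuming the export above finished (onnxruntime is only needed for the optional input-name probe):

# Sketch: confirm the trio exists and that the with-past decoder expects cached keys/values.
from pathlib import Path
import onnxruntime as ort  # optional, only used for the input-name probe

out = Path("./model/trio_onnx")
expected = ["encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"]
missing = [name for name in expected if not (out / name).exists()]
assert not missing, f"Missing ONNX files: {missing}"

sess = ort.InferenceSession(str(out / "decoder_with_past_model.onnx"),
                            providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()])  # should include past_key_values.* inputs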

Common failure modes and fixes

  • Only two files produced: you didn’t request the with-past behavior. Add the custom_onnx_configs dict as above. (Hugging Face)
  • Decoder files merged: remove the merge by setting no_post_process=True. The doc names this exact flag. (Hugging Face)
  • Unsure which tasks your model supports: query TasksManager.get_supported_tasks_for_model_type(model_type, "onnx") and pick the vision→text task; the export guide shows this workflow, and a short sketch follows this list. (Hugging Face)
  • Why two decoders at all: first-token vs subsequent tokens. The author of Transformers.js explains the duplication and why the runtime needs both. (Hugging Face Forums)
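
The task query from that bullet, as a short sketch (model_type comes from your checkpoint's config, as in the export script above):

# Sketch: list the ONNX-exportable tasks Optimum knows for this model type.
from transformers import AutoConfig
from optimum.exporters.tasks import TasksManager

cfg = AutoConfig.from_pretrained("./model")
tasks = TasksManager.get_supported_tasks_for_model_type(cfg.model_type, "onnx")
print(sorted(tasks))  # pick the vision→text entry, e.g. "image-to-text"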

Optional: merged decoder

Some exporters can produce a single decoder_model_merged.onnx that handles both first and subsequent tokens. If you prefer that, omit no_post_process=True. The public ViT-GPT2 repo shows merged and split variants side by side. (Hugging Face)
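
For the merged variant, the call is the same minus the flag. A sketch reusing model_dir and custom_onnx_configs from the script above; Optimum's post-processing then folds the two decoders into decoder_model_merged.onnx.

# Sketch: same export, but let post-processing merge the decoders.
main_export(
    model_dir,
    task="image-to-text",
    output="./model/merged_onnx",
    custom_onnx_configs=custom_onnx_configs,
    # no_post_process left at its default (False), so the decoders get merged
)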