SmolVLM in the browser, session already started

The issue

When I try to use SmolVLM in the browser, I get a "Session already started" error, which seems to originate from ONNX. I cannot get past this error, and I’m not even sure I’m using the library correctly.

Context

I’ve been trying to get a simple app working in the browser using SmolVLM. All the examples for it are in Python, except for this one example in a PR which does not work.

I’m using "@huggingface/transformers": "^3.3.1", and here is the slightly modified code (different images, different prompt, that’s all):

import { AutoModelForVision2Seq, AutoProcessor, load_image } from "@huggingface/transformers";

// Initialize processor and model
const model_id = "HuggingFaceTB/SmolVLM-Instruct";
const processor = await AutoProcessor.from_pretrained(model_id);
console.log("Processor:", processor); // ✅

const model = await AutoModelForVision2Seq.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "fp16", // "fp32", "fp16", "q8"
    vision_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
    decoder_model_merged: "q4", // "q8", "q4", "q4f16"
  },
});
console.log("Model:", model); // ✅

// Load images
const image1 = await load_image(
  "https://huggingface.co/spaces/merve/chameleon-7b/resolve/main/bee.jpg",
);
console.log("image1:", image1); // ✅
const image2 = await load_image(
  "https://huggingface.co/spaces/merve/chameleon-7b/resolve/main/bee.jpg",
);
console.log("image2:", image2); // ✅

// Create input messages
const messages = [
  {
    role: "user",
    content: [
      { type: "image" },
      { type: "image" },
      { type: "text", text: "Are these two images different or the same?" },
    ],
  },
];

// Prepare inputs
const text = processor.apply_chat_template(messages, { add_generation_prompt: true });
console.log("text:", text); // ✅

const inputs = await processor(text, [image1, image2], {
  // Set `do_image_splitting: true` to split images into multiple patches.
  // NOTE: This uses more memory, but can provide more accurate results.
  do_image_splitting: false,
});
console.log("inputs:", inputs); // 🚫 Doesn't reach here

// Generate outputs
const generated_ids = await model.generate({
  ...inputs,
  max_new_tokens: 500,
  sequential: true,
});
console.log("generated_ids:", generated_ids);
const generated_texts = processor.batch_decode(
  generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log("generated_texts:", generated_texts);
console.log(generated_texts[0]);

When the processor is called (await processor(text, [image1, image2])), I get this error, which seems to originate from within ONNX:

ort-wasm-simd-threaded.jsep.mjs:12
Uncaught Error: Session already started
    at Object._OrtRun (ort-wasm-simd-threaded.jsep.mjs:12:165)
    at kr (ort.bundle.min.mjs:2727:34041)
    at fc (ort.bundle.min.mjs:2727:39061)
    at pn.run (ort.bundle.min.mjs:2727:40952)
    at e.run (ort.bundle.min.mjs:6:18005)
    at registry.js:22:1
    at interpolate_4d (tensor.js:966:1)
    at async Promise.all (:5173/index 1)
    at async Function._call (image_processing_idefics3.js:112:1)
    at async Function._call (processing_idefics3.js:83:1)

I cannot find any useful information about how to stop this. The script ran fine once, and now I can’t get it to work again.

I’m new to these local models:

  • How do I stop a session?
  • Should I even have to stop sessions?
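
One possible reading of the stack trace, offered as an assumption rather than a confirmed diagnosis: the Promise.all frame above interpolate_4d suggests that several run calls were issued on the same session before the first one finished. The sketch below only illustrates that failure mode and the usual remedy of queueing calls. FakeSession, serialize, and demo are hypothetical names made up for this illustration. They are not part of ONNX Runtime or Transformers.js, and the sketch does not patch either library.

```javascript
// Hypothetical stand-in for a session that allows only one run at a time.
// It is NOT the ONNX Runtime API; it only mimics the "one run at a time" rule.
class FakeSession {
  constructor() {
    this.busy = false;
  }

  async run(input) {
    if (this.busy) throw new Error("Session already started");
    this.busy = true;
    try {
      // Simulate asynchronous work.
      await new Promise((resolve) => setTimeout(resolve, 5));
      return input * 2;
    } finally {
      this.busy = false;
    }
  }
}

// Wrap an async function so each call waits for the previous one to settle.
function serialize(fn) {
  let tail = Promise.resolve();
  return (...args) => {
    const result = tail.then(() => fn(...args));
    // Keep the queue moving even if a call rejects.
    tail = result.catch(() => {});
    return result;
  };
}

async function demo() {
  const session = new FakeSession();

  // Concurrent calls: the second call starts while the first is still in flight.
  const concurrent = await Promise.allSettled([session.run(1), session.run(2)]);

  // Serialized calls: same Promise.all shape, but the runs are queued.
  const run = serialize((x) => session.run(x));
  const serialized = await Promise.all([run(1), run(2), run(3)]);

  return { concurrent, serialized };
}
```

In this toy setup, the second concurrent call is rejected while the queued calls all complete. If the real error has the same cause, then nothing needs to be stopped by hand; the calls just need to run one after another, which would have to happen inside whichever code issues them.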

More context

This script is just a super simple script in an HTML page:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Vite + Svelte + TS</title>
  </head>
  <body>
    <div id="app"></div>
    <script type="module" src="/foo.js"></script>
  </body>
</html>

I’m using Vite to reload the page (I’m not even using Svelte, despite what the page title suggests).


I’m really excited to use SmolVLM, but how do I get past this error?

I don’t know much about JavaScript or Transformers.js, but I think the following official Spaces are good samples for WebGPU and SmolVLM.