The issue
When I try to use SmolVLM in the browser, I get a "Session already started" error that seems to originate from ONNX Runtime. I cannot get past this error, and I’m not even sure I’m using the library correctly at all.
Context
I’ve been trying to get a simple app working in the browser using SmolVLM. All the examples for it are in Python, except for one example in a PR, which does not work.
I’m using "@huggingface/transformers": "^3.3.1", and here is the slightly modified code (different images, different prompt, that’s all):
```javascript
import { AutoModelForVision2Seq, AutoProcessor, load_image } from "@huggingface/transformers";

// Initialize processor and model
const model_id = "HuggingFaceTB/SmolVLM-Instruct";
const processor = await AutoProcessor.from_pretrained(model_id);
console.log("Processor:", processor); // ✅

const model = await AutoModelForVision2Seq.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "fp16", // "fp32", "fp16", "q8"
    vision_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
    decoder_model_merged: "q4", // "q8", "q4", "q4f16"
  },
});
console.log("Model:", model); // ✅

// Load images
const image1 = await load_image(
  "https://huggingface.co/spaces/merve/chameleon-7b/resolve/main/bee.jpg",
);
console.log("image1:", image1); // ✅
const image2 = await load_image(
  "https://huggingface.co/spaces/merve/chameleon-7b/resolve/main/bee.jpg",
);
console.log("image2:", image2); // ✅

// Create input messages
const messages = [
  {
    role: "user",
    content: [
      { type: "image" },
      { type: "image" },
      { type: "text", text: "Are these two images different or the same?" },
    ],
  },
];

// Prepare inputs
const text = processor.apply_chat_template(messages, { add_generation_prompt: true });
console.log("text:", text); // ✅
const inputs = await processor(text, [image1, image2], {
  // Set `do_image_splitting: true` to split images into multiple patches.
  // NOTE: This uses more memory, but can provide more accurate results.
  do_image_splitting: false,
});
console.log("inputs:", inputs); // 🚫 Doesn't reach here

// Generate outputs
const generated_ids = await model.generate({
  ...inputs,
  max_new_tokens: 500,
  sequential: true,
});
console.log("generated_ids:", generated_ids);
const generated_texts = processor.batch_decode(
  generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log("generated_texts:", generated_texts);
console.log(generated_texts[0]);
```
When the processor is called (await processor(text, [image1, image2])), I get this error, which seems to originate from within ONNX:
```
ort-wasm-simd-threaded.jsep.mjs:12
Uncaught Error: Session already started
    at Object._OrtRun (ort-wasm-simd-threaded.jsep.mjs:12:165)
    at kr (ort.bundle.min.mjs:2727:34041)
    at fc (ort.bundle.min.mjs:2727:39061)
    at pn.run (ort.bundle.min.mjs:2727:40952)
    at e.run (ort.bundle.min.mjs:6:18005)
    at registry.js:22:1
    at interpolate_4d (tensor.js:966:1)
    at async Promise.all (:5173/index 1)
    at async Function._call (image_processing_idefics3.js:112:1)
    at async Function._call (processing_idefics3.js:83:1)
```
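From the stack trace, the failing call is inside interpolate_4d under a Promise.all, so my working theory (an assumption on my part, not something I’ve confirmed) is that two pieces of work are entering the same ONNX Runtime session at once, and run() refuses to re-enter a session that is already mid-run. One generic way to rule that out would be to serialize everything that touches the model. AsyncLock below is a helper I wrote for this sketch, not part of Transformers.js:

```javascript
// Minimal async mutex sketch: queue async functions so that each one
// starts only after the previous one has settled. This serializes
// calls that might otherwise hit the same ONNX session concurrently.
class AsyncLock {
  constructor() {
    // The tail of the queue; starts as an already-resolved promise.
    this.tail = Promise.resolve();
  }

  // Run `fn` after every previously queued function has settled,
  // and return its result to the caller.
  run(fn) {
    const result = this.tail.then(fn, fn);
    // Keep the chain alive even if `fn` rejects.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Usage sketch (hypothetical; wraps the calls from my script above):
const sessionLock = new AsyncLock();
// const inputs = await sessionLock.run(() => processor(text, [image1, image2]));
// const generated_ids = await sessionLock.run(() =>
//   model.generate({ ...inputs, max_new_tokens: 500, sequential: true }));
```

I don’t know whether Transformers.js already serializes its internal session use, so this may be treating a symptom rather than the cause.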
I cannot find any useful information about how to stop this. The script ran fine once, and now I can’t get it to work again.
I’m new to these local models:
- How do I stop a session?
- Should I even have to stop the sessions?
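One thing I considered (again, an unverified guess): if the entry script runs more than once, every run calls from_pretrained again and builds fresh sessions alongside the old ones. Caching the load promises at module scope would at least guarantee a single processor/model per page. getOnce is a name I made up for this sketch, not an API from @huggingface/transformers:

```javascript
// Sketch: cache the from_pretrained() promises so a re-run of the
// script reuses one processor/model instead of loading duplicates.
const cache = new Map();

function getOnce(key, factory) {
  // `factory` stands in for e.g. () => AutoProcessor.from_pretrained(model_id).
  // The promise (not the resolved value) is cached, so concurrent
  // callers also share the same in-flight load.
  if (!cache.has(key)) cache.set(key, factory());
  return cache.get(key);
}

// Usage sketch:
// const processor = await getOnce("processor", () => AutoProcessor.from_pretrained(model_id));
// const model = await getOnce("model", () => AutoModelForVision2Seq.from_pretrained(model_id, { /* dtype */ }));
```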
More context
This script is just a super simple script in an HTML page:
```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Vite + Svelte + TS</title>
  </head>
  <body>
    <div id="app"></div>
    <script type="module" src="/foo.js"></script>
  </body>
</html>
```
I’m using Vite to reload (I’m not even using Svelte, despite what the title suggests).
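Since Vite’s dev server re-executes the module when the file changes, I suspect (unconfirmed) that a reload can kick off a second inference while the first one still has the session busy. A module-level in-flight guard like this sketch (runExclusive is a name I invented, not a Transformers.js API) would skip the overlapping run:

```javascript
// Sketch: refuse to start a new inference while one is in flight, so a
// hot reload or double invocation can't re-enter the ONNX session.
let inFlight = false;

async function runExclusive(task) {
  // `task` stands in for the processor + model.generate pipeline.
  if (inFlight) {
    console.warn("Inference already in flight; skipping this call.");
    return null;
  }
  inFlight = true;
  try {
    return await task();
  } finally {
    inFlight = false;
  }
}

// Usage sketch:
// const texts = await runExclusive(async () => {
//   const inputs = await processor(text, [image1, image2], { do_image_splitting: false });
//   const ids = await model.generate({ ...inputs, max_new_tokens: 500, sequential: true });
//   return processor.batch_decode(ids, { skip_special_tokens: true });
// });
```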
I’m really excited to use SmolVLM, but how do I get past this error?