Load a model from a platform other than the HF Hub and display a progress bar with `from_pretrained()` in Transformers.js

Hi community,

(This continues, but is largely independent of, the discussion How to make my customized pipeline consumable for Transformers.js)

I want to display a progress bar while the model loads, to be friendly to users, by using the progress_callback argument of the .from_pretrained() method. However, I cannot find many usage examples on the web. The only one I found is Useful snippets · Issue #735 · huggingface/transformers.js · GitHub. Moreover, I notice that the callback’s argument is of type ProgressInfo, which covers not only the download progress but a more detailed union: InitiateProgressInfo | DownloadProgressInfo | ProgressStatusInfo | DoneProgressInfo | ReadyProgressInfo. Do you know of a more detailed usage of this callback function?

Besides, I notice that from_pretrained() only supports a pretrained_model_name_or_path that is either an HF Hub repo ID or a local path. Is there a way to load the model from another model hosting platform (e.g. ModelScope) for users who cannot access the HF Hub normally (e.g. from mainland China)? I guess we could have the browser download the files into the browser cache, but I’m not sure whether from_pretrained() can access that cache.

I hope this works… I’m not very familiar with using Transformers.js, so if it seems difficult to solve, I recommend asking in the Transformers.js channel on the Hugging Face Discord…

Let me check. No worries, I will also check the channel :slight_smile:. It just doesn’t seem very active.

Based on the API template I found on ModelScope, it should be something like

        remoteHost: 'https://modelscope.cn/api/v1/models/',
        remotePathTemplate: '{model}/repo?Revision=master&FilePath=.'

So when I set model=alephpi98/FormulaNet, Transformers.js resolves the template to
https://modelscope.cn/api/v1/models/alephpi98/FormulaNet/repo?Revision=master&FilePath=./config.json, which is a valid URL, but I still get a NetworkError. (Screenshot of the browser’s Network panel omitted.)

The panel reports a “CORS Missing Allow Origin” issue. Do you know how to solve it within Transformers.js?

Of course if I use the HF Hub source, it works well.

OK, it seems the ModelScope API doesn’t support CORS requests, which means the request cannot be made from the browser. Now I need to think about solutions other than 1.

I tried loading from a local model, and it seems to work nicely.
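
For anyone following along, loading from local files usually comes down to a few env settings. Below is a minimal sketch, assuming the model files are served from your own origin under /models/<model id>/. The helper name useLocalModels and the /models/ path are made up for illustration; allowLocalModels, allowRemoteModels and localModelPath are the env fields as I understand them from the Transformers.js env docs, so double-check them against your installed version.

```javascript
// Hypothetical helper: mutates a Transformers.js-style `env` object so that
// model files are fetched from your own origin instead of a remote hub.
function useLocalModels(env, basePath = '/models/') {
  env.allowLocalModels = true;    // look for files under env.localModelPath
  env.allowRemoteModels = false;  // do not fall back to a remote host
  env.localModelPath = basePath;  // files expected at <basePath><model id>/...
  return env;
}

// Usage in a real app (not executed here):
//   import { env, AutoModel } from '@huggingface/transformers';
//   useLocalModels(env);
//   const model = await AutoModel.from_pretrained('alephpi98/FormulaNet');
```

Because the files then come from the same origin as the page, the CORS problem does not arise at all.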

Hmm… Due to CORS?


Cause: the browser is blocking cross-origin reads. Firefox can show 200 OK in Network, yet fetch() rejects with TypeError: NetworkError when CORS isn’t satisfied. Transformers.js is fine. The ModelScope URL pattern you built is fine. The response is not CORS-readable, so JS can’t see the body or headers, and your progress bar never advances. (MDN WebDocument)

What’s happening

  • You point Transformers.js at ModelScope. It builds URLs like
    https://modelscope.cn/api/v1/models/<model>/repo?Revision=master&FilePath=<file> and requests config.json, tokenizer.json, ONNX weights, etc. The endpoint serves the files. (Hugging Face)
  • The browser enforces CORS. If the response lacks a matching Access-Control-Allow-Origin, or a required preflight fails, fetch() rejects even if the HTTP status is 200. DevTools still shows the 200 because the network request succeeded, but the JS caller is denied access. This is expected. (MDN WebDocument)
  • Your progress UI needs byte totals. Transformers.js’ progress_callback gets loaded and total. If the server omits Content-Length or uses chunked transfer without ranges, total can be missing so the percentage stays indeterminate. Range support (Accept-Ranges: bytes) also helps resumability. (Hugging Face)

Minimal working setup

// deps: @huggingface/transformers ^3.x
// docs: env vars https://huggingface.co/docs/transformers.js/en/api/env
import { env, AutoTokenizer, AutoModel } from '@huggingface/transformers';

// 1) Point to ModelScope.
// ModelScope API docs aren't centralized, but the live repo endpoint works.
// Transformers.js joins host, template and file name with "/", so a template ending in
// "FilePath=" resolves to "FilePath=/config.json". Ending it with "." gives "FilePath=./config.json".
//   e.g. https://modelscope.cn/api/v1/models/<model>/repo?Revision=master&FilePath=./tokenizer_config.json
env.remoteHost = 'https://modelscope.cn';
env.remotePathTemplate = 'api/v1/models/{model}/repo?Revision=master&FilePath=.';

// 2) Progress bar. ProgressInfo has loaded/total in bytes.
// docs: https://huggingface.co/docs/transformers.js/api/utils/core
const onProgress = (p) => {
  if (p?.status === 'progress') {
    const pct = p.total ? Math.round((p.loaded / p.total) * 100) : 0;
    document.querySelector('#bar').style.width = pct + '%';
    document.querySelector('#txt').textContent = p.file + ' ' + pct + '%';
  }
};

const MODEL_ID = 'alephpi98/FormulaNet';
const tok = await AutoTokenizer.from_pretrained(MODEL_ID, { progress_callback: onProgress });
const mdl = await AutoModel.from_pretrained(MODEL_ID, { progress_callback: onProgress });

Relevant API: env.remoteHost, env.remotePathTemplate, and progress_callback. (Hugging Face)
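
The handler above only looks at 'progress' events. Since the question was about the other members of the ProgressInfo union, here is a sketch of a tracker that reacts to each status and aggregates across files. The status strings ('initiate', 'download', 'progress', 'done', 'ready') and the file/loaded/total fields are what I understand the union to contain; treat them as assumptions and verify against the typings of your version. createProgressTracker is a made-up name.

```javascript
// Hypothetical tracker: feed it every object passed to progress_callback.
function createProgressTracker() {
  const files = new Map(); // file name -> { loaded, total, done }

  function summary() {
    let loaded = 0, total = 0, allKnown = true;
    for (const f of files.values()) {
      loaded += f.loaded;
      total += f.total;
      if (!f.total && !f.done) allKnown = false; // size not known yet
    }
    // percent is null while any file has an unknown size (indeterminate bar)
    const percent = allKnown && total > 0 ? Math.round((loaded / total) * 100) : null;
    return { loaded, total, percent };
  }

  function update(info) {
    switch (info.status) {
      case 'initiate': // a file is about to be requested
      case 'download': // the download has started
        if (!files.has(info.file)) files.set(info.file, { loaded: 0, total: 0, done: false });
        break;
      case 'progress': // bytes received so far; total may be missing
        files.set(info.file, { loaded: info.loaded ?? 0, total: info.total ?? 0, done: false });
        break;
      case 'done': { // this file is fully loaded
        const prev = files.get(info.file) ?? { loaded: 0, total: 0 };
        files.set(info.file, { loaded: prev.total || prev.loaded, total: prev.total, done: true });
        break;
      }
      case 'ready': // everything is loaded; nothing per-file to record
        break;
    }
    return summary();
  }

  return { update, summary };
}

// Usage (not executed here):
//   const tracker = createProgressTracker();
//   await AutoModel.from_pretrained(MODEL_ID, {
//     progress_callback: (p) => render(tracker.update(p)), // render() is your own UI code
//   });
```

Keeping the bookkeeping separate from the DOM code makes it easy to show one overall bar instead of a bar that jumps back to 0 for every file.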

Server headers that make this reliable

Set these on the host serving files (ModelScope or a proxy you control):

  • Access-Control-Allow-Origin: <your site origin> or * for public. (MDN WebDocument)
  • Access-Control-Expose-Headers: Content-Length, Accept-Ranges, ETag so JS can read them. (MDN WebDocument)
  • Accept-Ranges: bytes for partial requests and accurate progress. (MDN WebDocument)
  • Content-Length: <bytes> on each file to populate total. (MDN WebDocument)
    Do not use mode: "no-cors"; it creates an opaque response JS can’t read. (MDN WebDocument)
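
To check a host against the list above from outside the browser (for example with headers copied from `curl -I` or collected in a Node script, where nothing is hidden by CORS), a small helper like the following may be handy. It is only a sketch: missingDownloadHeaders is a made-up name, and the checks mirror the bullets above rather than any official validation.

```javascript
// Hypothetical helper: takes [name, value] header pairs as seen by a
// non-browser client and returns the names that are missing or unsuitable
// for a page served from `origin`.
function missingDownloadHeaders(headerPairs, origin) {
  const h = new Map();
  for (const [name, value] of headerPairs) h.set(name.toLowerCase(), String(value));

  const problems = [];
  const acao = h.get('access-control-allow-origin');
  if (acao !== '*' && acao !== origin) problems.push('access-control-allow-origin');
  if (!/^\d+$/.test(h.get('content-length') ?? '')) problems.push('content-length');
  if ((h.get('accept-ranges') ?? '').toLowerCase() !== 'bytes') problems.push('accept-ranges');
  return problems;
}

// Usage (not executed here): a Headers object is iterable as [name, value] pairs,
// so in Node you could pass `response.headers` from a server-side fetch directly.
```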

Quick verification

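Run this in your page console:

// test one file directly; a CORS failure makes fetch() reject
const u = 'https://modelscope.cn/api/v1/models/alephpi98/FormulaNet/repo?Revision=master&FilePath=config.json';
try {
  const r = await fetch(u); // default mode is 'cors'
  console.log(r.ok, r.status);
  // Only CORS-safelisted or explicitly exposed headers are readable from JS.
  console.log('LEN=', r.headers.get('content-length'), 'RANGES=', r.headers.get('accept-ranges'));
} catch (e) {
  console.log('fetch rejected (CORS or network failure):', e);
}

If fetch() rejects with a TypeError, the response is not CORS-readable: Firefox will still show 200 in Network, but JS fails per CORS rules. Access-Control-Allow-Origin itself is not readable from JS unless the server exposes it, so check it in the Network panel’s response headers instead. (MDN WebDocument)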

If you can’t change ModelScope headers

Front it with a tiny proxy that adds CORS and preserves lengths.

Cloudflare Worker

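// refs: CORS guide https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CORS
export default {
  async fetch(req) {
    const url = new URL(req.url);
    // Expected path: /ms/<owner>/<name>/<...filePath...>
    // ModelScope model IDs contain a slash (e.g. alephpi98/FormulaNet),
    // so the first two segments after /ms/ form the model ID.
    const [, , owner, name, ...rest] = url.pathname.split('/');
    const fp = rest.join('/');
    if (!owner || !name || !fp) return new Response('Not found', { status: 404 });
    const target = `https://modelscope.cn/api/v1/models/${owner}/${name}/repo?Revision=master&FilePath=${encodeURIComponent(fp)}`;

    // Forward Accept and, if present, Range so partial requests still work.
    const fwd = new Headers({ accept: req.headers.get('accept') || '*/*' });
    const range = req.headers.get('range');
    if (range) fwd.set('range', range);

    const resp = await fetch(target, { headers: fwd });
    const h = new Headers(resp.headers);
    h.set('access-control-allow-origin', '*');
    h.set('access-control-expose-headers', 'Content-Length, Accept-Ranges, ETag');
    return new Response(resp.body, { status: resp.status, headers: h });
  }
}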

Then:

env.remoteHost = 'https://your-cdn.example.com';
env.remotePathTemplate = 'ms/{model}/';

Nginx

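# Map /ms/<owner>/<name>/<file path> onto ModelScope's repo?FilePath= endpoint.
location ~ ^/ms/([^/]+)/([^/]+)/(.+)$ {
  resolver 1.1.1.1;  # example resolver; required because proxy_pass uses variables
  proxy_ssl_server_name on;
  proxy_pass https://modelscope.cn/api/v1/models/$1/$2/repo?Revision=master&FilePath=$3;
  add_header Access-Control-Allow-Origin * always;
  add_header Access-Control-Expose-Headers "Content-Length, Accept-Ranges, ETag" always;
  gzip off;  # preserve Content-Length
}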

Background: exposing Content-Length lets the progress callback compute percentages. Range support improves UX on large files. (MDN WebDocument)

Common gotchas

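  • A leading . in your template produces FilePath=./.... The endpoint also accepts FilePath=config.json directly, as example responses with FilePath=tokenizer_config.json show. Note, however, that Transformers.js inserts a / between the template and the file name, so a template ending in FilePath= resolves to FilePath=/config.json; check which form the endpoint accepts before dropping the dot. (ModelScope)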
  • Cached HF assets can mask tests. Clear the browser Cache API used by Transformers.js or test in a fresh profile. The project’s “useful snippets” issue shows cache-clearing and a web progress handler. (GitHub)
  • Chunked responses without Content-Length make total unknown, so percent bars stall at 0. This is a generic HTTP behavior. (MDN WebDocument)
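
For the last gotcha, one option is to fall back to an indeterminate display that shows only the bytes received. A minimal sketch follows; describeProgress is a made-up helper, and how you render a "determinate: false" state (spinner, striped bar, plain text) is up to your UI.

```javascript
// Hypothetical helper: turns loaded/total byte counts into something a UI can
// render, without pretending to know a percentage when total is unknown.
function describeProgress(loaded, total) {
  const mb = (bytes) => (bytes / (1024 * 1024)).toFixed(1) + ' MB';
  if (Number.isFinite(total) && total > 0) {
    const percent = Math.min(100, Math.round((loaded / total) * 100));
    return { determinate: true, percent, label: `${percent}% (${mb(loaded)} of ${mb(total)})` };
  }
  // No usable total (e.g. chunked response without Content-Length)
  return { determinate: false, percent: null, label: `${mb(loaded)} downloaded` };
}

// Usage inside a progress handler (not executed here):
//   const view = describeProgress(p.loaded, p.total);
//   bar.classList.toggle('indeterminate', !view.determinate);
//   txt.textContent = p.file + ' ' + view.label;
```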

Context recap

  • Transformers.js exposes env to retarget downloads away from HF Hub. Use remoteHost and remotePathTemplate. (Hugging Face)
  • Progress signals are typed (status, file, loaded, total). Drive your UI from that. (Hugging Face)
  • Your ModelScope URL form is valid. The failing piece is CORS visibility, not the HTTP status. (ModelScope)

Short, curated references

Transformers.js

  • env variables, including remoteHost/remotePathTemplate. Clear and current. (Hugging Face)
  • Progress callback and fields (loaded, total). (Hugging Face)
  • Useful snippets: cache clearing and browser progress handlers. (GitHub)

ModelScope

  • Live repo API example showing FilePath=<name> working. Useful for sanity checks. (ModelScope)

CORS and headers

  • CORS error semantics in Firefox and why 200 can still fail JS. (MDN WebDocument)
  • Access-Control-Expose-Headers and why you need it for Content-Length. (MDN WebDocument)
  • Accept-Ranges and partial requests. (MDN WebDocument)
  • Content-Length background and caveats with chunked transfer. (MDN WebDocument)

I added the leading dot simply because I found that Transformers.js inserts an additional ‘/’ when concatenating the file name (e.g. config.json) after the template during resolution. Without the dot, the URL becomes …FilePath=/config.json, which returns 404.
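
To make the behavior concrete, here is a simplified stand-in for the URL assembly as described in this post: host, filled-in template and file name are joined with '/'. This is an illustration only, not the library's actual code, and joinParts / resolveFileUrl are made-up names.

```javascript
// Joins parts with '/', dropping one leading/trailing slash at the seams.
function joinParts(...parts) {
  return parts
    .map((part, i) => {
      if (i > 0) part = part.replace(/^\//, '');                // leading slash
      if (i < parts.length - 1) part = part.replace(/\/$/, ''); // trailing slash
      return part;
    })
    .join('/');
}

// Fills {model} into the template, then appends the file name.
function resolveFileUrl(remoteHost, remotePathTemplate, model, file) {
  return joinParts(remoteHost, remotePathTemplate.replaceAll('{model}', model), file);
}

const host = 'https://modelscope.cn/api/v1/models/';
const withDot = resolveFileUrl(host, '{model}/repo?Revision=master&FilePath=.', 'alephpi98/FormulaNet', 'config.json');
const withoutDot = resolveFileUrl(host, '{model}/repo?Revision=master&FilePath=', 'alephpi98/FormulaNet', 'config.json');
// withDot    ends in "FilePath=./config.json"
// withoutDot ends in "FilePath=/config.json"
```

Under this joining rule there is no template that yields a bare FilePath=config.json, which is why the dot is there.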

By the way, I posted this message yesterday, but it ended up in reverse order with your post…
Anyway, just read it as if it were originally the second comment. :sweat_smile:
