Spaces based on Gensim Model

Hi there,

I currently have a trained FastText model available which was trained using Gensim. Is it possible to host this model on the Huggingface Model Hub as well as create a corresponding HF Space for users to interactively try out the vector space that is created by it?



Hi Simon!

Yes, you can host this model on the Hub and create a Space for it. We actually already host a couple of fasttext models on the Hub, which you can find at Models - Hugging Face.

You can, of course, create Spaces that are significantly more interactive than the widgets.


Hi Omar, thanks for the quick response.

I already got started creating a model repo (simonschoe/earningscall2vec · Hugging Face) in the spirit of Hellisotherpeople/debate2vec · Hugging Face, i.e., with the widget that lets users extract nearest neighbors (NN). However, the widget is not quite working in my repo yet. I presume it's due to the model format being .model instead of .bin.

So when using gensim and saving a model to disk, it writes four files:

  • mod.model
  • mod.model.syn1neg.npy
  • mod.model.wv.vectors_ngrams.npy
  • mod.model.wv.vectors_vocab.npy

Any idea which of these must be pushed to the Hub in order for the widget to be compatible with the model file?

EDIT: I figured it out and used gensim's save_facebook_model() to save the model in .bin format (and renamed it model.bin). However, the final model is 2.5 GB in size. Is there a maximum model size with regard to the Model Hub and repo storage?

No, and 2.5 GB sounds right for fastText models; they tend to be quite large. And indeed, model.bin is the right approach. Just take into account that since the models are large, inference and loading will usually be slow.


Out of curiosity, what’s the size of the original .model files?

The .model file is 2 MB in size; however, the three .npy files amount to ~2.5 GB as well, yielding roughly the same (if not the exact same) total model size.

If you say loading and inference are slow: is there any chance to speed them up somehow (e.g., by trimming or freezing the model before pushing it to the Hub)? I find it counterintuitive that these models are bigger than some pretrained transformer models…

Let’s try it out directly in the widget!

One last thing: the model file is expected at a specific location, which currently fails with 404 Client Error: Not Found for url: (so the file should be named model.bin). The Inference API will load the model and cache results, so after the first inference call it should definitely work much faster.

I finally have it up and running and everything works as intended :heart_eyes: (simonschoe/call2vec · Hugging Face)

One last question: is it possible to adjust the query so it returns the top 10 nearest neighbors instead of just the top 5?

Hey there!

We don’t have this at the moment, but we can open a PR at huggingface_hub/ at main · huggingface/huggingface_hub · GitHub if you want to enable top_k as an argument. Here is an issue:

Great, thanks a lot! Let’s see if/when this is implemented. :slight_smile:
Thanks for all the help so far!

FastText-like models are bigger than many transformers because a typical transformer has a vocabulary of about 30,000 tokens, while a typical fastText model stores embeddings for 2,000,000 hashed n-grams plus 100,000 to 300,000 words.
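A quick back-of-the-envelope calculation with these figures (assuming 300-dimensional float32 vectors, which is a common but not universal configuration) lands close to the 2.5 GB observed above:

```python
# Why fastText checkpoints are large: the n-gram hash table dominates.
DIM = 300            # embedding dimensionality (assumed)
BYTES_PER_FLOAT = 4  # float32

ngram_buckets = 2_000_000  # hashed n-gram table (common default)
vocab_words = 200_000      # typical vocabulary size

ngram_bytes = ngram_buckets * DIM * BYTES_PER_FLOAT
vocab_bytes = vocab_words * DIM * BYTES_PER_FLOAT
total_gb = (ngram_bytes + vocab_bytes) / 1024**3
print(round(total_gb, 2))  # ≈ 2.46 GB
```

The n-gram table alone accounts for over 90% of the total, which is why pruning it is the main lever for shrinking these models.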

But fastText models can be compressed (with some loss of accuracy). The native fastText library supports compression of supervised models (for classification), and there is a library that wraps around Gensim, GitHub - avidale/compress-fasttext: Tools for shrinking fastText models (in gensim format), which can compress unsupervised fastText models for feature extraction.