How to replicate the best model from AutoNLP?


First of all, a big thanks to the Hugging Face team for bringing AutoNLP to us. It is a really amazing tool. I had the chance to try it out today and it looks very promising. I was trying to find the best model for a binary text classification task, and the config of the best model is shown below:

{
  "_name_or_path": "AutoNLP",
  "_num_labels": 2,
  "architectures": [...],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "negative",
    "1": "non-negative"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "negative": 0,
    "non-negative": 1
  },
  "layer_norm_eps": 1e-12,
  "max_length": 96,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "padding": "max_length",
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "transformers_version": "4.8.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

If I start from scratch and try to replicate this best model, will I get the same results? From the config file, the best model appears to be a BERT-large model with a dropout of 0.1 and GELU activation. Where can I find the optimizer, learning rate, batch size, and the other settings needed to replicate the training?
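One way to sanity-check the "BERT-large" reading of the config is to count parameters from the listed sizes. The sketch below is plain arithmetic, not anything AutoNLP exposes. It assumes the standard BERT layout (embeddings, 24 encoder layers, a pooler, and one linear classification head), and the helper name `count_bert_classifier_params` is made up for illustration.

```python
# Values copied from the config in the question ("_num_labels" is stored
# here under the hypothetical key "num_labels").
config = {
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "vocab_size": 30522,
    "num_labels": 2,
}


def count_bert_classifier_params(cfg):
    """Count weights and biases, assuming the standard BERT layout."""
    h = cfg["hidden_size"]
    ffn = cfg["intermediate_size"]

    # Word, position and token-type embeddings, plus one LayerNorm (scale + bias).
    embeddings = (
        cfg["vocab_size"] * h
        + cfg["max_position_embeddings"] * h
        + cfg["type_vocab_size"] * h
        + 2 * h
    )

    # Query, key, value and output projections (weights + biases), then LayerNorm.
    attention = 4 * (h * h + h) + 2 * h
    # Two feed-forward projections (weights + biases), then LayerNorm.
    feed_forward = (h * ffn + ffn) + (ffn * h + h) + 2 * h
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)

    pooler = h * h + h
    classifier = h * cfg["num_labels"] + cfg["num_labels"]
    return embeddings + encoder + pooler + classifier


head_dim = config["hidden_size"] // config["num_attention_heads"]
total = count_bert_classifier_params(config)
print(f"per-head dimension: {head_dim}")
print(f"parameters: {total:,}")
```

Under these assumptions the count comes out at roughly 335 million parameters, which is the size usually quoted for BERT-large. It confirms the architecture only; it says nothing about how the model was trained.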

Say, for example, I am using the code from this Colab notebook. How can I replicate the best model from scratch?

Thanks again!!

It would be hard to replicate as there are several things about training that are not open to the end-user: hyperparameters, optimizer, scheduler, etc. (Un)fortunately, this information will remain closed. You can call it the “secret sauce” of AutoNLP. The final model is, however, open to the end-user and you are free to do whatever you want with it.
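Since the actual settings are not published, the closest a user can get is to search over common fine-tuning values and compare against the AutoNLP model on a held-out set. The sketch below only enumerates such a search space. The ranges are the ones suggested for fine-tuning in the original BERT paper, and they are an assumption here, not AutoNLP's settings. The function name `candidate_runs` is hypothetical.

```python
from itertools import product

# Commonly used BERT fine-tuning ranges. These are NOT AutoNLP's values,
# which remain closed; they are only a reasonable place to start looking.
LEARNING_RATES = [5e-5, 3e-5, 2e-5]
BATCH_SIZES = [16, 32]
NUM_EPOCHS = [2, 3, 4]


def candidate_runs():
    """Return every combination of the ranges above as a list of dicts."""
    return [
        {"learning_rate": lr, "batch_size": bs, "num_epochs": epochs}
        for lr, bs, epochs in product(LEARNING_RATES, BATCH_SIZES, NUM_EPOCHS)
    ]


runs = candidate_runs()
print(f"{len(runs)} candidate configurations")
print("first candidate:", runs[0])
```

Each dict would then be passed to whatever training loop you use. Even with identical hyperparameters, different random seeds and data ordering can change the final metrics, so an exact match should not be expected.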

Thanks Abhishek. That’s a bit sad, but understandable. I’m not sure about the reason behind the names given to these models, like runny fox or tubby snail. They make AutoNLP look a bit less credible if we were to recommend these models for a company’s project. Nonetheless, it’s an amazing tool.

(P.S. Your videos are super helpful. Thanks for making them available on your channel. If I may suggest, it would be very interesting to see a layer-wise visualization of one of these BERT models to really understand what is happening at each stage.)

Thanks for your quick response.

The model names have nothing to do with how the models are trained. They work like the names of Docker containers; see: moby/names-generator.go at master · moby/moby · GitHub

Unfortunately, I have to disagree 🙂 Unlike many other AutoML tools, AutoNLP gives you all the trained models at the end. You have the weights, you have the tokenizer. You are free to use these models for many different purposes: directly through Hugging Face’s Inference API (saving several days, sometimes even months, of engineering time) or as a starting point for further fine-tuning (still saving several days of work). AutoNLP provides you with the best possible models so that you don’t have to dig in yourself. I’m personally not aware of any other tool that can train tens or even hundreds of production-ready SOTA transformer models 🙂 That’s why many enterprises already use and love AutoNLP 🙂 You can read some of the testimonials here: Announcing AutoNLP – Hugging Face.
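As a rough illustration of using the delivered weights and tokenizer, here is a minimal sketch that loads a downloaded model with the `transformers` Auto classes and decodes its output with the `id2label` mapping from the config in the question. The `model_dir` argument is a hypothetical local path to the downloaded files, and reusing `max_length` 96 with `max_length` padding at inference time is an assumption read off that config, not a documented AutoNLP requirement.

```python
import math

# Label mapping and tokenization settings copied from the config above.
ID2LABEL = {0: "negative", 1: "non-negative"}
MAX_LENGTH = 96
PADDING = "max_length"


def decode_logits(logits, id2label=ID2LABEL):
    """Apply a softmax to raw logits and return (label, probability) of the top class."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def predict(text, model_dir):
    """Classify one text with a model saved in `model_dir` (hypothetical path)."""
    # Imported lazily so decode_logits can be used without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    inputs = tokenizer(
        text,
        padding=PADDING,
        truncation=True,
        max_length=MAX_LENGTH,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode_logits(logits)
```

The same `from_pretrained` call is also the starting point for further fine-tuning: load the model, then continue training it on your own data with your own optimizer settings.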

If you have any further concerns about how AutoNLP can be used in your industry, please feel free to write to autonlp [at] huggingface [dot] co and we can discuss further.


Sorry if I’ve offended. It was just a small suggestion.

Thanks for the explanation though. 🙂

Hey! Not at all!!! I was just explaining 😃


Thanks for your understanding and for the support 🙂
