Pretrained Models to Heroku Production Environment

Hi! I’m new to Hugging Face, so excuse this if it’s a dumb question!

I’m trying to expose the T5-small pretrained model as a REST endpoint. I have both the model code and the Python Flask API working in a virtual environment on my computer. When I try to host the API on Heroku, though, the loaded model pushes the app over the dyno’s RAM limit. Are there any best practices for storing these models as static assets that an API can call? Is there a better way to do this that avoids hitting the RAM limit?

FYI: I used this tutorial for the model code - https://towardsdatascience.com/simple-abstractive-text-summarization-with-pretrained-t5-text-to-text-transfer-transformer-10f6d602c426
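
For context, the endpoint is essentially the tutorial’s model code wrapped in Flask. A simplified sketch of the setup (the route name and generation settings here are just illustrative):

```python
from flask import Flask, request, jsonify
from transformers import T5ForConditionalGeneration, T5Tokenizer

app = Flask(__name__)

# Both of these are held in RAM for the lifetime of the process,
# which is what pushes the dyno over its memory limit.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

@app.route("/summarize", methods=["POST"])
def summarize():
    text = request.json["text"]
    inputs = tokenizer.encode(
        "summarize: " + text, return_tensors="pt", max_length=512, truncation=True
    )
    summary_ids = model.generate(inputs, max_length=150)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return jsonify({"summary": summary})
```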

Not sure if it can help for your use case, but we provide an API inference endpoint for T5: https://huggingface.co/t5-small?text=My+name+is+Sarah+and+I+live+in+London. It will soon be usable for tasks other than translation.
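
Calling it from Python is just a POST request. Something like this should work (a sketch; fill in an API token from your account settings):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/t5-small"
headers = {"Authorization": "Bearer <your-api-token>"}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "My name is Sarah and I live in London"},
)
print(response.json())
```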

Thanks for the quick response, clem! This would be perfect. Do you have abstractive summarization API endpoints available at the moment?

Tried this one - it’s returning an error:
https://huggingface.co/remi/bertabs-finetuned-cnndm-extractive-abstractive-summarization?text=Paris+is+the+[MASK]+of+France.

You can try the bart-large-cnn checkpoint; let us know how it goes!
https://huggingface.co/facebook/bart-large-cnn
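
If you want to try the checkpoint locally first, the summarization pipeline works out of the box. A minimal sketch:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = "..."  # your input text
print(summarizer(article, do_sample=False))
```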

or bart-large-xsum if you want slightly shorter summaries!
By default, bart-large-cnn summaries will be 56-142 tokens and bart-large-xsum summaries will be 10-62 tokens.
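
Both defaults can be overridden per call if you need something different, e.g. (sketch):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-xsum")

article = "..."  # your input text
# Override the checkpoint's default summary length for this call
print(summarizer(article, min_length=5, max_length=30))
```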

hi @vihardesu you can get T5 working on the Inference API for tasks other than translation using a simple hack. If you want to do summarization, replace all the task_specific_params in config.json with those of the summarization task (don’t change the task names), so it should look something like this:

"task_specific_params": {
    "summarization": {
       "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_fr": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_ro": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    }
  },
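
For reference, those task_specific_params map one-to-one onto generate() arguments (the pipeline also prepends the prefix for you), so you can reproduce the same behaviour locally. A sketch:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "..."  # article to summarize
# The "prefix" from the config is prepended to the input by hand here
inputs = tokenizer.encode(
    "summarize: " + text, return_tensors="pt", max_length=512, truncation=True
)

# Same values as the task_specific_params above
summary_ids = model.generate(
    inputs,
    early_stopping=True,
    length_penalty=2.0,
    max_length=200,
    min_length=30,
    no_repeat_ngram_size=3,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```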

This should work for now :grin:
