Hi! I’m new to Hugging Face, so excuse this if it’s a dumb question!
I’m trying to expose the T5-small pre-trained model as a REST endpoint. I have both the model code and the Python Flask API working in a virtual environment on my computer. When I try to host the API on Heroku, the app exceeds the RAM limit. Are there any best practices for storing these models as static assets that an API can call? Is there a better way to do this that avoids hitting the RAM limit?
or bart-large-xsum if you want slightly shorter summaries!
By default, bart-large-cnn summaries will be 56-142 tokens and bart-large-xsum summaries will be 10-62 tokens.
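You can verify those defaults yourself — they live in each model’s config and can be read without downloading the weights (this sketch assumes the `transformers` library is installed; only the small `config.json` files are fetched):

```python
from transformers import AutoConfig

# Fetch just the configs, not the multi-GB weight files.
cnn_cfg = AutoConfig.from_pretrained("facebook/bart-large-cnn")
xsum_cfg = AutoConfig.from_pretrained("facebook/bart-large-xsum")

# These are the default generation lengths quoted above; you can
# override them per call with min_length/max_length if you want
# tighter control over summary size.
print(cnn_cfg.min_length, cnn_cfg.max_length)
print(xsum_cfg.min_length, xsum_cfg.max_length)
```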
hi @vihardesu you can get T5 working on the Inference API for tasks other than translation using a simple hack. If you want to do summarization, then replace all the task_specific_params in config.json with those of the summarization task (don’t change the task names), so it should look something like
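For illustration, a sketch of the edited `task_specific_params` block (only one translation task shown; the parameter values here are t5-small’s stock summarization defaults — copy the ones from your own config.json rather than trusting these):

```json
"task_specific_params": {
  "summarization": {
    "early_stopping": true,
    "length_penalty": 2.0,
    "max_length": 200,
    "min_length": 30,
    "no_repeat_ngram_size": 3,
    "num_beams": 4,
    "prefix": "summarize: "
  },
  "translation_en_to_de": {
    "early_stopping": true,
    "length_penalty": 2.0,
    "max_length": 200,
    "min_length": 30,
    "no_repeat_ngram_size": 3,
    "num_beams": 4,
    "prefix": "summarize: "
  }
}
```

i.e. the translation task keeps its name but carries the summarization parameters (including the `summarize: ` prefix), so the Inference API ends up producing summaries.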