AWS Lambda + Transformers + Docker: summarization model uses all of the function's RAM

I am trying to build a summarization API with FastAPI and run it on AWS Lambda as a container image. I configured the function with 3 GB of memory; when I call the API, it consumes all of that RAM and the function stops responding, so the API never returns a result. Why does it use so much memory? Is there a bug in my code, or some other problem?

I install the dependency libraries and copy the model into the Docker image.

Model: facebook/bart-large-cnn (Hugging Face)
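
For context, the ./Model directory is produced before the image build by a one-off export script along these lines (illustrative; my actual export script may differ slightly):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download facebook/bart-large-cnn once and save it locally, so the
# Docker build can copy the ./Model directory into the image.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
tokenizer.save_pretrained("./Model")
model.save_pretrained("./Model")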

Code:

import uvicorn
from fastapi import FastAPI
from mangum import Mangum
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_checkpoint = "./Model"  # local copy of facebook/bart-large-cnn baked into the image

# Load the weights once, at import time (i.e. at Lambda cold start).
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

def summarize(textdata):
    # A new pipeline object is constructed on every request; it reuses the
    # model and tokenizer loaded above, so the weights are not reloaded here.
    summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)
    return summarizer(textdata, max_length=130, min_length=30, do_sample=False)

app = FastAPI()
handler = Mangum(app)  # Lambda entry point; adapts API Gateway events to ASGI

@app.get("/")
async def index():
    return "Home"

@app.get("/api")
async def summarize_endpoint(tinput: str, token: str):
    key = "RRshJy4beYdlNbu"  # shared secret for this demo API

    if token == key:
        return {"Status": "Done", "Summary": summarize(tinput)}
    else:
        return {"Status": "Error"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
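
One variant I have been considering, in case repeated pipeline construction is part of the problem, is building the pipeline once at import time instead of inside the request handler. This is just a sketch and is untested on Lambda:

from transformers import pipeline

# Build the summarization pipeline once, at module import / cold start,
# loading directly from the local ./Model directory; every request then
# reuses the same object instead of constructing a new one.
summarizer = pipeline("summarization", model="./Model", tokenizer="./Model")

def summarize(textdata):
    return summarizer(textdata, max_length=130, min_length=30, do_sample=False)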

Dependency libraries:

urllib3==1.26.14
fastapi[all]
uvicorn
mangum
pandas
numpy
websockets
setuptools
https://download.pytorch.org/whl/cpu/torch-1.13.1%2Bcpu-cp38-cp38-linux_x86_64.whl
transformers
datasets
evaluate
sentencepiece
aws-lambda-powertools
huggingface-hub
tqdm
scikit-learn
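
To double-check that the CPU-only torch wheel pinned above is the one actually installed in the image, a quick check like this should work (run inside the container):

import torch

print(torch.__version__)          # expect "1.13.1+cpu" from the pinned wheel
print(torch.cuda.is_available())  # expect False; Lambda has no GPU anyway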

Any ideas what could be causing this?