AWS Lambda + Transformers + Docker: summarization model uses all of the function's RAM

I am trying to build a summarization API with FastAPI and run it on AWS Lambda as a container image. I configured the function with 3 GB of memory; when I call the API, it consumes all of that RAM and the function stops responding, so the API never returns a result. Why does it use so much memory? Is there a bug in my code, or some other problem?

I install the dependency libraries and copy the model into the Docker image.

Model: facebook/bart-large-cnn (Hugging Face)
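
For context, the ./Model directory is produced before the image build by a one-off export script along these lines (illustrative; my actual export script may differ slightly):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download facebook/bart-large-cnn once and save it locally, so the
# Docker build can copy the ./Model directory into the image.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
tokenizer.save_pretrained("./Model")
model.save_pretrained("./Model")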

Code:

import uvicorn
from fastapi import FastAPI
from mangum import Mangum
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_checkpoint = "./Model"  # local copy of facebook/bart-large-cnn baked into the image

# Load the weights once, at import time (i.e. at Lambda cold start).
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

def summarize(textdata):
    # A new pipeline object is constructed on every request; it reuses the
    # model and tokenizer loaded above, so the weights are not reloaded here.
    summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)
    return summarizer(textdata, max_length=130, min_length=30, do_sample=False)

app = FastAPI()
handler = Mangum(app)  # Lambda entry point; adapts API Gateway events to ASGI

@app.get("/")
async def index():
    return "Home"

@app.get("/api")
async def summarize_endpoint(tinput: str, token: str):
    key = "RRshJy4beYdlNbu"  # shared secret for this demo API

    if token == key:
        return {"Status": "Done", "Summary": summarize(tinput)}
    else:
        return {"Status": "Error"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
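
One variant I have been considering, in case repeated pipeline construction is part of the problem, is building the pipeline once at import time instead of inside the request handler. This is just a sketch and is untested on Lambda:

from transformers import pipeline

# Build the summarization pipeline once, at module import / cold start,
# loading directly from the local ./Model directory; every request then
# reuses the same object instead of constructing a new one.
summarizer = pipeline("summarization", model="./Model", tokenizer="./Model")

def summarize(textdata):
    return summarizer(textdata, max_length=130, min_length=30, do_sample=False)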

Dependency libraries:

urllib3==1.26.14
fastapi[all]
uvicorn
mangum
pandas
numpy
websockets
setuptools
https://download.pytorch.org/whl/cpu/torch-1.13.1%2Bcpu-cp38-cp38-linux_x86_64.whl
transformers
datasets
evaluate
sentencepiece
aws-lambda-powertools
huggingface-hub
tqdm
scikit-learn
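
To double-check that the CPU-only torch wheel pinned above is the one actually installed in the image, a quick check like this should work (run inside the container):

import torch

print(torch.__version__)          # expect "1.13.1+cpu" from the pinned wheel
print(torch.cuda.is_available())  # expect False; Lambda has no GPU anyway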

Any ideas what could be causing this?