I want to generate 10,000 sentences with the GPT-Neo model. I am running the model on a GPU with 40 GB of memory, but the code runs out of memory every time. Is there a limit on the number of sentences I can generate? Below is a small snippet of my code.
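from transformers import GPT2Tokenizer, GPTNeoForCausalLM

# `model` first holds the checkpoint name, then is rebound to the loaded model
tokenizer = GPT2Tokenizer.from_pretrained(model)
model = GPTNeoForCausalLM.from_pretrained(model, pad_token_id=tokenizer.eos_token_id)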
input_ids = tokenizer.encode(sentence, return_tensors='pt')
gen_tokens = model.generate(