I am trying to reduce inference time in production. I am running a TensorFlow model on Amazon SageMaker, and I cannot figure out how to make generation faster.
Currently, I am facing two issues:
1. Inference takes too long.
2. I cannot figure out why memory is distributed unevenly across the GPUs.
I am using the following code:
import time
import tensorflow as tf
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

# initialize the model architecture and weights
model = TFT5ForConditionalGeneration.from_pretrained("t5-large")
# initialize the model tokenizer
tokenizer = T5Tokenizer.from_pretrained("t5-large")

start_time = time.time()

# distribution strategies I tried (currently disabled):
#strategy = tf.distribute.MultiWorkerMirroredStrategy()
#strategy = tf.distribute.MirroredStrategy()
#with strategy.scope():

# "text" is the document to summarize, defined earlier in my script
inputs = tokenizer("summarize: " + text, return_tensors="tf").input_ids
outputs = model.generate(
    inputs,
    max_length=150,
    min_length=41,
    length_penalty=5,
    num_beams=2,
    no_repeat_ngram_size=2,
    early_stopping=True,
)

# generate() returns a batch of sequences, so decode the first one
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

elapsed_time = time.time() - start_time
print(elapsed_time)
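
Regarding the second issue, per-GPU memory usage can be inspected with something like the following; this is a minimal sketch, assuming TensorFlow 2.5 or later, where tf.config.experimental.get_memory_info is available:

import tensorflow as tf

# print current and peak memory usage for each visible GPU
# get_memory_info returns a dict with 'current' and 'peak' byte counts
num_gpus = len(tf.config.list_physical_devices("GPU"))
for i in range(num_gpus):
    info = tf.config.experimental.get_memory_info(f"GPU:{i}")
    print(f"GPU:{i} current: {info['current'] / 1e6:.1f} MB, "
          f"peak: {info['peak'] / 1e6:.1f} MB")

When I look at the devices this way (or with nvidia-smi), the memory is not spread evenly across the GPUs.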