I am trying to reduce inference time in production. I am using TensorFlow on Amazon SageMaker, and I cannot figure out how to bring the time down.
Currently, I am facing two issues: the generation call takes too long, and I cannot figure out why the memory distribution across the GPUs is uneven.
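To investigate the memory issue, this is how I inspect per-GPU allocation (a minimal sketch; tf.config.experimental.get_memory_info assumes TensorFlow 2.5 or newer):

import tensorflow as tf

# let TensorFlow allocate GPU memory on demand instead of grabbing it all;
# this must run before any tensors are placed on the devices
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# report current and peak memory usage per device
for i in range(len(gpus)):
    info = tf.config.experimental.get_memory_info(f"GPU:{i}")
    print(f"GPU:{i} current={info['current']} peak={info['peak']}")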
I am using the code below:
import time

import tensorflow as tf
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

# initialize the model architecture and weights
model = TFT5ForConditionalGeneration.from_pretrained("t5-large")
# initialize the model tokenizer
tokenizer = T5Tokenizer.from_pretrained("t5-large")

start_time = time.time()

# distribution strategies I experimented with (currently disabled):
# strategy = tf.distribute.MultiWorkerMirroredStrategy()
# strategy = tf.distribute.MirroredStrategy()
# with strategy.scope():

# `text` holds the article to summarize; it is defined elsewhere
inputs = tokenizer("summarize: " + text, return_tensors="tf").input_ids
outputs = model.generate(
    inputs,
    max_length=150,
    min_length=41,
    length_penalty=5,
    num_beams=2,
    no_repeat_ngram_size=2,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0]))
elapsed_time = time.time() - start_time
print(elapsed_time)
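One thing I am aware of: the first generate call can include one-off setup costs (graph tracing, CUDA kernel initialization), so a cleaner measurement would separate a warm-up run from the timed run. A minimal sketch, reusing the model, tokenizer, and inputs from above:

import time

# warm-up: absorbs one-off setup costs from the first call
_ = model.generate(inputs, max_length=150, num_beams=2)

# timed run: measures steady-state generation only
start_time = time.time()
outputs = model.generate(
    inputs,
    max_length=150,
    min_length=41,
    length_penalty=5,
    num_beams=2,
    no_repeat_ngram_size=2,
    early_stopping=True,
)
print("steady-state generate time:", time.time() - start_time)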
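Regarding the commented-out strategy lines: my understanding is that variables are only mirrored across GPUs if the model is created inside the strategy scope, which may be related to the uneven memory distribution I am seeing. A minimal sketch of that pattern (an assumption on my part, not something I have verified to speed up generate):

import tensorflow as tf
from transformers import TFT5ForConditionalGeneration

strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

# variables created inside the scope are replicated on every GPU
with strategy.scope():
    model = TFT5ForConditionalGeneration.from_pretrained("t5-large")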