How to reduce inference time in production with T5?

I am trying to reduce inference time in production. I am using TensorFlow on Amazon SageMaker, and I am unable to figure out how to bring the time down.
Currently, I am facing two issues:

1. The memory distribution across GPUs is uneven, and I cannot figure out why.
2. Generation (summarization) takes too long.

I am using the code below:

    from transformers import T5Tokenizer, TFT5ForConditionalGeneration
    import time
    # initialize the model architecture and weights
    model = TFT5ForConditionalGeneration.from_pretrained("t5-large")
    # initialize the model tokenizer
    tokenizer = T5Tokenizer.from_pretrained("t5-large")

    import tensorflow as tf
    start_time = time.time()

    # Optional: multi-GPU distribution (uncomment a strategy and the `with` block,
    # and re-indent the code below it)
    #strategy = tf.distribute.MultiWorkerMirroredStrategy()
    #strategy = tf.distribute.MirroredStrategy()
    #with strategy.scope():

    # `text` should already hold the document to summarize
    inputs = tokenizer("summarize: " + text, return_tensors="tf").input_ids

    outputs = model.generate(
        inputs,
        max_length=150,
        min_length=41,
        length_penalty=5,
        num_beams=2,
        no_repeat_ngram_size=2,
        early_stopping=True)

    print(tokenizer.decode(outputs[0]))
    elapsed_time = time.time() - start_time
    print(elapsed_time)
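To break the elapsed time down, the tokenization and generation steps can be timed separately, e.g. with a small helper like this (a minimal sketch; `timed` is my own name, not a library function):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    # perf_counter is a monotonic clock, better suited to interval
    # measurement than time.time()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example with a cheap stand-in function; in the script above you would
# wrap tokenizer(...) and model.generate(...) the same way:
value, seconds = timed(sum, range(1000))
print(value)
```

In the posted `generate` call, the most direct latency levers are `num_beams` (beam search roughly multiplies decoding cost by the beam count, so `num_beams=1` gives greedy decoding) and `max_length` (generation time grows with the number of decoded tokens).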

Hi, thanks for posting on the forum! What do you mean by "time at production" — training time or inference time? If you run on the SageMaker Training API, you can use the SageMaker Debugger Profiler to diagnose bottlenecks.
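For example, profiling can be enabled when constructing the estimator. This is a sketch assuming the SageMaker Python SDK's Debugger module; `train.py`, the IAM role, and the instance/framework versions are placeholders you would replace with your own:

```python
# Sketch: enabling SageMaker Debugger profiling on a training job.
from sagemaker.debugger import ProfilerConfig, FrameworkProfile
from sagemaker.tensorflow import TensorFlow

profiler_config = ProfilerConfig(
    # sample system metrics (CPU/GPU utilization, memory) every 500 ms
    system_monitor_interval_millis=500,
    # collect detailed framework-level traces for steps 5-14
    framework_profile_params=FrameworkProfile(start_step=5, num_steps=10),
)

estimator = TensorFlow(
    entry_point="train.py",           # placeholder training script
    role="<your-IAM-role>",           # placeholder IAM role ARN
    instance_count=1,
    instance_type="ml.p3.2xlarge",    # placeholder instance type
    framework_version="2.8",
    py_version="py39",
    profiler_config=profiler_config,
)
# estimator.fit(...)  # profiler reports are written alongside the job output
```

The resulting reports show per-step GPU utilization and memory, which should also help with the uneven GPU memory question.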