How to Modify LLaMA 2 Model for Internal Token Generation Timing

royweiss1 · April 16, 2024, 7:33am

Hi all,
I am currently working with the LLaMA 2 model and need to measure the time it takes to generate each token directly within the model’s computation loop. My goal is to include precise timing of the token generation process inside the model’s forward method to better analyze the performance and computational cost at a granular level.

I understand that this involves modifying the internal workings of the model, specifically around the logits computation for each token. However, I am unsure of the best approach to achieve this without disrupting the model’s performance and functionality.

Could anyone provide insights or examples on how to safely integrate timing into the forward pass of the LLaMA 2 model? I am particularly interested in best practices for adding time measurements to the forward method while keeping the model stable and efficient.
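
To make the question concrete, here is a minimal sketch of one non-invasive option: wrapping the forward callable instead of editing its body. It is only an illustration under stated assumptions, not a confirmed approach for LLaMA 2. `TimedForward`, `dummy_forward`, and the `sync` parameter are hypothetical names introduced here, and the toy forward stands in for the real model so the snippet runs with the standard library alone.

```python
import time
from typing import Any, Callable, List


class TimedForward:
    """Wraps a forward callable and records wall-clock time per call.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(
        self,
        forward: Callable[..., Any],
        sync: Callable[[], None] = lambda: None,
    ) -> None:
        self.forward = forward
        # Assumption: on a GPU, a device-synchronization function would be
        # passed here so that asynchronous kernels finish before the clock
        # is read. The default is a no-op for CPU-only code.
        self.sync = sync
        self.durations: List[float] = []

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        self.sync()
        start = time.perf_counter()
        out = self.forward(*args, **kwargs)
        self.sync()
        self.durations.append(time.perf_counter() - start)
        return out


def dummy_forward(token_ids: List[int]) -> List[int]:
    # Stand-in for a real model forward pass: returns one "prediction"
    # per input position.
    return [t + 1 for t in token_ids]


timed = TimedForward(dummy_forward)

# Toy decoding loop: one forward call per generated token.
tokens = [1]
for _ in range(3):
    tokens.append(timed(tokens)[-1])

for step, seconds in enumerate(timed.durations):
    print(f"step {step}: {seconds * 1e3:.4f} ms")
```

With a real PyTorch model on a GPU, CUDA kernels launch asynchronously, so a synchronization call such as `torch.cuda.synchronize` would presumably be passed as `sync`. Otherwise the timer mostly measures kernel launch overhead rather than compute. Whether such a wrapper can be attached to the LLaMA 2 forward method without side effects in `generate()` is exactly the part I am unsure about.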

Any guidance or suggestions would be greatly appreciated!
Thank you!
