Is there a framework that demonstrates each operation of the inference pipeline for RoBERTa (or any transformer or seq2seq model), with an explanation of the matrix sizes at each step?

E.g., input: I am a student

- Make a one-hot encoding: matrix A, size 256*1000m
- Create matrix B = A * (1000*100) * k_t

…

n) Transpose Q = q_t

n+1) Transpose V = v_t

n+2) Transpose K = k_t

n+k) q_t

etc.
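To make the kind of walk-through I am after more concrete, here is a minimal NumPy sketch of the first few steps for a single attention head. The sizes are my own reading of the example above and are assumptions: sequence length 256, vocabulary size 1000, hidden size 100. All weights are random, and the names `E`, `W_q`, `W_k`, `W_v` are hypothetical, not taken from any library.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
seq_len, vocab, d_model = 256, 1000, 100  # assumed sizes, not real RoBERTa values

# Step 1: one-hot encode random token ids -> A has shape (256, 1000)
token_ids = rng.integers(0, vocab, size=seq_len)
A = np.eye(vocab, dtype=np.float32)[token_ids]

# Step 2: multiply by an embedding matrix E (1000, 100) -> B has shape (256, 100)
E = rng.standard_normal((vocab, d_model)).astype(np.float32)
B = A @ E

# Step 3: project to queries, keys and values, each of shape (256, 100)
W_q, W_k, W_v = (
    rng.standard_normal((d_model, d_model)).astype(np.float32) for _ in range(3)
)
Q, K, V = B @ W_q, B @ W_k, B @ W_v

# Step 4: scaled dot-product attention; K is transposed here
scores = (Q @ K.T) / math.sqrt(d_model)              # (256, 256)
scores = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
weights = np.exp(scores)
weights = weights / weights.sum(axis=-1, keepdims=True)
out = weights @ V                                      # (256, 100)

for name, arr in [("A", A), ("B", B), ("Q", Q), ("K", K), ("V", V),
                  ("scores", scores), ("out", out)]:
    print(f"{name:>6}: {arr.shape}")
```

In practice the one-hot multiplication in step 2 is equivalent to a row lookup, `E[token_ids]`, which is how I would expect a real implementation to do it.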

I need to understand transformer inference as it actually runs, including the matrix sizes.

I also need that visualization and explanation so that I can implement a similar sequence of matrix operations (multiplication, etc.) and measure inference speed on random input. In other words, I would not use the transformers package; the inference procedure would be implemented manually as a sequence of matrix operations.
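The timing experiment I have in mind would look roughly like the sketch below. It is a simplification under several assumptions: the default sizes (hidden size 768, 12 heads, feed-forward size 3072, 12 layers) are the RoBERTa-base configuration as I understand it, biases and layer-norm scale/shift parameters are omitted, GELU uses the tanh approximation, embeddings are skipped, and one set of random weights is reused for every layer because only the cost of the operations matters here. The function names are my own.

```python
import math
import time
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row; learned scale and shift are omitted in this sketch
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(0.7978845608 * (x + 0.044715 * x**3)))

def make_weights(rng, d_model, d_ff):
    def w(*shape):
        return (rng.standard_normal(shape) * 0.02).astype(np.float32)
    return {"q": w(d_model, d_model), "k": w(d_model, d_model),
            "v": w(d_model, d_model), "o": w(d_model, d_model),
            "ff1": w(d_model, d_ff), "ff2": w(d_ff, d_model)}

def encoder_layer(x, p, n_heads):
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):  # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ p["q"]), split(x @ p["k"]), split(x @ p["v"])
    scores = (q @ k.transpose(0, 2, 1)) / math.sqrt(d_head)  # (heads, seq, seq)
    ctx = softmax(scores) @ v                                 # (heads, seq, d_head)
    ctx = ctx.transpose(1, 0, 2).reshape(seq_len, d_model)    # (seq, d_model)
    x = layer_norm(x + ctx @ p["o"])                          # attention block
    x = layer_norm(x + gelu(x @ p["ff1"]) @ p["ff2"])         # feed-forward block
    return x

def time_inference(seq_len=128, d_model=768, n_heads=12, d_ff=3072,
                   n_layers=12, repeats=3, seed=0):
    rng = np.random.default_rng(seed)
    p = make_weights(rng, d_model, d_ff)
    x = rng.standard_normal((seq_len, d_model)).astype(np.float32)  # random input
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        h = x
        for _ in range(n_layers):
            h = encoder_layer(h, p, n_heads)
        best = min(best, time.perf_counter() - start)
    return h, best
```

Calling `h, seconds = time_inference()` would return the final hidden states of shape `(seq_len, d_model)` and the best wall-clock time over the repeats. What I am missing is a reference that confirms whether this sequence of operations and these shapes match what the real model does.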