Is there a framework that demonstrates each operation of the inference pipeline for RoBERTa (or any transformer or seq2seq model), with an explanation of the matrix sizes at each step?
For example, for the input "I am a student":
- Make a one-hot encoding: matrix A, 256 × 1000
- Create matrix B = A * W, where W is 1000 × 100 (so B is 256 × 100)
n) Transpose Q: q_t = Q^T
n+1) Transpose V: v_t = V^T
n+2) Transpose K: k_t = K^T
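The steps listed above can be sketched in NumPy with the sizes from the question (sequence length 256, vocabulary 1000, model width 100). The random token IDs and weight matrices are stand-ins, not RoBERTa's real tokenizer or weights. The sketch illustrates two points: multiplying a one-hot matrix by the embedding matrix gives the same result as selecting rows of that matrix, and in standard scaled dot-product attention only K is transposed (scores = Q K^T / sqrt(d)). q_t and v_t are computed here only to show their shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, vocab, d_model = 256, 1000, 100   # sizes taken from the question

# Stand-in for a tokenized sentence: random token IDs (hypothetical input).
token_ids = rng.integers(0, vocab, size=seq_len)

# Step 1: one-hot matrix A, shape (256, 1000).
A = np.zeros((seq_len, vocab), dtype=np.float32)
A[np.arange(seq_len), token_ids] = 1.0

# Step 2: B = A @ W_emb, (256, 1000) @ (1000, 100) -> (256, 100).
W_emb = rng.standard_normal((vocab, d_model), dtype=np.float32)
B = A @ W_emb
assert np.allclose(B, W_emb[token_ids])  # identical to a plain row lookup

# Projections: (256, 100) @ (100, 100) -> (256, 100) each (random weights).
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model), dtype=np.float32)
                 for _ in range(3))
Q, K, V = B @ W_q, B @ W_k, B @ W_v

# Transposes from steps n..n+2: each (100, 256).
q_t, k_t, v_t = Q.T, K.T, V.T

# Attention scores only need K transposed: (256, 100) @ (100, 256) -> (256, 256).
scores = Q @ k_t / np.sqrt(d_model)
print(A.shape, B.shape, Q.shape, q_t.shape, scores.shape)
```

In practice, libraries implement step 2 as an index lookup rather than building A, since the 256 × 1000 one-hot matrix is almost entirely zeros.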
I need to understand transformer inference exactly as it is performed, including the matrix sizes at each step.
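One way to see the sizes involved is a pure shape trace: it lists each operation of the embedding step and of one encoder layer together with the shape of its result, without doing any arithmetic. The defaults are RoBERTa-base's published dimensions (vocabulary 50265, hidden size 768, 12 heads, feed-forward size 3072; the full model stacks 12 such layers). The sequence length n = 6 is an assumption that "I am a student" becomes 4 tokens plus the `<s>` and `</s>` markers; check the real count with the actual tokenizer. Position embeddings are only indicated, and the embedding LayerNorm, biases and dropout are left out. The function name `encoder_shape_trace` is hypothetical.

```python
def encoder_shape_trace(n, vocab=50265, d=768, heads=12, d_ff=3072):
    """Return (step, output shape) pairs for the embedding step and ONE
    encoder layer. Only shapes are computed; no weights are involved."""
    dh = d // heads  # per-head width: 768 / 12 = 64 for RoBERTa-base
    return [
        ("one-hot input X",             (n, vocab)),
        ("E = X @ W_emb (+ positions)", (n, d)),          # W_emb: (vocab, d)
        ("Q, K, V = E @ W_q/W_k/W_v",   (n, d)),          # each W: (d, d)
        ("split into heads",            (heads, n, dh)),  # Q, K, V each
        ("K^T per head",                (heads, dh, n)),
        ("S = Q @ K^T / sqrt(dh)",      (heads, n, n)),   # attention scores
        ("softmax(S) @ V",              (heads, n, dh)),
        ("concat heads @ W_o",          (n, d)),          # W_o: (d, d)
        ("H = LayerNorm(E + attn)",     (n, d)),          # residual + norm
        ("GELU(H @ W_1)",               (n, d_ff)),       # W_1: (d, d_ff)
        ("@ W_2, LayerNorm(H + ...)",   (n, d)),          # W_2: (d_ff, d)
    ]


for step, shape in encoder_shape_trace(n=6):
    print(f"{step:30s} {shape}")
```

Every later layer maps (n, 768) to (n, 768) in the same way, so the per-layer table repeats unchanged for all 12 layers.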
I also need such a visualization and explanation so that I can implement the same sequence of matrix operations (multiplications, etc.) myself and measure inference speed on random input. In other words, I would not use the transformers package; instead, the inference procedure would be implemented manually as a sequence of matrix operations.
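A minimal sketch of that kind of benchmark, assuming NumPy is acceptable as the only dependency: it builds a stack of encoder layers with random weights and times a forward pass on a random input that stands in for the embedded tokens. It follows the post-LayerNorm arrangement used by BERT/RoBERTa, but it omits biases, LayerNorm scale/shift, the attention mask and the embedding step, and it uses the tanh approximation of GELU, so it only approximates the real cost. The function names (`encoder_layer`, `benchmark`, etc.) are hypothetical, and NumPy timings depend on the BLAS library and thread count, so they will not match PyTorch numbers.

```python
import math
import time

import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token vector (row); learned scale/shift omitted.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def make_layer(rng, d, d_ff):
    # Random weights with the shapes of one encoder layer (biases omitted).
    def w(*shape):
        return rng.standard_normal(shape, dtype=np.float32) / shape[0] ** 0.5
    return {"q": w(d, d), "k": w(d, d), "v": w(d, d), "o": w(d, d),
            "ff1": w(d, d_ff), "ff2": w(d_ff, d)}


def encoder_layer(x, p, heads):
    n, d = x.shape
    dh = d // heads

    def split(m):  # (n, d) -> (heads, n, dh)
        return m.reshape(n, heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ p["q"]), split(x @ p["k"]), split(x @ p["v"])
    scores = q @ k.transpose(0, 2, 1) / dh ** 0.5           # (heads, n, n)
    ctx = softmax(scores) @ v                               # (heads, n, dh)
    ctx = ctx.transpose(1, 0, 2).reshape(n, d)              # (n, d)
    x = layer_norm(x + ctx @ p["o"])                        # attention block
    return layer_norm(x + gelu(x @ p["ff1"]) @ p["ff2"])    # feed-forward block


def benchmark(seq_len=256, d=768, heads=12, d_ff=3072, num_layers=12,
              repeats=3, seed=0):
    # Defaults mimic RoBERTa-base sizes; returns (last output, best time in s).
    rng = np.random.default_rng(seed)
    layers = [make_layer(rng, d, d_ff) for _ in range(num_layers)]
    x = rng.standard_normal((seq_len, d), dtype=np.float32)  # random "embedded" input
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        h = x
        for p in layers:
            h = encoder_layer(h, p, heads)
        times.append(time.perf_counter() - t0)
    return h, min(times)


if __name__ == "__main__":
    out, best = benchmark()
    print(f"output shape {out.shape}, best forward pass: {best * 1000:.1f} ms")
```

Taking the minimum over several repeats reduces noise from warm-up and other processes; changing `seq_len`, `d` or `num_layers` shows how the cost scales (the n × n attention scores grow quadratically with sequence length, the projections linearly).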