Visualize matrix inference for RoBERTa (transformer)

Is there any framework that demonstrates each operation of the inference pipeline for RoBERTa (or any transformer / seq2seq model), with an explanation of the matrix sizes?

E.g., input: I am a student

  1. Make a one-hot encoding: matrix A of size 256×1000
  2. Create matrix B = A × (a 1000×100 embedding matrix), so B is 256×100

    n)   Transpose Q: q_t
    n+1) Transpose V: v_t
    n+2) Transpose K: k_t
    n+k) Multiply q_t by k_t
    etc.
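For the "implement it manually" part, here is a minimal NumPy sketch of the kind of sequence the list above describes. All sizes (vocabulary 1000, hidden size 100, sequence length 4) are illustrative assumptions taken from the example, not RoBERTa's actual dimensions, and it covers only a single attention head with no masks, biases, layer norm, or feed-forward block:

```python
import numpy as np

# Illustrative sizes -- assumptions for this sketch, not RoBERTa's real
# dimensions (RoBERTa-base has a ~50k vocabulary and hidden size 768):
seq_len, vocab_size, d_model = 4, 1000, 100

# 1) One-hot encode the tokens: A is (seq_len x vocab_size)
token_ids = np.array([1, 7, 42, 99])            # made-up ids for "I am a student"
A = np.eye(vocab_size)[token_ids]               # shape (4, 1000)

# 2) Embedding lookup as a matrix product: B = A @ E
E = np.random.randn(vocab_size, d_model)        # embedding matrix (1000, 100)
B = A @ E                                       # shape (4, 100)

# n..n+2) Project to queries/keys/values, then take the transposes
W_q = np.random.randn(d_model, d_model)
W_k = np.random.randn(d_model, d_model)
W_v = np.random.randn(d_model, d_model)
Q, K, V = B @ W_q, B @ W_k, B @ W_v             # each (4, 100)
q_t, k_t, v_t = Q.T, K.T, V.T                   # each (100, 4)

# n+k) Attention scores: Q @ K^T, scaled, then a row-wise softmax
scores = (Q @ k_t) / np.sqrt(d_model)           # shape (4, 4)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
out = weights @ V                               # shape (4, 100)
print(A.shape, B.shape, Q.shape, scores.shape, out.shape)
```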

I need to understand transformer inference as it actually happens (including the matrix sizes).
I also need that visualization and explanation in order to implement a similar sequence of matrix operations (multiplications, etc.) and measure inference speed on random input. The result should look as if I don't have the transformers package, but the inference procedure has been implemented manually as a sequence of matrix operations.
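For the speed measurement, a rough sketch (not a benchmark-grade harness; it ignores warm-up and BLAS threading effects) could time the same hand-rolled operations on random input:

```python
import time
import numpy as np

def manual_attention(X, W_q, W_k, W_v):
    """One hand-rolled self-attention layer; X has shape (seq_len, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = (Q @ K.T) / np.sqrt(W_q.shape[1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)           # row-wise softmax
    return w @ V

seq_len, d_model = 128, 768                      # 768 = RoBERTa-base hidden size
X = np.random.randn(seq_len, d_model)            # random input, no tokenizer needed
projections = [np.random.randn(d_model, d_model) for _ in range(3)]

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
    manual_attention(X, *projections)
print(f"{(time.perf_counter() - start) / n_runs * 1e3:.3f} ms per layer")
```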

Are you aware of bertviz? From the docs:

The neuron view visualizes individual neurons in the query and key vectors and shows how they are used to compute attention.

There is more detail in their Medium post, and you can see a demo of the neuron view in the third plot of their Colab notebook.
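For reference, invoking the neuron view looks roughly like the snippet below. This is a sketch adapted from the bertviz README (the neuron view ships with its own forked model classes), so the exact imports and the show() signature may differ between versions; the README example uses BERT, and bertviz lists RoBERTa support as well:

```python
# Sketch based on the bertviz README; imports and the show() signature
# may differ across bertviz versions -- check the current docs.
from bertviz.transformers_neuron_view import BertModel, BertTokenizer
from bertviz.neuron_view import show

model_version = 'bert-base-uncased'
model = BertModel.from_pretrained(model_version)
tokenizer = BertTokenizer.from_pretrained(model_version)
show(model, 'bert', tokenizer, "I am a student", layer=2, head=0)
```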

The neuron view in the notebook looks like this:

[screenshot of the neuron view from the Colab notebook]

However, the one in the blog post seems closer to what you are asking for:

[screenshot of the neuron view from the Medium post]

I would also check out The Illustrated Transformer and The Annotated Transformer.