GPU is far slower than CPU for patch embedding

Hello guys!
I am playing around with ViT inference speed and have measured the time spent on embedding and encoding separately, on both CPU and GPU.
The results, contrary to my expectations, are:

time spent (ms)   GPU   CPU
embedding         331     1
encoding            4    72

details:
GPU model = RTX 3090Ti
CPU model = Intel i9-12900KF
Pretrained model weights = google/vit-base-patch16-224-in21k
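
For anyone who wants to reproduce the numbers, here is a minimal sketch of how a per-stage timing like this could be taken. It is not necessarily the exact script behind the table above; the helper name `time_ms` and its parameters `warmup`, `iters`, and `sync` are illustrative assumptions. The optional `sync` hook is there because CUDA kernels launch asynchronously, so a GPU measurement only means something if the device is synchronized before the clock is read.

```python
import time


def time_ms(fn, warmup=3, iters=10, sync=None):
    """Return the mean wall-clock time of fn() in milliseconds.

    warmup: untimed calls made first, so one-off setup cost is excluded.
    sync:   optional callable that blocks until pending device work is done
            (hypothetical hook; leave as None for pure CPU code).
    """
    # Untimed warm-up calls.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for queued device work before stopping the clock
    elapsed = time.perf_counter() - start

    return elapsed * 1000.0 / iters


if __name__ == "__main__":
    # CPU-only demonstration with a stand-in workload.
    print(time_ms(lambda: sum(range(10_000))))
```

With PyTorch on a GPU, one would presumably pass `sync=torch.cuda.synchronize` and wrap the embedding or encoder forward pass in `fn`.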

I can understand the GPU being faster than the CPU for encoding. But why is the CPU faster than the GPU for embedding, given that both embedding and encoding are neural network layers that perform matrix multiplications?
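
To make the "it is just a matrix multiplication" point concrete, here is a rough sketch of the shape arithmetic for the patch projection in a ViT-Base/16 model at 224x224 resolution. The function name `patch_embedding_shapes` is made up for illustration, and the count ignores the bias, the class token, and the position embeddings.

```python
def patch_embedding_shapes(image_size=224, patch_size=16, channels=3, hidden=768):
    """Shape arithmetic for the patch projection (illustrative helper only)."""
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2              # rows of the input matrix
    patch_dim = patch_size * patch_size * channels   # values per flattened patch
    # Multiply-accumulates for a (num_patches x patch_dim) @ (patch_dim x hidden) product.
    macs = num_patches * patch_dim * hidden
    return num_patches, patch_dim, macs


num_patches, patch_dim, macs = patch_embedding_shapes()
print(num_patches, patch_dim, macs)  # 196 768 115605504
```

So per image the embedding is a single 196 x 768 by 768 x 768 product, which is a small amount of work for either device.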
Thanks a lot!