Getting Q, K, V matrices of a ViT

I'm doing research based on the paper [2207.05273] Cross-Architecture Knowledge Distillation. For the teacher transformer network, an input image is passed through its transformer blocks, and after several blocks the feature h_T ∈ R^{N×(3hw)} is generated. When using google_vit = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224'), for example, how do I get the feature h_T, or rather the query Q, key K, and value V of the transformer teacher?
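
One possible way to read these tensors out is a forward hook on the query, key, and value projection layers. The sketch below is a minimal illustration and not the paper's method. It assumes the Hugging Face ViT module layout (`model.vit.encoder.layer[i].attention.attention.query` / `.key` / `.value`), and `collect_qkv` is a hypothetical helper name. It uses a small randomly initialized `ViTConfig` so it runs without downloading weights. Which layer to tap, and how the captured tensors map onto the paper's h_T, are left as assumptions.

```python
import torch
from transformers import ViTConfig, ViTForImageClassification


def collect_qkv(model, pixel_values, layer_index=-1):
    """Capture the Q, K, V projections of one encoder layer via forward hooks.

    Hypothetical helper. Assumes the Hugging Face ViT layout, where each
    encoder layer exposes attention.attention.query / .key / .value as
    nn.Linear modules.
    """
    self_attn = model.vit.encoder.layer[layer_index].attention.attention
    captured = {}
    handles = []

    for name in ("query", "key", "value"):
        # Bind `name` as a default argument so each hook stores under its own key.
        def hook(_module, _inputs, output, name=name):
            captured[name] = output.detach()

        handles.append(getattr(self_attn, name).register_forward_hook(hook))

    try:
        with torch.no_grad():
            outputs = model(pixel_values=pixel_values, output_hidden_states=True)
    finally:
        # Always remove hooks so later forward passes are unaffected.
        for handle in handles:
            handle.remove()

    return captured, outputs.hidden_states


# Small random-weight model for illustration only (assumed sizes).
config = ViTConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
    image_size=32,
    patch_size=16,
)
model = ViTForImageClassification(config).eval()

pixel_values = torch.randn(1, 3, 32, 32)  # one fake RGB image
qkv, hidden_states = collect_qkv(model, pixel_values)

# Each projection has shape (batch, num_tokens, hidden_size); num_tokens is
# the number of patches plus one for the [CLS] token.
print({name: tuple(t.shape) for name, t in qkv.items()})
print(len(hidden_states))  # embedding output plus one entry per layer
```

With the pretrained teacher, the same helper should apply to `ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')`, passing `pixel_values` produced by the matching image processor. The captured tensors are taken before the split into heads. If per-head tensors are needed, reshape each one to (batch, num_tokens, num_heads, head_dim). The per-block features are available from `hidden_states`.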