Resources for Sign Language Translation

Hello all. Recently I have been drawn to Hugging Face for its ease of use and the variety of available model architectures. I currently conduct research in the area of continuous sign language recognition, and I have recently begun a transformer-based project on the RWTH-Phoenix-Weather 2014T dataset using image features extracted with a pre-trained 2D ResNet.
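
For concreteness, the feature-extraction step looks roughly like this (just a sketch, with a torchvision ResNet-18 standing in for the actual backbone and random tensors in place of the real Phoenix frames):

```python
import torch
import torchvision.models as models

# Pre-trained 2D ResNet with the classification head removed, so each
# frame maps to a 512-d feature vector instead of class logits.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()
resnet.eval()

# Dummy clip: (num_frames, channels, height, width); in practice these
# would be the normalized frames of one Phoenix video.
frames = torch.randn(100, 3, 224, 224)

with torch.no_grad():
    frame_features = resnet(frames)  # -> (num_frames, 512)

print(frame_features.shape)  # torch.Size([100, 512])
```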

Unfortunately I cannot seem to find any resources related to using image/frame features as input to Hugging Face models, as the vast majority of examples and documentation focus on text-based tasks.
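
What I am essentially hoping to do is bypass the token embedding layer and feed the pre-extracted frame features straight into the encoder of a seq2seq model, e.g. via `inputs_embeds`. Something like the sketch below is what I have in mind (the model choice, the projection layer, and the dummy data are only placeholders on my part, and I am not sure this is the intended usage):

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Any encoder-decoder model would do here; bart-base is just a placeholder.
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

# Pretend these are the ResNet features for one video: (batch, frames, 512).
frame_features = torch.randn(1, 100, 512)

# Project the 512-d frame features to the model's hidden size so they can
# stand in for token embeddings on the encoder side.
project = torch.nn.Linear(512, model.config.d_model)
encoder_inputs = project(frame_features)  # (batch, frames, d_model)

# Dummy target sentence for the decoder (the spoken-language translation).
labels = tokenizer("tomorrow it will be windy", return_tensors="pt").input_ids

# `inputs_embeds` replaces `input_ids`, so the encoder consumes the frame
# features directly while the decoder is trained on the tokenized target.
outputs = model(inputs_embeds=encoder_inputs, labels=labels)
print(outputs.loss)
```

Is this the right way to go about it, or is there a more standard pattern for non-text encoder inputs?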

Would anyone be able to point me in the right direction, or, if you have previous experience with this kind of setup, share some insight? I would really like to use this framework. Thanks in advance!