Models for eye gaze data

Hi,

I was wondering if anyone knows of any (transformer) models that work well for classifying eye gaze data?
My input would be a time series of x/y coordinates (and potentially some pupil dilation values). The output would be a classification label, e.g., “calm”/“confused”.
As I understand it, one possible model would be something like DeepMind’s Perceiver IO. However, I only have a few hundred training samples, so I think I would need a much smaller model.
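
To make the question more concrete, here is a minimal PyTorch sketch of the kind of small model I have in mind. The class name `TinyGazeTransformer`, the layer sizes, the learned positional embedding, and the mean pooling are all placeholder assumptions of mine, not taken from any paper, and I have not validated this on real gaze data.

```python
import torch
import torch.nn as nn


class TinyGazeTransformer(nn.Module):
    """Small transformer encoder for gaze sequences (hypothetical sketch)."""

    def __init__(self, n_features=3, d_model=32, n_heads=2, n_layers=2,
                 n_classes=2, max_len=512, dropout=0.3):
        super().__init__()
        # Project (x, y, pupil) at each time step into the model dimension.
        self.input_proj = nn.Linear(n_features, d_model)
        # Learned positional embedding; max_len is an assumed upper bound.
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=64,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x, pad_mask=None):
        # x: (batch, time, n_features); pad_mask: (batch, time), True = padding
        h = self.input_proj(x) + self.pos_emb[:, : x.size(1)]
        h = self.encoder(h, src_key_padding_mask=pad_mask)
        if pad_mask is None:
            pooled = h.mean(dim=1)
        else:
            # Average only over the real (non-padded) time steps.
            keep = (~pad_mask).unsqueeze(-1).float()
            pooled = (h * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        return self.head(pooled)  # unnormalised class logits


model = TinyGazeTransformer()
batch = torch.randn(4, 200, 3)  # 4 random stand-in recordings, 200 steps each
logits = model(batch)           # shape: (4, 2), e.g. "calm" vs. "confused"
n_params = sum(p.numel() for p in model.parameters())
```

With these defaults the model has only tens of thousands of parameters, which is the scale I imagine might be trainable on a few hundred samples, but I do not know whether that intuition holds.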

Does anyone have any experience with this? Current research seems to prefer RNNs; however, I do not see any reason why transformers should not outperform them here as well. I would be grateful for any hints or guidance.