How can I visualize the encoder output features of encoder-decoder models such as BART and T5?
For the base BART model, with max positions = 1024 and model dimension = 768,
the flattened feature dimension for one input would be 1024 * 768 = 786,432 (about 786k).
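To make the arithmetic concrete, here is a small sketch of the sizes involved. The helper names are hypothetical, and the pooled variant (averaging over the sequence axis) is only an assumption about one possible alternative, not something I have tried:

```python
# Shape arithmetic for one encoder output, using the numbers quoted in the post.
max_positions = 1024  # maximum sequence length
d_model = 768         # hidden size of the base model


def flattened_dim(seq_len: int, hidden: int) -> int:
    """Size of one vector made by concatenating every token's hidden state."""
    return seq_len * hidden


def pooled_dim(seq_len: int, hidden: int) -> int:
    """Size after averaging over the sequence axis (hypothetical alternative)."""
    return hidden


flat = flattened_dim(max_positions, d_model)
pooled = pooled_dim(max_positions, d_model)
print(flat, pooled)  # 786432 768
```

So the flattened view gives one 786,432-dimensional point per input, while a pooled view would give one 768-dimensional point per input.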
I have no experience with t-SNE. Is it still a reasonable choice for features with dimensionality of this order?
Any suggestions for good practice?