I noticed that PaliGemma at 224x224 resolution uses `image_seq_length = 256`, and the PaliGemma paper quotes the same number. I can't make sense of this: with a patch size of 16x16, a 224x224 image gives a 14x14 grid, i.e. 196 tokens, not 256.
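For reference, here's the arithmetic behind my expectation (with a 14-pixel patch shown alongside purely for comparison, since 16x16 = 256):

```python
# Patch-token counts for a 224x224 input under two candidate patch sizes.
for patch in (16, 14):
    grid = 224 // patch  # patches per side
    print(f"patch {patch}: {grid}x{grid} grid = {grid * grid} tokens")
# patch 16: 14x14 grid = 196 tokens
# patch 14: 16x16 grid = 256 tokens
```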
Neither I nor Gemini, ChatGPT, or Claude could find an explanation in the SigLIP or PaliGemma papers, and I'm finding the code tough to navigate. Can anyone explain this to me or point me to the file with the implementation?
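In case it helps, this is roughly how I've been inspecting the shipped config (a minimal sketch; attribute names assumed from transformers' `SiglipVisionConfig`, and the checkpoint is gated on the Hub, so it needs an authenticated login):

```python
from transformers import AutoConfig

# Read the actual vision config of the 224px PaliGemma checkpoint
# instead of relying on the paper's quoted numbers.
cfg = AutoConfig.from_pretrained("google/paligemma-3b-pt-224")
vision = cfg.vision_config  # SiglipVisionConfig
grid = vision.image_size // vision.patch_size
print(f"image_size={vision.image_size}, patch_size={vision.patch_size}, "
      f"tokens={grid * grid}")
```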