Tokenizing a Float Tensor?

Forgive me if this is naive; I'm somewhat new to transformers.

I am exploring a paper (see here) that implements an end-to-end sign language translation model. Instead of converting sign language into intermediate glosses (a written-language representation of the signs) and then passing those into a seq2seq transformer (mBART), they propose a 'dense gloss' representation. This dense representation is simply the output of a two-layer MLP: a sequence of floats of shape (batch_size, seq_length, 1024).
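For concreteness, here is a minimal sketch of what I understand the dense-gloss module to be. The layer sizes and input dimension are my own guesses; the paper only specifies a two-layer MLP with a 1024-dimensional output:

```python
import torch
import torch.nn as nn

class DenseGloss(nn.Module):
    """Hypothetical two-layer MLP producing the 'dense gloss' representation.

    input_dim and hidden_dim are assumptions; only the 1024-d output
    comes from the paper as I read it.
    """

    def __init__(self, input_dim=512, hidden_dim=1024, output_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        # x: (batch_size, seq_length, input_dim)
        return self.net(x)  # -> (batch_size, seq_length, 1024)

features = torch.randn(2, 16, 512)  # e.g. per-frame visual features from a sign video
dense = DenseGloss()(features)
print(dense.shape)  # torch.Size([2, 16, 1024])
```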

From my limited understanding, tokenization is a process specifically for text: it maps a string to a sequence of integer IDs that encode the text. So how does one do this for floats? Does one just convert the floats into ints (e.g., sum each sequence)?
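To illustrate the sort of naive conversion I was imagining (which I suspect is not what the paper actually does, since it would throw away nearly all the information in each 1024-d vector):

```python
import torch

# The float "dense gloss" output: (batch_size, seq_length, 1024)
dense = torch.randn(2, 16, 1024)

# Naive idea: collapse each 1024-d vector to a single integer by summing,
# yielding one "token ID" per sequence position.
naive_ids = dense.sum(dim=-1).long()
print(naive_ids.shape)  # torch.Size([2, 16])
```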