I have many series of trajectories [[x1, y1, z1], [x2, y2, z2], ..., [xn, yn, zn]] of objects I’ve been tracking in imaging. Some of the time points [xi, yi, zi] are missing and I’d like to impute these coordinates [x_hat, y_hat, z_hat], a problem I see as very similar to masked language modeling!

Conceptually the transformer makes sense but I am stuck on the most trivial step! Can numerical values be used as input to a transformer like BERT?

I don’t need to “tokenize” my input, and that is part of my confusion.

Another part is that my problem has no “vocabulary”, since the outputs are numerical values (normalized between 0 and 1).

Do I have to use a special architecture (e.g. BEiT)?

Hi there!
I was about to ask the same question and just read yours.
I have a similar problem. I have a sequence of heights and speed limits, and I want to get the most efficient speed for the whole track. I would like to use attention, since the solution has a lot of long-term dependencies.
However, right now I do not see a way of feeding a numeric sequence directly into the models.

Also, in case it is useful: it is true that for your problem you will not need to transform words into vectors, and therefore you will not need a tokenize function as defined in the course. But I think you will still need to define the special tokens (bos, eos…) to let the model know where a sequence starts and finishes.
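
For what it's worth, here is a rough idea of how the "no tokenizer, no vocabulary" setup might look in PyTorch. This is only a sketch under my own assumptions, not something tested on real tracking data: a linear layer replaces the embedding lookup, a learned vector stands in for missing points (like [MASK]), and a regression head with a sigmoid replaces the softmax over a vocabulary. The class name `TrajectoryImputer`, the helper `masked_mse`, and all the sizes are made up for illustration.

```python
import torch
import torch.nn as nn


class TrajectoryImputer(nn.Module):
    """Hypothetical masked-coordinate model; names and sizes are illustrative."""

    def __init__(self, n_coords=3, d_model=32, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        # A linear projection plays the role of the token embedding lookup.
        self.input_proj = nn.Linear(n_coords, d_model)
        # Learned vector substituted for missing points (the analogue of [MASK]).
        self.mask_vector = nn.Parameter(torch.zeros(d_model))
        # Learned positional embeddings, one per time step.
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Regression head instead of a softmax over a vocabulary.
        self.head = nn.Linear(d_model, n_coords)

    def forward(self, coords, missing):
        # coords: (batch, time, 3) floats; missing: (batch, time) bool, True = missing.
        h = self.input_proj(coords)
        h = torch.where(missing.unsqueeze(-1), self.mask_vector.expand_as(h), h)
        positions = torch.arange(coords.size(1), device=coords.device)
        h = h + self.pos_emb(positions)
        h = self.encoder(h)
        # Sigmoid keeps predictions in [0, 1], matching the normalized targets.
        return torch.sigmoid(self.head(h))


def masked_mse(pred, target, missing):
    # Score only the hidden positions, as masked language modeling does.
    return ((pred - target) ** 2)[missing].mean()


# Toy usage with random data: hide time step 4 in every trajectory.
model = TrajectoryImputer()
target = torch.rand(2, 10, 3)
missing = torch.zeros(2, 10, dtype=torch.bool)
missing[:, 4] = True
coords = target.masked_fill(missing.unsqueeze(-1), 0.0)
pred = model(coords, missing)
loss = masked_mse(pred, target, missing)
loss.backward()
```

In this setup the start and end markers (bos, eos) would presumably also be learned vectors rather than vocabulary entries, and for trajectories of different lengths the `src_key_padding_mask` argument of `nn.TransformerEncoder` could be used to ignore padded time steps. I have not tried either of those here.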