I have many trajectories
[[x1, y1, z1], [x2, y2, z2], ... [xn, yn, zn]] of objects I've been tracking in imaging. Some of the time points
[xi, yi, zi] are missing, and I'd like to impute those coordinates
[x_hat, y_hat, z_hat] - a problem that strikes me as very similar to masked language modeling!
Conceptually the transformer makes sense to me, but I'm stuck on what seems like the most trivial step: can numerical values be used as input to a transformer like BERT?
- I don't need to "tokenize" my input, and that is part of my confusion.
- Another part is that my problem has no "vocabulary", since the outputs are continuous values (normalized between 0 and 1) rather than discrete tokens.
- Do I have to use a special architecture (e.g. BEiT)?
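To make the question concrete, here is a minimal sketch of what I imagine in PyTorch: a linear projection of each (x, y, z) point replaces the token-embedding lookup, a learned vector replaces the [MASK] token, and a small regression head replaces the vocabulary softmax. All the class and variable names here are my own invention, not from any library, and I'm not sure this is the standard way to do it.

```python
import torch
import torch.nn as nn

class TrajectoryImputer(nn.Module):
    """BERT-style encoder over continuous (x, y, z) points.
    A linear layer stands in for the token-embedding lookup,
    and a regression head stands in for the vocabulary softmax.
    Hypothetical sketch, not a reference implementation."""
    def __init__(self, d_model=64, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        self.input_proj = nn.Linear(3, d_model)               # continuous "embedding"
        self.mask_token = nn.Parameter(torch.zeros(d_model))  # learned [MASK] vector
        self.pos = nn.Parameter(torch.randn(max_len, d_model) * 0.02)  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 3)                     # regress x_hat, y_hat, z_hat

    def forward(self, coords, missing):
        # coords: (batch, seq_len, 3); missing: (batch, seq_len) bool mask
        h = self.input_proj(coords)
        h[missing] = self.mask_token          # overwrite missing time points
        h = h + self.pos[: coords.size(1)]    # positional information
        return self.head(self.encoder(h))

# Toy usage: one trajectory of 10 points with 2 missing time points.
coords = torch.rand(1, 10, 3)                 # normalized to [0, 1]
missing = torch.zeros(1, 10, dtype=torch.bool)
missing[0, [3, 7]] = True
model = TrajectoryImputer()
pred = model(coords, missing)                 # (1, 10, 3)
loss = ((pred - coords)[missing] ** 2).mean() # MSE on masked positions only
```

The training loop would then mask known points at random (BERT-style) so the loss can be computed against ground truth, and at inference the genuinely missing points get the mask vector. Is this linear-projection approach the right way to feed numerical values in, or is something more like BEiT's discretization actually needed?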