Token classification for non-textual data

vitvit · March 5, 2023, 9:13am

I’m looking for an implementation of a token-classification architecture whose input is not a sequence of integer vocabulary IDs but a sequence of numeric vectors.
In other words, each token in the input is already an embedding vector.
How can this be achieved?

Expected behavior

The input is a vector of size 768 for each token, in sequences of up to 512 tokens.
Maybe it is as simple as removing the layer
(word_embeddings): Embedding(50265, 768, padding_idx=1)?
In any case, a link to a solution would be most helpful.
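
One possible route, offered as a sketch and not a definitive answer: the Hugging Face Transformers model classes accept an `inputs_embeds` argument in place of `input_ids`, which skips the word_embeddings lookup without removing the layer. The example below assumes a RoBERTa-style model, since the Embedding(50265, 768, padding_idx=1) layer above looks like RoBERTa's. The label count, layer count, batch shape, and random inputs are hypothetical, and the model is randomly initialized from a config so that nothing is downloaded.

```python
import torch
from transformers import RobertaConfig, RobertaForTokenClassification

NUM_LABELS = 5  # hypothetical number of token classes

# Randomly initialized model built from a config (assumption: no pretrained weights).
config = RobertaConfig(
    vocab_size=8,                 # tiny on purpose: the lookup table is bypassed below
    hidden_size=768,              # must match the size of each input vector
    num_hidden_layers=2,          # kept small so the sketch runs quickly
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=514,  # RoBERTa convention: 512 positions + 2 offset
    pad_token_id=1,
    num_labels=NUM_LABELS,
)
model = RobertaForTokenClassification(config)
model.eval()

# Hypothetical batch: 2 sequences of 16 tokens, each token already a 768-d vector.
batch_size, seq_len = 2, 16
features = torch.randn(batch_size, seq_len, config.hidden_size)

# Mark the last 6 positions of the second sequence as padding.
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)
attention_mask[1, 10:] = 0

# One label per token; -100 is ignored by the cross-entropy loss.
labels = torch.randint(0, NUM_LABELS, (batch_size, seq_len))
labels[attention_mask == 0] = -100

with torch.no_grad():
    outputs = model(
        inputs_embeds=features,   # used instead of input_ids
        attention_mask=attention_mask,
        labels=labels,
    )

print(outputs.logits.shape)  # (batch_size, seq_len, NUM_LABELS)
print(outputs.loss)
```

Note that the model still adds its position and token-type embeddings to the supplied vectors and applies layer normalization, so only the vocabulary lookup is bypassed. If the feature vectors had a size other than the model's hidden size, a linear projection in front of the model would be one way to adapt them.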
