T5 spans - need to predict EMPTY or not

natgolovach · August 11, 2021, 6:54pm

Hello!

I’m working on a span-corruption model for amino acid sequences, based on https://huggingface.co/Rostlab/prot_t5_xl_uniref50.

Consider this example:

Given a sequence of amino acids, I would like to predict whether a masked position holds an X or nothing at all (an empty span).

Original sequence: ‘A A A A A A X B B B B B C’
Input: ‘A A A A A A <extra_id_0> B B B B B <extra_id_1> C’
Target: ‘<extra_id_0> X <extra_id_1> NOTHING <extra_id_2>’

In the target there should be only two possible outcomes for each span: X or nothing (so it is effectively a binary classification task on the sequence).

So how should I tokenize this “empty space”? What is the correct target to feed to the model?
Are there better approaches for this task?
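
To make the question concrete, here is a minimal sketch of one candidate encoding: the empty span is not given a token of its own, and its sentinel is simply followed directly by the next sentinel. This is an assumption I have not tested on the model. The helper name `build_example` and the (start, end) span format are made up for illustration, and the sentinels are spelled `<extra_id_N>` as in the stock T5 tokenizer.

```python
from typing import List, Tuple


def build_example(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Build a span-corruption (input, target) pair.

    `spans` holds sorted, non-overlapping half-open (start, end) index ranges.
    A span with start == end is an empty span: a sentinel is inserted at that
    position in the input, and the target has nothing after that sentinel.
    (Hypothetical helper, not part of any library.)
    """
    input_parts: List[str] = []
    target_parts: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        # Copy the unmasked tokens before this span, then the sentinel.
        input_parts.extend(tokens[cursor:start])
        input_parts.append(sentinel)
        # The target gets the sentinel followed by the masked tokens (maybe none).
        target_parts.append(sentinel)
        target_parts.extend(tokens[start:end])
        cursor = end
    # Copy the remaining tokens and close the target with a final sentinel.
    input_parts.extend(tokens[cursor:])
    target_parts.append(f"<extra_id_{len(spans)}>")
    return " ".join(input_parts), " ".join(target_parts)


tokens = "A A A A A A X B B B B B C".split()
# Span (6, 7) masks the X; span (12, 12) is an empty span just before C.
inp, tgt = build_example(tokens, [(6, 7), (12, 12)])
print(inp)
print(tgt)
```

With this encoding, "nothing" would show up in the decoded output as two sentinels back to back, so the X-or-empty decision could be read off from whether any token appears between them.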

Thank you very much.
