I am trying to do multilabel classification on a labeled corpus.
The data looks like this after adding the tag to the front of each row:
0 multilabel classification: how time changes th…
1 multilabel classification: hawaii has been in …
2 multilabel classification: not all alaskans ar…
3 multilabel classification: you should read rap…
4 multilabel classification: giving stupid kids …
I am trying to tokenize the rows above, and since there are multiple rows, I am guessing I have to use a loop. What I am trying to understand is: how do I get the input_ids and attention_mask? Should I loop to get them for each row, or tokenize the entire text at once?
Am I doing it right by adding the tag "multilabel classification:" to each row? I am not sure whether my assumption is wrong or whether this is the way to do it.
My code is:
src_tokenized = TOKENIZER.encode_plus(
src_input_ids = src_tokenized['input_ids']
src_attention_mask = src_tokenized['attention_mask']
t5_summary_ids = t5_model.generate(src_input_ids)
I feel I am doing it wrong by running it row by row, but I am not sure. I searched for it, and all the multilabel classification examples I see use PyTorch, not TensorFlow.
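To make the row-by-row versus whole-batch question concrete, here is a minimal sketch of what tokenizing all rows together would produce: one padded list of input_ids per row, plus a matching attention_mask. It is a toy illustration only. The whitespace "tokenizer", the vocabulary, the PAD_ID value, and the two example rows are all hypothetical stand-ins, not the real T5 tokenizer or my actual data.

```python
# Toy illustration only: a whitespace "tokenizer" with a made-up vocabulary,
# standing in for a real subword tokenizer such as the one T5 uses.
PAD_ID = 0  # hypothetical padding id


def build_vocab(texts):
    """Assign an integer id to every distinct whitespace-separated word."""
    vocab = {"<pad>": PAD_ID}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode_batch(texts, vocab):
    """Encode all rows at once, padding each row to the longest one."""
    ids = [[vocab[word] for word in text.split()] for text in texts]
    max_len = max(len(row) for row in ids)
    # Pad short rows with PAD_ID so every row has the same length.
    input_ids = [row + [PAD_ID] * (max_len - len(row)) for row in ids]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(row) + [0] * (max_len - len(row)) for row in ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical rows, each carrying the same tag at the front.
rows = [
    "multilabel classification: first example row",
    "multilabel classification: a second and somewhat longer example row",
]
vocab = build_vocab(rows)
batch = encode_batch(rows, vocab)
```

As far as I understand, a Hugging Face tokenizer can do the same thing without an explicit loop when it is called on a list of strings, for example `TOKENIZER(list_of_rows, padding=True, truncation=True, return_tensors="tf")`, and the resulting `attention_mask` can be passed to `t5_model.generate` along with `input_ids`. Here `list_of_rows` is a placeholder name for the prefixed rows.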
Appreciate all the help. TIA