Hi everyone. I’m working on a sequence labelling task. During training, the F1 scores on the validation set are abnormally high, but when I try inference the model barely gets anything right. I suspect it may be about how I built the dataset. I don’t get any errors, by the way.
I create the datasets with the from_list() function as follows:
train_dataset = Dataset.from_list(train_l)
valid_dataset = Dataset.from_list(valid_l)
test_dataset = Dataset.from_list(test_l)
where each element of train_l/valid_l/test_l is a dictionary of the form:
{"tokens": [...], "tags": [...]}
After this I simply tokenize each dataset and align the tags with the new tokens. When I run
print(tokenized_train.features)
the output is:
{'tokens': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None),
'tags': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
'token_type_ids': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None)}
where labels are the label IDs aligned with the byte-pair tokenization, including the special -100 values, and tags are simply the original word-level tags without alignment.
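For reference, my alignment logic is roughly the following (a minimal, self-contained sketch of the common word_ids() convention, not my exact code — function and variable names are made up):

```python
def align_labels(word_ids, tags):
    """Map word-level tags onto subword tokens.

    word_ids: what the fast tokenizer's word_ids() returns -- None for
    special tokens ([CLS], [SEP], padding), otherwise the index of the
    original word a subword came from.
    Convention used here: only the first subword of each word keeps the
    word's tag; special tokens and continuation subwords get -100 so the
    loss ignores them.
    """
    labels = []
    previous = None
    for wid in word_ids:
        if wid is None:
            labels.append(-100)        # special token
        elif wid != previous:
            labels.append(tags[wid])   # first subword of a word
        else:
            labels.append(-100)        # continuation subword
        previous = wid
    return labels

# e.g. word_ids for "[CLS] play ##ing field [SEP]" over words
# ["playing", "field"] with word-level tags [2, 0]:
print(align_labels([None, 0, 0, 1, None], [2, 0]))
# -> [-100, 2, -100, 0, -100]
```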
I think there’s something wrong here. The documentation says these should be ClassLabel so that the Trainer can understand which values are the targets. On the other hand, the documentation on loading datasets from a local machine doesn’t mention anything about it.
Can you spot anything wrong here? Should I set ClassLabel, or do something like that?