Loss error for BERT token classifier

So I am building my first BERT token classifier. I am using a German Polyglot dataset, meaning tokenised words and lists of NER labels.
A row is ['word1', 'word2', …] ['ORG', 'LOC', …]
This is my code:
from transformers import (
    BertForTokenClassification,
    BertTokenizer,
    Trainer,
    TrainingArguments,
)
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-german-cased')
encoded_dataset = [
    tokenizer(item['words'], is_split_into_words=True, return_tensors="pt",
              padding='max_length', truncation=True, max_length=128)
    for item in dataset_1
]
model = BertForTokenClassification.from_pretrained('bert-base-german-cased', num_labels=1)

# drop the batch dimension that return_tensors="pt" adds to every field
for item in encoded_dataset:
    for key in item:
        item[key] = torch.squeeze(item[key])

train_set = encoded_dataset[:500]
test_set = encoded_dataset[500:]

training_args = TrainingArguments(
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    output_dir='results',
    logging_dir='logs',
    no_cuda=False,  # defaults to False anyway, just to be explicit
)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_set,
)

trainer.train()

And I am getting a KeyError: loss.
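A quick way to see what is going on (a sketch reusing the variables from the code above; the zero labels are dummies for the demonstration only, not my real tags):

sample = encoded_dataset[0]
outputs = model(
    input_ids=sample['input_ids'].unsqueeze(0),
    attention_mask=sample['attention_mask'].unsqueeze(0),
)
print(outputs.loss)  # None: no labels were passed, so no loss is computed

labels = torch.zeros(1, 128, dtype=torch.long)  # dummy labels, demo only
outputs = model(
    input_ids=sample['input_ids'].unsqueeze(0),
    attention_mask=sample['attention_mask'].unsqueeze(0),
    labels=labels,
)
print(outputs.loss)  # now a tensor (0.0 with num_labels=1, but it exists)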

Could you post the error?

The problem seems to be in the Trainer. How is your data encoded? Can you show the shape, the type, and how it looks before passing it to the Trainer?
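For example, something like this would show it (a sketch, assuming the encoded_dataset from your code above):

sample = encoded_dataset[0]
print(type(sample))  # a transformers BatchEncoding
print({k: (v.dtype, tuple(v.shape)) for k, v in sample.items()})
# after the squeeze loop, input_ids / token_type_ids / attention_mask
# should each be shape (128,), and there is no 'labels' key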

Your num_labels = 1. Are you doing single-label classification?

Try setting num_train_epochs to a float, i.e. 1.0, to see if it works, and also check the number of labels: is it really 1 label in your training data?
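If there are really several entity types, num_labels should match the size of the tag set, e.g. (a sketch; label_list is a hypothetical stand-in for your actual tag inventory):

label_list = ['O', 'ORG', 'LOC', 'PER']  # hypothetical tags, guessed from the examples above
model = BertForTokenClassification.from_pretrained(
    'bert-base-german-cased',
    num_labels=len(label_list),
)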

num_labels was a mistake, I changed it to 4 since there are 4 types. I didn't do any further encoding of the data beyond this code.

When you change the label count, does it output the same result?

Yes, and the float number doesn't change it either.

Could you print dataset_1 to see how it looks?

I think maybe you should convert the dataset to the Dataset type and then rewrite it like this:

tokenized_dataset = dataset_1.map(lambda x: tokenizer(x['words'], is_split_into_words=True, return_tensors="pt", padding='max_length', truncation=True, max_length=128))

I think you do not need to loop over dataset_1 but can rather pass the column dataset_1['words'] directly to the tokenizer, or transform it to the Dataset format. Datasets — datasets 1.16.1 documentation
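Putting that together, it could look roughly like this (a sketch, assuming dataset_1 is a plain list of dicts with a 'words' key; return_tensors is dropped, since Dataset.map stores plain lists anyway):

from datasets import Dataset

hf_dataset = Dataset.from_dict({'words': [item['words'] for item in dataset_1]})
tokenized_dataset = hf_dataset.map(
    lambda x: tokenizer(x['words'], is_split_into_words=True,
                        padding='max_length', truncation=True, max_length=128)
)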

The tokenized dataset didn't work. I think I need to do some label encoding for NER first, but I'm not sure how to go about that.
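A common approach (used in the Hugging Face token classification examples) is to map each tag string to an id and align the tags to the subword tokens via word_ids(), using -100 for special tokens and padding so the loss ignores them. A sketch along those lines (it needs the fast tokenizer, BertTokenizerFast, since word_ids() is not available on the slow one; label2id and the 'ner' key are hypothetical names for your tag set and label column):

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-german-cased')  # fast tokenizer needed for word_ids()
label2id = {'O': 0, 'ORG': 1, 'LOC': 2, 'PER': 3}  # hypothetical tag set

def encode_with_labels(words, tags):
    enc = tokenizer(words, is_split_into_words=True,
                    padding='max_length', truncation=True, max_length=128)
    labels = []
    for word_idx in enc.word_ids():
        if word_idx is None:              # [CLS], [SEP], padding
            labels.append(-100)           # ignored by the loss
        else:
            labels.append(label2id[tags[word_idx]])
    enc['labels'] = labels
    return enc

# 'ner' is a guess for the name of the label column in dataset_1
encoded_dataset = [encode_with_labels(item['words'], item['ner']) for item in dataset_1]

With a labels key present in each encoded item, the model returns a loss and the KeyError should go away.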