How to use Data Collator?

I want to train a TF transformer model for NER with my own pipeline, and I have a problem with label alignment. As I understand it, DataCollatorForTokenClassification is used for this task, but I can’t figure out how to use it outside of Trainer to get aligned labels.

Just to clarify what I mean:

tokens: [‘Europe’,‘is’,‘international’]
labels: [‘1’,‘0’,‘0’]
input_ids: [‘545’,‘43’,‘6343’,‘2334’,‘2’]

hey @Constantin, you should be able to use the tokenize_and_align_labels function from here: transformers/run_ner_no_trainer.py at bc2571e61c985ec82819cf01ad038342771c94d0 · huggingface/transformers · GitHub
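For reference, here is a minimal sketch of the alignment idea behind that function, not a copy of the script. It assumes you already have the per-token word indices that a fast tokenizer returns from `word_ids()` after being called with `is_split_into_words=True`. The helper name `align_labels` and the example `word_ids` are made up for illustration: the sketch assumes "international" is split into two sub-tokens and that the last id is a special token, which may not match your tokenizer.

```python
from typing import List, Optional

# Label value conventionally used to mark positions the loss should skip.
IGNORE_INDEX = -100


def align_labels(
    word_ids: List[Optional[int]],
    word_labels: List[int],
    label_all_tokens: bool = False,
) -> List[int]:
    """Expand word-level labels to token-level labels (hypothetical helper)."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens have no source word, so they are ignored.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First sub-token of a word receives the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later sub-tokens either repeat the label or are ignored.
            aligned.append(word_labels[word_id] if label_all_tokens else IGNORE_INDEX)
        previous = word_id
    return aligned


# Assumed tokenization of ['Europe', 'is', 'international'] into 5 tokens.
word_ids = [0, 1, 2, 2, None]
word_labels = [1, 0, 0]

aligned = align_labels(word_ids, word_labels)
print(aligned)  # [1, 0, 0, -100, -100]
```

Note that DataCollatorForTokenClassification only pads the inputs and the labels to a common length within a batch; it does not do this word-to-token alignment, so the labels need to be aligned before they reach the collator. In a custom TensorFlow loop you would probably also need to mask the -100 positions yourself when computing the loss.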

you could also try adapting the PyTorch code to TensorFlow for the training loop 🙂