Issue with Flaubert Tokenizer as word_ids() method is not available for NER Task

pragmatic-coder · July 14, 2022, 11:30am

I am working with Flaubert on a token classification task. After tokenization there are more tokens than labels (one label per word), and when I try to compensate for this mismatch I get an error saying that the word_ids() method is not available. The method does exist: dir(tokenized_input) lists it among the available methods, but when I try to call it I get:

Error: word_ids() is not available when using python-based tokenizer.

For reference: Tokenizer - use of word_ids to map labels to newer tokens.
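For context, this is roughly what the word_ids()-based label alignment looks like when it works. It is a minimal sketch of the common pattern, not necessarily the exact code from the linked post. Two assumptions: only the first sub-token of each word keeps the word's label, and every other position gets -100, which PyTorch's CrossEntropyLoss ignores by default. `align_labels_with_tokens` is a hypothetical helper name:

```python
from typing import List, Optional

# Label id that PyTorch's CrossEntropyLoss skips by default (ignore_index=-100).
IGNORE_INDEX = -100


def align_labels_with_tokens(
    labels: List[int],
    word_ids: List[Optional[int]],
) -> List[int]:
    """Expand word-level labels to token-level labels.

    word_ids[i] is the index of the word that token i came from, or None for
    special tokens (the format returned by BatchEncoding.word_ids()).
    Only the first sub-token of a word keeps that word's label; special
    tokens and continuation sub-tokens get IGNORE_INDEX.
    """
    aligned: List[int] = []
    previous_word: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            # Special token such as <s> or </s>.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First sub-token of a new word: copy the word's label.
            aligned.append(labels[word_id])
        else:
            # Continuation sub-token of the same word.
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned
```

For example, three words labelled [0, 3, 0], where the second word was split into two sub-tokens, give word_ids [None, 0, 1, 1, 2, None] and aligned labels [-100, 0, 3, -100, 0, -100].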

I am using Flaubert for Named Entity Recognition Task!

@lewtun

v-moayman · August 15, 2022, 11:33am

You can check this issue: DeBERTa V3 Fast Tokenizer · Issue #14712 · huggingface/transformers (github.com). I believe that it is a solution to your issue.
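Some background on the error: it indicates that the loaded tokenizer is a slow (Python-based) one, and word_ids() relies on the Rust-backed fast tokenizers. As far as I know, transformers does not ship a fast version of the Flaubert tokenizer. If converting one, as discussed in the linked DeBERTa V3 issue, is not an option, a rough workaround is to tokenize word by word and build the word-index list yourself.

The following is a minimal sketch under two assumptions. First, tokenizing each word separately yields the same sub-tokens as tokenizing the whole sentence. Second, the tokenizer adds exactly one special token at each end of the sequence. Both should be checked for your model. `word_ids_without_fast_tokenizer` and `toy_tokenize` are hypothetical names:

```python
from typing import Callable, List, Optional, Tuple


def word_ids_without_fast_tokenizer(
    words: List[str],
    tokenize: Callable[[str], List[str]],
    n_prefix_special: int = 1,
    n_suffix_special: int = 1,
) -> Tuple[List[str], List[Optional[int]]]:
    """Rebuild a word_ids()-style list by tokenizing one word at a time.

    `tokenize` turns a single word into sub-tokens (for a slow tokenizer this
    could be its `tokenize` method). The special-token counts are assumptions
    and must match what the real tokenizer adds around the sequence.
    Returns the sub-tokens (without special tokens) and the word-id list
    (with None at the special-token positions).
    """
    tokens: List[str] = []
    word_ids: List[Optional[int]] = [None] * n_prefix_special
    for index, word in enumerate(words):
        pieces = tokenize(word)
        tokens.extend(pieces)
        # Every sub-token of this word points back to the word's index.
        word_ids.extend([index] * len(pieces))
    word_ids.extend([None] * n_suffix_special)
    return tokens, word_ids


def toy_tokenize(word: str) -> List[str]:
    # Hypothetical stand-in for a real tokenizer: splits words longer
    # than four characters into two pieces.
    return [word[:4], word[4:]] if len(word) > 4 else [word]


tokens, ids = word_ids_without_fast_tokenizer(["Paris", "est", "belle"], toy_tokenize)
# tokens: ['Pari', 's', 'est', 'bell', 'e']
# ids:    [None, 0, 0, 1, 2, 2, None]
```

With a real slow tokenizer, you would pass `tokenizer.tokenize` and then get model inputs via `tokenizer.convert_tokens_to_ids(tokens)` and `tokenizer.build_inputs_with_special_tokens(...)`. The resulting word-id list can then be used the same way as the output of word_ids().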
