Find the equivalent of word_index in BERT?

Excuse me, I need a dictionary that maps each word to its index, so that I can do a lookup like this:

def word_for_id(integer, tokenizer):
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None

But with BERT I couldn't find the equivalent; I get the error 'BertTokenizer' object has no attribute 'word_index'.

BERT uses word-piece tokens, so if you are after the IDs associated with those word-piece tokens, you can find them in the tokenizer's vocabulary:

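from transformers import AutoTokenizer  # AutoTokenizer, not AutoModel, loads the tokenizer
checkpoint = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.vocab  # dict mapping each word-piece token to its integer ID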
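
To get the reverse lookup from the question (ID to token), one option is to invert that dictionary once instead of scanning it on every call. This is only a minimal sketch: `toy_vocab` is a made-up stand-in for the real vocabulary so that it runs without downloading a model, and `build_id_to_token` is a hypothetical helper name, not part of transformers.

```python
def build_id_to_token(vocab):
    """Invert a token -> id mapping into an id -> token mapping."""
    return {index: token for token, index in vocab.items()}


# Toy stand-in for the tokenizer's vocabulary (assumption for illustration);
# the real BERT vocabulary is far larger and uses different IDs.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "##ing": 3}

id_to_token = build_id_to_token(toy_vocab)


def word_for_id(integer, id_to_token):
    # dict.get returns None for an unknown ID, like the original helper.
    return id_to_token.get(integer)
```

With a real tokenizer you would pass `tokenizer.get_vocab()` in place of `toy_vocab`. The tokenizer also has a `convert_ids_to_tokens` method that performs this lookup directly, which may be all you need.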

Thanks a lot for replying. What is the difference between tokenizer.get_vocab() and tokenizer.vocab? Do both work the same way?

Also, do I need to load my own sentences to get the vocab, or is this the vocab of the pre-trained BERT?

From a very quick scan, tokenizer.vocab is an attribute. tokenizer.get_vocab() is a function. They both return a dictionary with the same number of items. I’m not sure why we need both.