Issue with Extracting Word Ids from Batch Encoding Object

I’m not sure if I’m doing something wrong, but for some reason when I go to extract the word_ids across my dataset, it only returns the first entry.

from datasets import load_dataset
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("dslim/bert-base-NER")
dataset = load_dataset("wnut_17")

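# Returns a list of 1287 input_ids entries, each of length 512
# (assuming the "test" split, whose size matches the 1287 here; load_dataset returns a dict of splits)
tokenized_input = tokenizer(dataset["test"]["tokens"], padding="max_length", truncation=True, is_split_into_words=True)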

# Returns a single list of 512 word ids; when I investigated, it is only returning
# the word ids for the first entry of the tokenized input
word_ids = tokenized_input.word_ids() 

# If I put it into a list comprehension it works as expected, returning a list of length 1287
# where each element has length 512
word_ids = [tokenized_input[i].word_ids for i in range(len(tokenized_input["input_ids"]))]

Does anyone have any thoughts as to what I might be doing wrong? In the example here, they use the same call as me, but it works.
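
For what it's worth, the behaviour described above looks like what `BatchEncoding.word_ids()` is meant to do: it takes a `batch_index` argument that defaults to 0, so calling it with no argument gives the word ids of the first sequence only. Here is a minimal sketch of that, assuming the `transformers` and `tokenizers` packages are installed. The tiny vocabulary, the two example sentences, and `max_length=4` are made up for illustration so nothing has to be downloaded; they are not the setup from the question.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Hypothetical toy vocabulary, built in memory so no model download is needed.
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3, "new": 4, "york": 5, "city": 6}
backend = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=backend, pad_token="[PAD]", unk_token="[UNK]"
)

# Two pre-split sentences, padded/truncated to a fixed length as in the question.
sentences = [["hello", "world"], ["new", "york", "city"]]
encoded = tokenizer(
    sentences,
    is_split_into_words=True,
    padding="max_length",
    truncation=True,
    max_length=4,
)

# No argument: batch_index defaults to 0, so only the first sequence is covered.
first_only = encoded.word_ids()

# Passing batch_index explicitly gives one list of word ids per sequence.
all_word_ids = [
    encoded.word_ids(batch_index=i) for i in range(len(encoded["input_ids"]))
]

print(first_only)
print(all_word_ids)
```

If that reading is right, the list comprehension is not really a workaround: looping over `batch_index` (or indexing `tokenized_input[i]` as in the question) is the way to collect word ids for a whole batch.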

Having the same issue. Did you find out how to solve it?

Not really… I just ended up writing a list comprehension. It’s hacky but it works

word_ids = [
    tokenized_input[i].word_ids
    for i in range(len(tokenized_input["input_ids"]))