Evaluator failed when trying to batch encoding by index

I’m using evaluate 0.2.2.
I run this simple code:

import evaluate
from transformers.models.bartpho.tokenization_bartpho import BartphoTokenizer
bleu = evaluate.load("bleu")
tokenizer = BartphoTokenizer.from_pretrained('vinai/bartpho-syllable')
p = [
    'Màu đen',
    'Màu xanh dương'
]
r = [
    ['Màu đen'],
    ['Màu xanh dương']
]
bleu.compute(predictions=p,
             references=r,
             tokenizer=tokenizer)

It fails when it tries to get the batch encoding by index:

KeyError                                  Traceback (most recent call last)
/home/dinhanhx/projects/VisualRoBERTa/test_ds.ipynb Cell 4 in <cell line: 1>()
----> 1 bleu.compute(predictions=p,
      2              references=r,
      3              tokenizer=tokenizer)

File ~/miniconda3/envs/torch-tpu/lib/python3.8/site-packages/evaluate/module.py:444, in EvaluationModule.compute(self, predictions, references, **kwargs)
    442 inputs = {input_name: self.data[input_name] for input_name in self._feature_names()}
    443 with temp_seed(self.seed):
--> 444     output = self._compute(**inputs, **compute_kwargs)
    446 if self.buf_writer is not None:
    447     self.buf_writer = None

File ~/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--bleu/9e0985c1200e367cce45605ce0ecb5ede079894e0f24f54613fca08eeb8aff76/bleu.py:122, in Bleu._compute(self, predictions, references, tokenizer, max_order, smooth)
    120 references = [[tokenizer(r) for r in ref] for ref in references]
    121 predictions = [tokenizer(p) for p in predictions]
--> 122 score = compute_bleu(
    123     reference_corpus=references, translation_corpus=predictions, max_order=max_order, smooth=smooth
    124 )
    125 (bleu, precisions, bp, ratio, translation_length, reference_length) = score
    126 return {
    127     "bleu": bleu,
    128     "precisions": precisions,
   (...)
    132     "reference_length": reference_length,
    133 }

File ~/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--bleu/9e0985c1200e367cce45605ce0ecb5ede079894e0f24f54613fca08eeb8aff76/nmt_bleu.py:75, in compute_bleu(reference_corpus, translation_corpus, max_order, smooth)
     73 merged_ref_ngram_counts = collections.Counter()
     74 for reference in references:
---> 75   merged_ref_ngram_counts |= _get_ngrams(reference, max_order)
     76 translation_ngram_counts = _get_ngrams(translation, max_order)
     77 overlap = translation_ngram_counts & merged_ref_ngram_counts

File ~/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--bleu/9e0985c1200e367cce45605ce0ecb5ede079894e0f24f54613fca08eeb8aff76/nmt_bleu.py:43, in _get_ngrams(segment, max_order)
     41 for order in range(1, max_order + 1):
     42   for i in range(0, len(segment) - order + 1):
---> 43     ngram = tuple(segment[i:i+order])
     44     ngram_counts[ngram] += 1
     45 return ngram_counts

File ~/miniconda3/envs/torch-tpu/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:240, in BatchEncoding.__getitem__(self, item)
    238     return self._encodings[item]
    239 else:
--> 240     raise KeyError(
    241         "Indexing with integers (to access backend Encoding for a given batch index) "
    242         "is not available when using Python based tokenizers"
    243     )

KeyError: 'Indexing with integers (to access backend Encoding for a given batch index) is not available when using Python based tokenizers'

is there any way that I can fix this?

I think the issue is that the Hugging Face tokenizer doesn’t just return a list of tokens a but a more complicated object.

What you probably want to do is the following:

bleu.compute(predictions=p,
             references=r,
             tokenizer=tokenizer.tokenize)

does that work?
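For the curious, here is a rough sketch of why the call fails. The stub below only mimics the relevant behavior of transformers’ BatchEncoding (it is not the real class): dict-style access by string key works, but integer/slice indexing is reserved for fast-tokenizer backend encodings, so BLEU’s n-gram slicing (`segment[i:i+order]`) hits the KeyError branch.

```python
# Illustrative stand-in for transformers' BatchEncoding (NOT the real class):
# a slow (pure-Python) tokenizer's __call__ returns a dict-like object, and
# non-string indexing only works when backend encodings exist (fast tokenizers).
class FakeBatchEncoding:
    def __init__(self, data):
        self.data = data
        self._encodings = None  # slow tokenizers have no backend encodings

    def __getitem__(self, item):
        if isinstance(item, str):
            return self.data[item]  # dict-style access works
        if self._encodings is not None:
            return self._encodings[item]
        raise KeyError(
            "Indexing with integers (to access backend Encoding for a given "
            "batch index) is not available when using Python based tokenizers"
        )


segment = FakeBatchEncoding({"input_ids": [0, 1234, 2]})
ids = segment["input_ids"]  # fine: string key

# BLEU's _get_ngrams effectively does segment[i:i+order], which raises:
try:
    segment[0:1]
    failed = False
except KeyError:
    failed = True
print(failed)  # True
```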


I’m aware that it returns a BatchEncoding. However, the docs don’t mention passing .tokenize() to bleu.compute(); they only show calling the tokenizer object directly (i.e. its .__call__()).

It works now. Thanks for your help

However, they DO (vaguely) describe the expected output of the tokenizer argument:

It can be replaced by any function that takes a string as input and returns a list of tokens as output.
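In other words, any plain callable mapping a string to a list of token strings satisfies that contract, and tokenizer.tokenize has exactly that signature, which is why passing it instead of the tokenizer object fixes the KeyError. A trivial whitespace splitter (only a stand-in for tokenizer.tokenize, for illustration) shows the expected interface:

```python
# Any function with the signature str -> list[str] fits BLEU's `tokenizer`
# argument. This whitespace splitter is just a stand-in for tokenizer.tokenize.
def simple_tokenizer(text):
    return text.split()

tokens = simple_tokenizer("Màu xanh dương")
print(tokens)  # ['Màu', 'xanh', 'dương']
```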

Yes, exactly. The tokenizer mentioned in the BLEU docs is not a HF-specific tokenizer. Maybe we can improve the wording a bit there.

I think, for simplicity, the docs could just include another example that uses a HF tokenizer.