What exact inputs does bleu_metric.compute() require?

Very basic question. In my project I want to switch from the Google Translate API to Marian MT models, and to compare them I want to use the BLEU score. I want to use the BLEU metric from the nlp library, but I’m having problems getting it to work correctly.

Here is my code:

from transformers import MarianTokenizer, MarianMTModel
src = 'de'
trg = 'en'
mname = f'Helsinki-NLP/opus-mt-{src}-{trg}'
tokenizer = MarianTokenizer.from_pretrained(mname)
model = MarianMTModel.from_pretrained(mname)
src_texts = ["Ich bin ein kleiner Frosch.", "Tom bat seinen Lehrer um Rat."]
tgt_texts = ["I am a small frog.", "Tom asked his teacher for advice."]
batch = tokenizer.prepare_translation_batch(src_texts=src_texts, tgt_texts=tgt_texts)
import nlp
bleu_metric = nlp.load_metric('bleu')
preds = model(batch.input_ids)
targets = batch.decoder_input_ids
bleu_metric.compute(preds, targets)

And this is the error message I get:

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
<ipython-input-89-33da8475e3a1> in <module>
----> 1 bleu_metric.compute(preds, targets)

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/metric.py in compute(self, predictions, references, timeout, **metrics_kwargs)
    191         """
    192         if predictions is not None:
--> 193             self.add_batch(predictions=predictions, references=references)
    194         self.finalize(timeout=timeout)
    195 

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/metric.py in add_batch(self, predictions, references, **kwargs)
    207         if self.writer is None:
    208             self._init_writer()
--> 209         self.writer.write_batch(batch)
    210 
    211     def add(self, prediction=None, reference=None, **kwargs):

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/arrow_writer.py in write_batch(self, batch_examples, writer_batch_size)
    155         if self.pa_writer is None:
    156             self._build_writer(pa_table=pa.Table.from_pydict(batch_examples))
--> 157         pa_table: pa.Table = pa.Table.from_pydict(batch_examples, schema=self._schema)
    158         if writer_batch_size is None:
    159             writer_batch_size = self.writer_batch_size

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/types.pxi in __iter__()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.asarray()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.array()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowTypeError: Could not convert tensor([[[ 7.0599, -1.6253,  7.5354,  ..., -1.6600, -1.6035,  0.0000],
         [ 6.7303, -2.1037,  7.4463,  ..., -2.0789, -2.0494,  0.0000],
         [ 6.1458, -1.4315,  7.7637,  ..., -1.3350, -1.3450,  0.0000],
         ...,
         [ 5.6321,  0.7129,  9.5273,  ...,  0.7408,  0.7298,  0.0000],
         [ 5.4492, -0.6234,  9.0366,  ..., -0.6214, -0.6522,  0.0000],
         [ 7.1594, -3.3825,  4.1902,  ..., -3.4131, -3.3912,  0.0000]],

        [[ 5.8872, -3.5864,  5.8050,  ..., -3.5273, -3.4877,  0.0000],
         [ 6.4951, -2.9127,  7.8423,  ..., -2.7731, -2.8157,  0.0000],
         [ 6.4846, -2.8267,  8.0983,  ..., -2.6857, -2.7147,  0.0000],
         ...,
         [ 7.0786, -2.7071,  7.8688,  ..., -2.6743, -2.6513,  0.0000],
         [ 5.6782, -1.9020,  7.4212,  ..., -2.0306, -2.0242,  0.0000],
         [ 7.0517, -2.7702,  5.3939,  ..., -2.7920, -2.7438,  0.0000]]],
       grad_fn=<AddBackward0>) with type Tensor: was not a sequence or recognized null for conversion to list type

I’m following the MarianMT docs and the nlp notebook on Colab. I’m not sure what exact form the targets have to be in for bleu_metric.compute(). On the other hand, the error message seems to be triggered by the format of preds. I’ve tried many different approaches but can’t get it to work.

Any help is appreciated :slight_smile:

Could you try converting preds to lists instead of torch.Tensor?

Thanks for the tip @lhoestq, but converting to lists leads to another error message

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
<ipython-input-55-67d4df0cb97c> in <module>
----> 1 bleu_metric.compute(preds[0].tolist(), targets.tolist())

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/metric.py in compute(self, predictions, references, timeout, **metrics_kwargs)
    191         """
    192         if predictions is not None:
--> 193             self.add_batch(predictions=predictions, references=references)
    194         self.finalize(timeout=timeout)
    195 

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/metric.py in add_batch(self, predictions, references, **kwargs)
    207         if self.writer is None:
    208             self._init_writer()
--> 209         self.writer.write_batch(batch)
    210 
    211     def add(self, prediction=None, reference=None, **kwargs):

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/arrow_writer.py in write_batch(self, batch_examples, writer_batch_size)
    155         if self.pa_writer is None:
    156             self._build_writer(pa_table=pa.Table.from_pydict(batch_examples))
--> 157         pa_table: pa.Table = pa.Table.from_pydict(batch_examples, schema=self._schema)
    158         if writer_batch_size is None:
    159             writer_batch_size = self.writer_batch_size

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/types.pxi in __iter__()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.asarray()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.array()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowTypeError: Expected a string or bytes object, got a 'list' object

Oh actually I think this is because you have to use model.generate according to the documentation.
Maybe @sshleifer can give you more information.
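The tensor in the traceback is a batch of logits (one floating-point score per vocabulary entry per position), not token ids, which is why Arrow refuses to serialize it. model.generate does the decoding for you (including beam search); the toy example below (my own, with a made-up 4-token vocabulary) just illustrates the greedy argmax step that turns logits into token ids:

```python
# Toy logits for one sentence of 2 positions over a hypothetical 4-token vocab.
logits = [
    [0.1, 2.5, -0.3, 0.0],   # position 0: highest score at index 1
    [0.7, 0.2, 1.4, -1.0],   # position 1: highest score at index 2
]
vocab = ['<pad>', 'Hello', 'world', '</s>']

# Greedy decoding: take the argmax at each position to get token ids.
token_ids = [max(range(len(row)), key=row.__getitem__) for row in logits]
tokens = [vocab[i] for i in token_ids]
print(tokens)  # ['Hello', 'world']
```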

The input of bleu is tokenized text. An example of usage is

import nlp
bleu_metric = nlp.load_metric('bleu')
prediction = ['Hey', 'how', 'are', 'you', '?']   # tokenized prediction
reference = [['Hey', 'how', 'are', 'you', '?']]  # one reference for this translation, but there could be more
bleu_metric.compute([prediction],[reference])
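To spell out the nesting for a whole corpus: compute() takes a list of tokenized predictions and a parallel list in which each element is itself a list of one or more tokenized references. A sketch of the shapes with plain lists (reusing the example sentences from the question):

```python
# One tokenized prediction per sentence.
predictions = [
    ['I', 'am', 'a', 'small', 'frog', '.'],
    ['Tom', 'asked', 'his', 'teacher', 'for', 'advice', '.'],
]
# One entry per prediction; each entry is a LIST of references,
# since BLEU supports multiple references per sentence.
references = [
    [['I', 'am', 'a', 'small', 'frog', '.']],
    [['Tom', 'asked', 'his', 'teacher', 'for', 'advice', '.']],
]
assert len(predictions) == len(references)
# bleu_metric.compute(predictions, references) would then score the corpus.
```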

Great, that works! Thank you for clarifying @lhoestq :slight_smile:

I wrote a small function to calculate the BLEU score directly from predictions and targets. Right now it loops through every prediction and then through each token individually, so it’s pretty slow, but good enough for my use case. Any suggestions for optimization are welcome.

def compute_bleu(preds, targets):
    for i in range(len(preds)):
        res_preds, res_targets = [], []
        # Decode each token id individually and drop special tokens.
        for p in preds[i]:
            p_dec = tokenizer.decode(p.item())
            if p_dec not in tokenizer.special_tokens_map.values():
                res_preds.append(p_dec)
        for t in targets[i]:
            t_dec = tokenizer.decode(t.item())
            if t_dec not in tokenizer.special_tokens_map.values():
                res_targets.append(t_dec)
        # BLEU expects a list of references per prediction.
        bleu_metric.add(res_preds, [res_targets])
    return bleu_metric.compute()
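Since suggestions are welcome: two cheap wins are building the special-token set once (instead of rebuilding the list on every membership test) and moving the decode-and-filter step into a small helper. The helper below (filter_special is my own name, not from any library) is written against plain callables so it can be checked without a real tokenizer:

```python
def filter_special(ids, id_to_token, special_tokens):
    """Map token ids to token strings, dropping special tokens.

    `id_to_token` is any callable mapping an id to its string;
    `special_tokens` should be a set so membership tests are O(1).
    """
    tokens = (id_to_token(i) for i in ids)
    return [t for t in tokens if t not in special_tokens]

# With the real tokenizer it would be called roughly like:
#   special = set(tokenizer.special_tokens_map.values())
#   res_preds = filter_special(preds[i].tolist(), lambda i: tokenizer.decode(i), special)

# Self-contained check with a dummy vocabulary:
vocab = {0: '<pad>', 1: 'Hello', 2: 'world', 3: '</s>'}
print(filter_special([1, 2, 3, 0], vocab.get, {'<pad>', '</s>'}))  # ['Hello', 'world']
```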