What exact inputs does bleu_metric.compute() require?

Very basic question. In my project I want to switch from the Google Translate API to Marian MT models, and to compare them I want to use the BLEU score. I want to use the BLEU metric from the nlp library, but I’m having problems getting it to work correctly.

Here is my code:

from transformers import MarianTokenizer, MarianMTModel
src = 'de'
trg = 'en'
mname = f'Helsinki-NLP/opus-mt-{src}-{trg}'
tokenizer = MarianTokenizer.from_pretrained(mname)
model = MarianMTModel.from_pretrained(mname)
src_texts = ["Ich bin ein kleiner Frosch.", "Tom bat seinen Lehrer um Rat."]
tgt_texts = ["I am a small frog.", "Tom asked his teacher for advice."]
batch = tokenizer.prepare_translation_batch(src_texts=src_texts, tgt_texts=tgt_texts)
import nlp
bleu_metric = nlp.load_metric('bleu')
preds = model(batch.input_ids)
targets = batch.decoder_input_ids
bleu_metric.compute(preds, targets)

And this is the error message I get:

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
<ipython-input-89-33da8475e3a1> in <module>
----> 1 bleu_metric.compute(preds, targets)

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/metric.py in compute(self, predictions, references, timeout, **metrics_kwargs)
    191         """
    192         if predictions is not None:
--> 193             self.add_batch(predictions=predictions, references=references)
    194         self.finalize(timeout=timeout)
    195 

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/metric.py in add_batch(self, predictions, references, **kwargs)
    207         if self.writer is None:
    208             self._init_writer()
--> 209         self.writer.write_batch(batch)
    210 
    211     def add(self, prediction=None, reference=None, **kwargs):

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/arrow_writer.py in write_batch(self, batch_examples, writer_batch_size)
    155         if self.pa_writer is None:
    156             self._build_writer(pa_table=pa.Table.from_pydict(batch_examples))
--> 157         pa_table: pa.Table = pa.Table.from_pydict(batch_examples, schema=self._schema)
    158         if writer_batch_size is None:
    159             writer_batch_size = self.writer_batch_size

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/types.pxi in __iter__()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.asarray()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.array()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowTypeError: Could not convert tensor([[[ 7.0599, -1.6253,  7.5354,  ..., -1.6600, -1.6035,  0.0000],
         [ 6.7303, -2.1037,  7.4463,  ..., -2.0789, -2.0494,  0.0000],
         [ 6.1458, -1.4315,  7.7637,  ..., -1.3350, -1.3450,  0.0000],
         ...,
         [ 5.6321,  0.7129,  9.5273,  ...,  0.7408,  0.7298,  0.0000],
         [ 5.4492, -0.6234,  9.0366,  ..., -0.6214, -0.6522,  0.0000],
         [ 7.1594, -3.3825,  4.1902,  ..., -3.4131, -3.3912,  0.0000]],

        [[ 5.8872, -3.5864,  5.8050,  ..., -3.5273, -3.4877,  0.0000],
         [ 6.4951, -2.9127,  7.8423,  ..., -2.7731, -2.8157,  0.0000],
         [ 6.4846, -2.8267,  8.0983,  ..., -2.6857, -2.7147,  0.0000],
         ...,
         [ 7.0786, -2.7071,  7.8688,  ..., -2.6743, -2.6513,  0.0000],
         [ 5.6782, -1.9020,  7.4212,  ..., -2.0306, -2.0242,  0.0000],
         [ 7.0517, -2.7702,  5.3939,  ..., -2.7920, -2.7438,  0.0000]]],
       grad_fn=<AddBackward0>) with type Tensor: was not a sequence or recognized null for conversion to list type

I’m following the MarianMT docs and the nlp notebook on Colab. I’m not sure what exact form the targets have to be in for bleu_metric.compute(). On the other hand, the error message seems to be triggered by the format of preds. I’ve tried many different approaches but can’t get it to work.

Any help is appreciated :slight_smile:

Could you try converting preds to lists instead of torch.Tensor?

Thanks for the tip @lhoestq, but converting to lists leads to another error message

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
<ipython-input-55-67d4df0cb97c> in <module>
----> 1 bleu_metric.compute(preds[0].tolist(), targets.tolist())

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/metric.py in compute(self, predictions, references, timeout, **metrics_kwargs)
    191         """
    192         if predictions is not None:
--> 193             self.add_batch(predictions=predictions, references=references)
    194         self.finalize(timeout=timeout)
    195 

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/metric.py in add_batch(self, predictions, references, **kwargs)
    207         if self.writer is None:
    208             self._init_writer()
--> 209         self.writer.write_batch(batch)
    210 
    211     def add(self, prediction=None, reference=None, **kwargs):

/opt/conda/envs/fastai/lib/python3.7/site-packages/nlp/arrow_writer.py in write_batch(self, batch_examples, writer_batch_size)
    155         if self.pa_writer is None:
    156             self._build_writer(pa_table=pa.Table.from_pydict(batch_examples))
--> 157         pa_table: pa.Table = pa.Table.from_pydict(batch_examples, schema=self._schema)
    158         if writer_batch_size is None:
    159             writer_batch_size = self.writer_batch_size

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/types.pxi in __iter__()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.asarray()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.array()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowTypeError: Expected a string or bytes object, got a 'list' object

Oh actually I think this is because you have to use model.generate according to the documentation.
Maybe @sshleifer can give you more information.
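The tensor in the traceback is a batch of logits (one floating-point score per vocabulary entry per position), not token ids, which is why Arrow refuses to serialize it. model.generate does the decoding for you (including beam search); the toy example below (my own, with a made-up 4-token vocabulary) just illustrates the greedy argmax step that turns logits into token ids:

```python
# Toy logits for one sentence of 2 positions over a hypothetical 4-token vocab.
logits = [
    [0.1, 2.5, -0.3, 0.0],   # position 0: highest score at index 1
    [0.7, 0.2, 1.4, -1.0],   # position 1: highest score at index 2
]
vocab = ['<pad>', 'Hello', 'world', '</s>']

# Greedy decoding: take the argmax at each position to get token ids.
token_ids = [max(range(len(row)), key=row.__getitem__) for row in logits]
tokens = [vocab[i] for i in token_ids]
print(tokens)  # ['Hello', 'world']
```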

The input of bleu is tokenized text. An example of usage is

import nlp
bleu_metric = nlp.load_metric('bleu')
prediction = ['Hey', 'how', 'are', 'you', '?']   # tokenized prediction
reference = [['Hey', 'how', 'are', 'you', '?']]  # one reference for this translation, but there could be more
bleu_metric.compute([prediction],[reference])
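To spell out the nesting for a whole corpus: compute() takes a list of tokenized predictions and a parallel list in which each element is itself a list of one or more tokenized references. A sketch of the shapes with plain lists (reusing the example sentences from the question):

```python
# One tokenized prediction per sentence.
predictions = [
    ['I', 'am', 'a', 'small', 'frog', '.'],
    ['Tom', 'asked', 'his', 'teacher', 'for', 'advice', '.'],
]
# One entry per prediction; each entry is a LIST of references,
# since BLEU supports multiple references per sentence.
references = [
    [['I', 'am', 'a', 'small', 'frog', '.']],
    [['Tom', 'asked', 'his', 'teacher', 'for', 'advice', '.']],
]
assert len(predictions) == len(references)
# bleu_metric.compute(predictions, references) would then score the corpus.
```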

Great, that works! Thank you for clarifying @lhoestq :slight_smile:

I wrote a small function to calculate the BLEU score directly from predictions and targets. Right now it loops through every prediction and then through each token individually, so it’s pretty slow, but good enough for my use case. Any suggestions for optimization are welcome.

def compute_bleu(preds, targets):
    for i in range(len(preds)):
        res_preds, res_targets = [], []
        # Decode each token id individually and drop special tokens.
        for p in preds[i]:
            p_dec = tokenizer.decode(p.item())
            if p_dec not in tokenizer.special_tokens_map.values():
                res_preds.append(p_dec)
        for t in targets[i]:
            t_dec = tokenizer.decode(t.item())
            if t_dec not in tokenizer.special_tokens_map.values():
                res_targets.append(t_dec)
        # BLEU expects a list of references per prediction.
        bleu_metric.add(res_preds, [res_targets])
    return bleu_metric.compute()
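Since suggestions are welcome: two cheap wins are building the special-token set once (instead of rebuilding the list on every membership test) and moving the decode-and-filter step into a small helper. The helper below (filter_special is my own name, not from any library) is written against plain callables so it can be checked without a real tokenizer:

```python
def filter_special(ids, id_to_token, special_tokens):
    """Map token ids to token strings, dropping special tokens.

    `id_to_token` is any callable mapping an id to its string;
    `special_tokens` should be a set so membership tests are O(1).
    """
    tokens = (id_to_token(i) for i in ids)
    return [t for t in tokens if t not in special_tokens]

# With the real tokenizer it would be called roughly like:
#   special = set(tokenizer.special_tokens_map.values())
#   res_preds = filter_special(preds[i].tolist(), lambda i: tokenizer.decode(i), special)

# Self-contained check with a dummy vocabulary:
vocab = {0: '<pad>', 1: 'Hello', 2: 'world', 3: '</s>'}
print(filter_special([1, 2, 3, 0], vocab.get, {'<pad>', '</s>'}))  # ['Hello', 'world']
```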