Segformer fine-tuning: error with the metrics

I’m following this tutorial. I’ve simply copy-pasted the code verbatim into a local Jupyter notebook.

When I run trainer.train() I get this error:

Trainer is attempting to log a value of “[nan, 0.22304851875478263, 0.9226045169903048, 0.0, 0.00034364829394438505, 0.0002366915040712859, nan, 0.00014368284700068667, 0.0, 0.0, 0.8146361679778702, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8902991532345663, 0.0, 0.00042407633933944864, 0.0, 0.0, nan, 0.0, 0.0, 0.0, 0.0, 0.9226799895754838, 0.0016839565374974338, 0.4636751073508497, 0.0, 0.0, 0.00042719251601971937, 0.0]” of type <class ‘list’> for key “eval/per_category_accuracy” as a scalar. This invocation of Tensorboard’s writer.add_scalar() is incorrect so we dropped this attribute.

What could possibly cause it?

@segments-tobias @nielsr


Thanks for your interest in our blog! I’m currently running it and it seems to run fine for me.

However, we made some updates already (we now store metrics in the evaluate library rather than the datasets library). I’ll update our blog accordingly.

Here’s my notebook: Google Colab


I was able to train nvidia/mit-b0 on your segments/sidewalk-semantic demo dataset, and it works well.

The training time is about 6 hours for 50 epochs on an RTX 3090 and a Ryzen 5 3600X (6 cores, 12 threads). I suspect the bottleneck is train_transforms and val_transforms, which run single-threaded. Before I go ahead and try to convert those to multiprocessing, is my assumption right?

Any suggestions for converting feature extraction and augmentation to multiprocessing? If I instantiate a pool within train_transforms() and val_transforms() every time and pass the images and labels to the pool for processing, then collect the results, I feel I may lose some time creating the pool every time the functions are invoked. Is there a better way?

I do not want to pre-convert all images, since I want the model to benefit from random augmentations each epoch.

Please disregard the previous message. I’ve found TrainingArguments(dataloader_num_workers=N) and I set that to the number of CPU cores on my system. Now training is an order of magnitude faster.
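For anyone landing here later, a minimal sketch of that setting. The helper name pick_num_workers and the reserve parameter are my own illustration, not part of transformers; only dataloader_num_workers is the actual TrainingArguments field mentioned above.

```python
import os


def pick_num_workers(reserve=0):
    """Return a worker count based on the logical CPU count.

    `reserve` (hypothetical knob) leaves some cores free for the main process.
    """
    total = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, total - reserve)


# Keyword arguments intended for transformers.TrainingArguments, e.g.
#   TrainingArguments(output_dir="out", **training_kwargs)
# ("out" is a placeholder directory name).
training_kwargs = {"dataloader_num_workers": pick_num_workers()}
print(training_kwargs)
```

Each worker runs the dataset transforms in its own process, so the random augmentations are still applied on the fly every epoch.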

It is still very slow when it gets to ***** Running Evaluation *****, and neither the CPU nor the GPU is fully utilized at that step. One CPU core sits at 100%, and the GPU is barely at 20% compute.

Is there any way to speed up evaluation?

I also see this warning very often, which may or may not be related:

.local/lib/python3.10/site-packages/transformers/data/ UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at …/torch/csrc/utils/tensor_new.cpp:201.)
batch[k] = torch.tensor([f[k] for f in features])
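As far as I understand the warning, it complains about building a tensor from a Python list of arrays element by element. A rough sketch of the workaround it suggests is below, using NumPy only. The function name stack_features and the keys "pixel_values" and "labels" are assumptions on my side (they match what my transforms return), not something the library prescribes.

```python
import numpy as np


def stack_features(features, keys=("pixel_values", "labels")):
    """Stack the per-example arrays into one contiguous ndarray per key.

    `features` is a list of dicts, one per example. The result can then be
    converted in a single call, e.g. torch.from_numpy(batch[k]), instead of
    torch.tensor([...]) on a list of arrays.
    """
    return {k: np.stack([np.asarray(f[k]) for f in features]) for k in keys}


# Two fake examples: 3x4x4 "images" with 4x4 label maps.
examples = [
    {"pixel_values": np.zeros((3, 4, 4), dtype=np.float32),
     "labels": np.ones((4, 4), dtype=np.int64)}
    for _ in range(2)
]
batch = stack_features(examples)
print(batch["pixel_values"].shape, batch["labels"].shape)
```

A function like this could be wrapped into a custom collator that does the final tensor conversion, but I have not measured how much it helps compared to the other changes.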

Notebook with the current code:

I’ve switched to evaluate using this fork: NielsRogge/evaluate on GitHub.

Then I did this:

metric = evaluate.load(

And now evaluation is much faster.

After using multiprocessing for both the dataloader and the evaluation, training time for 50 epochs went from 6 hours to 40 minutes.

I think even simple examples such as the code on your blog should default to multiprocessing. It’s just a couple extra arguments, and it makes the whole thing much faster.

And to address the original issue:

I run the notebooks in VSCode, either locally (on my gaming PC with a fast GPU), or remotely via the SSH plugin (VSCode is on the laptop, the Jupyter kernel runs on my gaming PC). In the remote sessions the progress bars are displayed differently, and I also seem to get more errors.

I do not have an explanation for it, and it could still be a coincidence. But for now I’ve switched to running everything locally and I do not see major issues anymore.
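Separately from the environment question, the message itself says that a list was logged under a single key, and TensorBoard's add_scalar() only accepts scalars, so that entry is dropped. One way to avoid it is to expand list-valued metrics into one scalar per class before returning them from compute_metrics. A minimal sketch, where flatten_metrics and the example label names are hypothetical:

```python
def flatten_metrics(metrics, id2label=None):
    """Expand list-valued metrics into one scalar entry per class.

    `id2label` (optional) maps class index -> name; without it the index
    itself is used in the key.
    """
    flat = {}
    for key, value in metrics.items():
        if hasattr(value, "tolist"):  # numpy arrays and numpy scalars
            value = value.tolist()
        if isinstance(value, (list, tuple)):
            for i, v in enumerate(value):
                name = id2label[i] if id2label else i
                flat[f"{key}_{name}"] = v
        else:
            flat[key] = value
    return flat


raw = {"mean_iou": 0.5, "per_category_accuracy": [float("nan"), 0.25]}
print(flatten_metrics(raw, {0: "road", 1: "sidewalk"}))
```

The nan entries stay nan; they just end up under their own per-class key instead of inside a list.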


Did you use metric.compute or metric._compute when calculating metrics? I’m seeing a massive speedup when using metric._compute. I’ve reported this to the evaluate team and they’re looking into it.

I’m pretty sure I’ve used metric.compute but I don’t have that code anymore.

Thanks for the tip, I’ve subscribed to the PR.