HF auto train dataset load_metric error

Hello,
I am trying to fine-tune the google-bert/bert-base-uncased model with the lhoestq/squad dataset, following the Extractive Question Answering example shown in the documentation at the link below:

I have set up my AutoTrain UI exactly as shown in the screenshot at the above link. I have checked that I have the correct column names. I have tried running this on a CPU and on the small T4, but in both cases I get the following errors:

Device 0: Tesla T4 - 2.88MiB/15360MiB


INFO | 2025-05-31 16:58:27 | autotrain.app.utils:kill_process_by_pid:90 - Sent SIGTERM to process with PID 65

INFO | 2025-05-31 16:58:27 | autotrain.app.utils:get_running_jobs:40 - Killing PID: 65

subprocess.CalledProcessError: Command '['/app/env/bin/python', '-m', 'autotrain.trainers.extractive_question_answering', '--training_config', 'autotrain-vfbpf-ju79s/training_params.json']' returned non-zero exit status 1.

raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher

simple_launcher(args)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command

args.func(args)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main

sys.exit(main())

File "/app/env/bin/accelerate", line 8, in <module>

Traceback (most recent call last):

ImportError: cannot import name 'load_metric' from 'datasets' (/app/env/lib/python3.10/site-packages/datasets/__init__.py)

from datasets import load_metric

File "/app/env/lib/python3.10/site-packages/autotrain/trainers/extractive_question_answering/utils.py", line 6, in <module>

from autotrain.trainers.extractive_question_answering import utils

File "/app/env/lib/python3.10/site-packages/autotrain/trainers/extractive_question_answering/__main__.py", line 30, in <module>

exec(code, run_globals)

File "/app/env/lib/python3.10/runpy.py", line 86, in _run_code

return _run_code(code, main_globals, None,

File "/app/env/lib/python3.10/runpy.py", line 196, in _run_module_as_main

Traceback (most recent call last):

To avoid this warning pass in values for each of the problematic parameters or run accelerate config.

--dynamo_backend was set to a value of 'no'

The following values were not passed to accelerate launch and had defaults used instead:

INFO | 2025-05-31 16:58:15 | autotrain.backends.local:create:25 - Training PID: 65

INFO | 2025-05-31 16:58:15 | autotrain.commands:launch_command:515 - {'data_path': 'lhoestq/squad', 'model': 'google-bert/bert-base-uncased', 'lr': 5e-05, 'epochs': 3, 'max_seq_length': 512, 'max_doc_stride': 128, 'batch_size': 8, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'text_column': 'context', 'question_column': 'question', 'answer_column': 'answers', 'logging_steps': -1, 'project_name': 'autotrain-vfbpf-ju79s', 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'save_total_limit': 1, 'token': '*****', 'push_to_hub': True, 'eval_strategy': 'epoch', 'username': 'ianmd', 'log': 'tensorboard', 'early_stopping_patience': 5, 'early_stopping_threshold': 0.01}

INFO | 2025-05-31 16:58:15 | autotrain.commands:launch_command:514 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.extractive_question_answering', '--training_config', 'autotrain-vfbpf-ju79s/training_params.json']

INFO | 2025-05-31 16:58:15 | autotrain.backends.local:create:20 - Starting local training…

INFO | 2025-05-31 16:58:15 | autotrain.app.ui_routes:handle_form:540 - hardware: local-ui

INFO | 2025-05-31 16:56:38 | autotrain.app.ui_routes:fetch_params:415 - Task: extractive-qa

INFO | 2025-05-31 16:56:27 | autotrain.app.ui_routes:fetch_params:415 - Task: llm:sft

INFO: 10.16.19.229:12486 - “GET /?__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2N2VlNTdmZDM1NDdmODIzMTAyNTI5M2MiLCJ1c2VyIjoiaWFubWQiLCJzZXNzaW9uSWQiOiI2ODNhY2NkMjFhYjk5N2VlMjZkZThjZjkifSwiaWF0IjoxNzQ4NzEwNTg2LCJzdWIiOiIvc3BhY2VzL2lhbm1kL2F1dG90cmFpbi10ZXN0aW5nIiwiZXhwIjoxNzQ4Nzk2OTg2LCJpc3MiOiJodHRwczovL2h1Z2dpbmdmYWNlLmNvIn0.sExea1b6OSWrCBCUfS_3I9DmYqaIclQC9dNG4pukT00UNEB_2x8uq3bt-Culu03y-zIoAfhT94RQR_IAEfwxCw HTTP/1.1” 307 Temporary Redirect

INFO: 10.16.19.229:12486 - “GET /?__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2N2VlNTdmZDM1NDdmODIzMTAyNTI5M2MiLCJ1c2VyIjoiaWFubWQiLCJzZXNzaW9uSWQiOiI2ODNhY2NkMjFhYjk5N2VlMjZkZThjZjkifSwiaWF0IjoxNzQ4NzEwNTg2LCJzdWIiOiIvc3BhY2VzL2lhbm1kL2F1dG90cmFpbi10ZXN0aW5nIiwiZXhwIjoxNzQ4Nzk2OTg2LCJpc3MiOiJodHRwczovL2h1Z2dpbmdmYWNlLmNvIn0.sExea1b6OSWrCBCUfS_3I9DmYqaIclQC9dNG4pukT00UNEB_2x8uq3bt-Culu03y-zIoAfhT94RQR_IAEfwxCw HTTP/1.1” 307 Temporary Redirect

INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)

INFO: Application startup complete.

INFO: Waiting for application startup.

INFO: Started server process [49]

INFO | 2025-05-31 16:54:20 | autotrain.app.app:<module>:24 - AutoTrain started successfully

INFO | 2025-05-31 16:54:20 | autotrain.app.app:<module>:23 - AutoTrain version: 0.8.36

INFO | 2025-05-31 16:54:20 | autotrain.app.app:<module>:13 - Starting AutoTrain…

INFO | 2025-05-31 16:54:20 | autotrain.app.ui_routes:<module>:315 - AutoTrain started successfully

INFO | 2025-05-31 16:54:18 | autotrain.app.ui_routes:<module>:31 - Starting AutoTrain…

I have also tried other QA datasets from Hugging Face but get the same errors. I have tried fine-tuning a text_classification model and everything is fine using a CPU. I have spent hours trying to figure this out and have searched online for why this is happening. The load_metric import appears to be the problem, but I do not understand why I am getting it, given my setup is exactly the same as the example in the documentation (although I do not know what processor was being used there).
Has anyone had a similar issue? I would appreciate any pointers on this.
Thanks very much
ian

1 Like

Hi,
What is the version of datasets?

Have you seen this post? Can't import load_metric from datasets - #5 by ohnoadrummer?
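
For context on what that post describes: load_metric was deprecated in datasets 2.x and removed entirely in 3.0, with the evaluate library as its replacement. A minimal sketch of the migration, assuming the evaluate package is installed (this does not change anything inside the AutoTrain container, it just shows why the import fails):

# Old (datasets < 3.0): this import no longer exists in datasets 3.x
# from datasets import load_metric
# squad_metric = load_metric("squad")

# Replacement via the evaluate library:
import evaluate

squad_metric = evaluate.load("squad")

# Hypothetical example inputs in the format the SQuAD metric expects
predictions = [{"id": "1", "prediction_text": "Denver Broncos"}]
references = [{"id": "1", "answers": {"text": ["Denver Broncos"], "answer_start": [177]}}]
print(squad_metric.compute(predictions=predictions, references=references))

Since the import happens inside AutoTrain's own trainer code, the practical fix on the no-code platform is to change which datasets version the Space installs rather than editing any code.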

2 Likes

Hi,

Yes, I did see that post, but I am using AutoTrain, so I am not sure how I could use that code unless I go to Colab and try it there, which I could do.
I could not see a version number for the dataset in its data card, and I only found this on GitHub, but I am not experienced with GitHub. Would it be this number: c8fb83a? If not, please let me know where to look for the version. Thanks for your help. Ian

1 Like

Run
import datasets
print(datasets.__version__)
Or
pip show datasets so we can see the version 🙂

1 Like

It looks like you’re using the no-code platform and may have duplicated the space.

Go to https://huggingface.co/spaces/ianmd/autotrain-advanced/tree/main and update the Dockerfile.

Change this:

FROM huggingface/autotrain-advanced:latest  
CMD pip uninstall -y autotrain-advanced && pip install -U autotrain-advanced && autotrain app --host 0.0.0.0 --port 7860 --workers 1

To this:

FROM huggingface/autotrain-advanced:latest  
RUN pip install --upgrade pip \  
 && pip install datasets==2.9  
CMD pip uninstall -y autotrain-advanced && pip install -U autotrain-advanced && autotrain app --host 0.0.0.0 --port 7860 --workers 1

Then, restart your Space.

Or, you can try using an older version of the Docker container, for example:

FROM huggingface/autotrain-advanced:5e9f28f

You can browse available tags here:
https://hub.docker.com/r/huggingface/autotrain-advanced/tags?page=4
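
Either way, once the Space rebuilds you can verify that the pin took effect. A quick check, assuming you can open a terminal in the container or run a short script there:

import datasets

# Expect a 2.x version if the pin was applied, not 3.2.0
print(datasets.__version__)

# Smoke test: this import only works on datasets releases older than 3.0
from datasets import load_metric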

2 Likes

Thanks for your reply. The first link you sent me just goes to a 404 page. I am logged in, I am using Windows 11 (alas), and I am signed up for the Pro subscription. I am not sure if this makes any difference, but I thought I should mention it.
I did indeed duplicate the space at some point, as the steps I followed to set up a space instructed me to do so; it would not let me proceed otherwise. Maybe I should start again with a new space?
Thanks for your patience with this. Ian

1 Like

Check out your profile to view your spaces. For instance, you can see yours here:

https://huggingface.co/your_user_name
Like this example:

1 Like

That works fine and only shows the one space, ianmd. But the previous link you shared with my user name in it just goes to a 404 page, which is odd. Ian

1 Like

Hi,

Using AutoTrain, I don't think I can do that from there. I will have a go in Colab. Ian

1 Like

But please try it first! Go to your space, click on ‘Files’, and select the Dockerfile to update it. Then restart your space by clicking the ‘…’ next to ‘Settings’.


1 Like

Hi again,

I tried that: I updated the Dockerfile and then restarted the space. It did not ask me to log in, but when I ran the Extractive Question Answering example again I got the same errors. They are shown below, and I will also attach a screenshot of the AutoTrain run page. I tried the process several times with the FacebookAI roberta-base model and the google-bert bert-base-uncased model but got the same errors each time. I am running on the free CPU, and I tried it both with and without mixed precision, which did not seem to make any difference. It seems it is still having load_metric issues. Here is the error:

INFO | 2025-06-01 12:23:09 | autotrain.app.utils:kill_process_by_pid:90 - Sent SIGTERM to process with PID 85

INFO | 2025-06-01 12:23:09 | autotrain.app.utils:get_running_jobs:40 - Killing PID: 85

subprocess.CalledProcessError: Command '['/app/env/bin/python', '-m', 'autotrain.trainers.extractive_question_answering', '--training_config', 'autotrain-xyh3v-g4vqg/training_params.json']' returned non-zero exit status 1.

raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher

simple_launcher(args)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command

args.func(args)

File "/app/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main

sys.exit(main())

File "/app/env/bin/accelerate", line 8, in <module>

Traceback (most recent call last):

ImportError: cannot import name 'load_metric' from 'datasets' (/app/env/lib/python3.10/site-packages/datasets/__init__.py)

from datasets import load_metric

File "/app/env/lib/python3.10/site-packages/autotrain/trainers/extractive_question_answering/utils.py", line 6, in <module>

from autotrain.trainers.extractive_question_answering import utils

File "/app/env/lib/python3.10/site-packages/autotrain/trainers/extractive_question_answering/__main__.py", line 30, in <module>

exec(code, run_globals)

File "/app/env/lib/python3.10/runpy.py", line 86, in _run_code

return _run_code(code, main_globals, None,

File "/app/env/lib/python3.10/runpy.py", line 196, in _run_module_as_main

Traceback (most recent call last):

To avoid this warning pass in values for each of the problematic parameters or run accelerate config.

--dynamo_backend was set to a value of 'no'

--mixed_precision was set to a value of 'no'

--num_machines was set to a value of 1

--num_processes was set to a value of 0

The following values were not passed to accelerate launch and had defaults used instead:

INFO | 2025-06-01 12:22:50 | autotrain.backends.local:create:25 - Training PID: 85

INFO | 2025-06-01 12:22:50 | autotrain.commands:launch_command:515 - {'data_path': 'lhoestq/squad', 'model': 'FacebookAI/roberta-base', 'lr': 5e-05, 'epochs': 3, 'max_seq_length': 512, 'max_doc_stride': 128, 'batch_size': 8, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'text_column': 'context', 'question_column': 'question', 'answer_column': 'answers', 'logging_steps': -1, 'project_name': 'autotrain-xyh3v-g4vqg', 'auto_find_batch_size': False, 'mixed_precision': 'none', 'save_total_limit': 1, 'token': '*****', 'push_to_hub': True, 'eval_strategy': 'epoch', 'username': 'ianmd', 'log': 'tensorboard', 'early_stopping_patience': 5, 'early_stopping_threshold': 0.01}

INFO | 2025-06-01 12:22:50 | autotrain.commands:launch_command:514 - ['accelerate', 'launch', '--cpu', '-m', 'autotrain.trainers.extractive_question_answering', '--training_config', 'autotrain-xyh3v-g4vqg/training_params.json']

INFO | 2025-06-01 12:22:50 | autotrain.backends.local:create:20 - Starting local training…

INFO | 2025-06-01 12:22:50 | autotrain.app.ui_routes:handle_form:540 - hardware: local-ui

INFO | 2025-06-01 12:21:07 | autotrain.app.ui_routes:fetch_params:415 - Task: extractive-qa

INFO | 2025-06-01 12:20:34 | autotrain.app.ui_routes:fetch_params:415 - Task: llm:sft

INFO: 10.16.27.22:33965 - “GET /?logs=container&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2N2VlNTdmZDM1NDdmODIzMTAyNTI5M2MiLCJ1c2VyIjoiaWFubWQiLCJzZXNzaW9uSWQiOiI2ODNjNDU0MzUwYWM5ODI5Y2Y4NzE4ZWMifSwiaWF0IjoxNzQ4NzgwNDMzLCJzdWIiOiIvc3BhY2VzL2lhbm1kL2F1dG90cmFpbi10ZXN0aW5nIiwiZXhwIjoxNzQ4ODY2ODMzLCJpc3MiOiJodHRwczovL2h1Z2dpbmdmYWNlLmNvIn0.nXNp7G-6ybFRqCpo4InnYryulxIOTgqe7GRo1346CK8kK6aHX4f775QfirCyuqGLt3hMRRfd5Gd7WjaL2i-RDQ HTTP/1.1” 307 Temporary Redirect

INFO: 10.16.44.224:9661 - “GET /?logs=container&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2N2VlNTdmZDM1NDdmODIzMTAyNTI5M2MiLCJ1c2VyIjoiaWFubWQiLCJzZXNzaW9uSWQiOiI2ODNjNDU0MzUwYWM5ODI5Y2Y4NzE4ZWMifSwiaWF0IjoxNzQ4NzgwNDMzLCJzdWIiOiIvc3BhY2VzL2lhbm1kL2F1dG90cmFpbi10ZXN0aW5nIiwiZXhwIjoxNzQ4ODY2ODMzLCJpc3MiOiJodHRwczovL2h1Z2dpbmdmYWNlLmNvIn0.nXNp7G-6ybFRqCpo4InnYryulxIOTgqe7GRo1346CK8kK6aHX4f775QfirCyuqGLt3hMRRfd5Gd7WjaL2i-RDQ HTTP/1.1” 307 Temporary Redirect

INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)

INFO: Application startup complete.

INFO: Waiting for application startup.

INFO: Started server process [61]

INFO | 2025-06-01 12:20:32 | autotrain.app.app:<module>:24 - AutoTrain started successfully

INFO | 2025-06-01 12:20:32 | autotrain.app.app:<module>:23 - AutoTrain version: 0.8.36

INFO | 2025-06-01 12:20:32 | autotrain.app.app:<module>:13 - Starting AutoTrain…

INFO | 2025-06-01 12:20:32 | autotrain.app.ui_routes:<module>:315 - AutoTrain started successfully

INFO | 2025-06-01 12:20:29 | autotrain.app.ui_routes:<module>:31 - Starting AutoTrain…

Thanks! ian

1 Like

Can you share the logs? You can find them by clicking the ‘Logs’ next to ‘Running’ in the upper left corner.

Not all, just the part with datasets:

Requirement already satisfied: datasets~=3.2.0 in ./env/lib/python3.10/site-packages (from datasets[vision]~=3.2.0->autotrain-advanced) (3.2.0)

1 Like

===== Application Startup at 2025-06-01 12:24:43 =====

==========
== CUDA ==

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
NVIDIA Cloud Native Technologies - NVIDIA Docs .

Found existing installation: autotrain-advanced 0.8.37.dev0
Uninstalling autotrain-advanced-0.8.37.dev0:
Successfully uninstalled autotrain-advanced-0.8.37.dev0
Collecting autotrain-advanced
Downloading autotrain_advanced-0.8.36-py3-none-any.whl.metadata (21 kB)
Requirement already satisfied: albumentations==1.4.23 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.4.23)
Requirement already satisfied: datasets~=3.2.0 in ./env/lib/python3.10/site-packages (from datasets[vision]~=3.2.0->autotrain-advanced) (3.2.0)
Requirement already satisfied: evaluate==0.4.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.4.3)
Requirement already satisfied: ipadic==1.0.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.0.0)
Requirement already satisfied: jiwer==3.0.5 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.0.5)
Requirement already satisfied: joblib==1.4.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.4.2)
Requirement already satisfied: loguru==0.7.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.7.3)
Requirement already satisfied: pandas==2.2.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.2.3)

1 Like

The Dockerfile should look like this instead. Note that the datasets pin now runs in CMD, after the autotrain-advanced reinstall, so it is the last thing applied when the container starts. Could you please update it and restart?

FROM huggingface/autotrain-advanced:latest   
CMD pip uninstall -y autotrain-advanced && pip install -U autotrain-advanced && pip install datasets==2.9  && autotrain app --host 0.0.0.0 --port 7860 --workers 1
1 Like

Can you tell me where I have to go to do this? Ian

1 Like

Files → Dockerfile. Change it, commit and restart the space.

1 Like

Thank you! Some progress! This time AutoTrain ran, but gave the following error:

===== Application Startup at 2025-06-02 10:56:54 =====

Found existing installation: datasets 3.2.0
Uninstalling datasets-3.2.0:
  Successfully uninstalled datasets-3.2.0

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autotrain-advanced 0.8.36 requires datasets[vision]~=3.2.0, but you have datasets 2.9.0 which is incompatible.
trl 0.13.0 requires datasets>=2.21.0, but you have datasets 2.9.0 which is incompatible.
Successfully installed datasets-2.9.0 dill-0.3.6 multiprocess-0.70.14 responses-0.18.0

1 Like

What about the space? Does it work now?
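
Those pip warnings are expected with the datasets==2.9 pin and can usually be ignored as long as training runs. If you want to quiet the trl warning, a newer 2.x pin should also work, since as far as I know load_metric was only removed in datasets 3.0. For example (a sketch, not something I have tested in your Space):

FROM huggingface/autotrain-advanced:latest
CMD pip uninstall -y autotrain-advanced && pip install -U autotrain-advanced && pip install datasets==2.21.0 && autotrain app --host 0.0.0.0 --port 7860 --workers 1

The autotrain-advanced warning about datasets~=3.2.0 would remain either way, because any release that still has load_metric is older than what it declares.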

1 Like

AutoTrain ran the fine-tuning of the model, and this time it went to the 'restart' notification, so I did a restart and checked the logs for the error. When I go to Space -> profile I do not see the newly tuned model there. I am not sure if I am answering your question exactly, so please let me know if you need more information. Thank you, Ian

1 Like