ValueError: Need either a dataset name or a training/validation file

wenzhao98 · June 24, 2023, 7:58am

I am fine-tuning GPT-2, following the example script:

export TRAIN_FILE=/path/to/dataset/wiki.train.raw
export TEST_FILE=/path/to/dataset/wiki.test.raw

python run_lm_finetuning.py \
    --output_dir=output \
    --model_type=gpt2 \
    --model_name_or_path=gpt2 \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE

I am using run_clm.py from this link:

https://github.com/huggingface/transformers/blob/main/examples/tensorflow/language-modeling/run_clm.py


I got:
Traceback (most recent call last):
  File "/Users/wenzhao/Downloads/Keras/run_clm.py", line 627, in <module>
    main()
  File "/Users/wenzhao/Downloads/Keras/run_clm.py", line 222, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/Users/wenzhao/anaconda3/lib/python3.10/site-packages/transformers/hf_argparser.py", line 346, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 15, in __init__
  File "/Users/wenzhao/Downloads/Keras/run_clm.py", line 200, in __post_init__
    raise ValueError("Need either a dataset name or a training/validation file.")
ValueError: Need either a dataset name or a training/validation file.

nielsr · June 24, 2023, 9:28am

Hi,

The flag is called --train_file, not --train_data_file.
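
To illustrate why the wrong flag name produces this particular error, here is a minimal sketch of the kind of check the traceback points at (the `__post_init__` of the data arguments). It is not the script's actual code: `DataArgsSketch` and `parse` are hypothetical names, and plain `argparse` stands in for `HfArgumentParser`. The idea is that an unrecognized flag such as `--train_data_file` never populates `train_file`, so the check sees no data source at all.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DataArgsSketch:
    """Hypothetical stand-in for the script's data arguments."""

    dataset_name: Optional[str] = None
    train_file: Optional[str] = None
    validation_file: Optional[str] = None

    def __post_init__(self):
        # Mirrors the error in the traceback: at least one data source is required.
        if (
            self.dataset_name is None
            and self.train_file is None
            and self.validation_file is None
        ):
            raise ValueError(
                "Need either a dataset name or a training/validation file."
            )


def parse(argv: List[str]) -> Tuple[DataArgsSketch, List[str]]:
    """Parse known flags and return the data args plus any unrecognized flags."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--dataset_name")
    parser.add_argument("--train_file")
    parser.add_argument("--validation_file")
    known, unknown = parser.parse_known_args(argv)
    # Unknown flags are set aside, so they never reach the dataclass fields.
    return DataArgsSketch(**vars(known)), unknown


if __name__ == "__main__":
    try:
        parse(["--train_data_file=wiki.train.raw"])
    except ValueError as err:
        print("old flag name:", err)

    args, leftover = parse(["--train_file=wiki.train.raw"])
    print("new flag name:", args.train_file, leftover)
```

In the command from the question, this would mean passing the training file as --train_file; the evaluation file most likely needs the matching --validation_file flag instead of --eval_data_file, which is what the "training/validation file" wording in the error suggests.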
