NameError: name 'small_train_dataset' is not defined

jeiku · May 2, 2022, 5:40am

Hello! I am attempting to follow the fine-tuning tutorial and I have come across this error. The only changes I have made to the example code are the model and the dataset. Here is the code:

from datasets import load_dataset

dataset = load_dataset("emotion")
dataset["train"][100]

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")

tokenizer.pad_token = tokenizer.eos_token 

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("microsoft/DialoGPT-large", num_labels=5)

from transformers import TrainingArguments

training_args = TrainingArguments(output_dir="test_trainer")

import numpy as np
from datasets import load_metric

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()

In Visual Studio I can see that both small_train_dataset and small_eval_dataset are not defined, but I have no idea what to define them with and it is not included in the documentation. Please help, I really want to start fine-tuning my model!

BramVanroy · May 2, 2022, 7:10am

The following should work. If you look up the dataset that you load (emotion), you can see that it has three splits: train, validation, and test. So you use the train and validation splits during training. At the end you can test the final model on the held-out test set if you want.

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    compute_metrics=compute_metrics,
)
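
For reference, in the fine-tuning tutorial `small_train_dataset` and `small_eval_dataset` are shuffled subsets of the tokenized splits, used only to shorten training. A minimal sketch of that idea follows. The helper name `take_small_split` and its defaults are illustrative, not part of any library; it assumes only that the split supports `shuffle`, `select` and `len()`, as a `datasets.Dataset` does.

```python
def take_small_split(split, n_examples, seed=42):
    """Shuffle a dataset split and keep at most n_examples rows.

    `split` is expected to behave like a datasets.Dataset:
    it must provide shuffle(seed=...), select(indices) and len().
    """
    n_keep = min(n_examples, len(split))  # never ask for more rows than exist
    return split.shuffle(seed=seed).select(range(n_keep))


# Usage with the objects from this thread (shown as comments, not executed here):
# small_train_dataset = take_small_split(tokenized_datasets["train"], 1000)
# small_eval_dataset = take_small_split(tokenized_datasets["validation"], 1000)
```

Passing these subsets to the Trainer instead of the full splits trades accuracy for a much shorter run, which can be useful for a first end-to-end test.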

jeiku · May 2, 2022, 7:53am

This worked to get to training; however, I have encountered a memory error:

RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 671088640 bytes.

I have attempted to get my GPU working (a 1660 Ti), but after installing the proper CUDA and cuDNN for my TensorFlow and Python versions, it still does not work. I thought maybe my CPU could cut it since it’s an i7, but maybe I was too hopeful. I would appreciate any advice you could give, perhaps some way to limit the memory used?

Edit: I have lowered the batch size to 2; however, I now receive this error:

AssertionError: Cannot handle batch sizes > 1 if no padding token is defined.

I have attempted lowering the batch size to 1; however, now my training time is astronomical.
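
One likely cause of the assertion above, offered as an assumption rather than a confirmed diagnosis: setting `tokenizer.pad_token` changes only the tokenizer, while GPT-2-style sequence-classification models check `model.config.pad_token_id` when a batch holds more than one example. The sketch below copies the id across. `sync_pad_token` is a hypothetical helper name, not a transformers function.

```python
def sync_pad_token(model, tokenizer):
    """Copy the tokenizer's padding token id into the model config.

    Assumes the tokenizer already has a pad token, for example after
    `tokenizer.pad_token = tokenizer.eos_token` as in the code above.
    """
    if tokenizer.pad_token_id is None:
        raise ValueError("Tokenizer has no pad token; set tokenizer.pad_token first.")
    model.config.pad_token_id = tokenizer.pad_token_id
    return model


# Usage with the objects from this thread (shown as a comment, not executed here):
# model = sync_pad_token(model, tokenizer)  # call before building the Trainer
```

Two related points may also be worth checking. `padding="max_length"` without a `max_length` argument pads every example to the tokenizer's full model length, so passing something like `max_length=128` (an arbitrary illustrative value) to the tokenizer call should reduce memory use. And the emotion dataset has six classes, so `num_labels` probably needs to match `dataset["train"].features["label"].num_classes`.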

BramVanroy · May 2, 2022, 9:14am

No, training on CPU is not the way to go. Even an i7, i9, or whatever commercial CPU you have won’t cut it. I encourage you to try to get the GPU working. However, that is not something that we can help with here, as it is not specifically a transformers issue. What I can say, though, is that I’ve had some environments where it was difficult to get TensorFlow running. In that regard, PyTorch is easier to get running IMO because it comes with CUDA included (so its file size is also much larger). You can try that instead, if you want.

jeiku · May 2, 2022, 8:29pm

I’m a little confused by what you mean, as this fine-tuning method comes from the PyTorch section of the tutorial. I have only followed PyTorch tutorials up to this point and I do not have any TF prefixes in my code. I am very slow and it is possible that I have missed something basic, but I do have torch installed. The warning that I get is:

W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-05-02 16:14:57.304314: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

I see that this is a TensorFlow issue, but how can I use PyTorch instead? I have uninstalled transformers and reinstalled transformers[torch], but I still receive the TensorFlow warning.

I am very confused by all this and I do not have a very strong coding background, so any guidance is appreciated. I really would like to figure this out, as even when limiting the dataset per the tutorial, I am encountering a 55-hour training time.

Edit: I have removed TensorFlow and reinstalled transformers, and that has stopped the warning, but my training times are still the same, leading me to believe that my CPU is still being used.

I am looking further into this to verify that torch is using my GPU, but you’re right, the scope has changed and is no longer appropriate for this forum. Thank you for your help!

BramVanroy · May 3, 2022, 6:55am

The issue is likely that by using pip install transformers[torch], under the hood you are doing pip install transformers torch. Depending on your environment (Windows, Mac, Linux) and Python version, this may default to the CPU version of PyTorch. To install the right version, with GPU support, you should go to the PyTorch “Get Started” installation page, select the right options for your system, and for “Compute platform” make sure you select a CUDA version and not CPU. Then, run the command that is displayed. (You can leave out torchvision and torchaudio.)
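
A quick way to check which build ended up installed is to ask PyTorch directly. This is a small diagnostic sketch; the function name `describe_torch_device` is made up for illustration, while `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch calls.

```python
def describe_torch_device():
    """Return a one-line summary of whether PyTorch can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    if torch.cuda.is_available():
        # Device 0 is the first visible GPU.
        name = torch.cuda.get_device_name(0)
        return f"CUDA is available: {name} (torch {torch.__version__})"
    return f"CUDA is not available; torch {torch.__version__} will run on the CPU"


print(describe_torch_device())
```

If this reports that CUDA is not available even though a GPU is present, a CPU-only build of PyTorch is a plausible explanation, and reinstalling with the command from the installation page is the usual next step.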

jeiku · May 3, 2022, 7:54am

I have solved this issue, but have come up against another. I am currently testing an example dataset to see if I can replicate the issue. Basically everything works, except that training always fails with “CUDA out of memory” and 0 bytes free. I have run the example code successfully, but cannot use a different dataset.
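
A rough back-of-the-envelope estimate may explain the out-of-memory error. Assuming full fine-tuning in 32-bit floats with an Adam-style optimizer, each parameter needs about 16 bytes: 4 for the weight, 4 for its gradient, and 8 for the two optimizer moments. The parameter count below (about 774 million, the GPT-2 large size) is an assumption about DialoGPT-large, and activation memory is ignored, so the real requirement is higher.

```python
# Bytes per parameter: fp32 weight + fp32 gradient + two fp32 Adam moments.
BYTES_PER_PARAM = 4 + 4 + 8


def training_state_bytes(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Memory for weights, gradients and optimizer state, excluding activations."""
    return n_params * bytes_per_param


def to_gib(n_bytes):
    """Convert bytes to gibibytes."""
    return n_bytes / 2**30


assumed_params = 774_000_000  # assumption: roughly GPT-2 large sized
needed_gib = to_gib(training_state_bytes(assumed_params))
print(f"about {needed_gib:.1f} GiB before activations")
```

Under these assumptions the total comes out above 11 GiB, well beyond the 6 GB on a 1660 Ti. A smaller checkpoint such as microsoft/DialoGPT-small, a shorter `max_length` when tokenizing, or a smaller batch size are the usual things to try.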
