In Visual Studio I can see that both small_train_dataset and small_eval_dataset are not defined, but I have no idea what to define them with and it is not included in the documentation. Please help, I really want to start fine-tuning my model!
If you look up the dataset that you load (emotion), you can see that it has three splits: train, validation, and test. Define small_train_dataset from the train split and small_eval_dataset from the validation split, and use those two during training. At the end you can evaluate the final model on the held-out test set if you want.
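A minimal sketch of what that could look like, assuming the `datasets` library is installed. The identifier "emotion" is taken from this thread; the subset size of 1000, the seed, and the helper names `take_subset` and `load_small_emotion_splits` are illustrative assumptions, not part of the tutorial.

```python
def take_subset(split, n, seed=42):
    """Shuffle a split and keep at most n examples (hypothetical helper)."""
    n = min(n, len(split))
    return split.shuffle(seed=seed).select(range(n))


def load_small_emotion_splits(n=1000):
    # Imported here so take_subset can be used without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset("emotion")  # splits: train, validation, test
    # Train on the train split, evaluate on the validation split.
    small_train_dataset = take_subset(dataset["train"], n)
    small_eval_dataset = take_subset(dataset["validation"], n)
    return small_train_dataset, small_eval_dataset
```

With the tutorial's variable names, this would be called as `small_train_dataset, small_eval_dataset = load_small_emotion_splits()`. If you tokenize the dataset first, apply `take_subset` to the tokenized splits instead.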
This worked to get to training, however I have encountered a memory error:
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 671088640 bytes.
I have attempted to get my GPU working (a 1660 Ti), but after installing the proper CUDA and cuDNN versions for my TensorFlow and Python versions, it still does not work. I thought maybe my CPU could cut it since it’s an i7, but maybe I was too hopeful. I would appreciate any advice you could give, perhaps some way to limit the memory used?
Edit: I have lowered the batch size to 2, however I now receive this error:
AssertionError: Cannot handle batch sizes > 1 if no padding token is defined.
I have tried lowering the batch size to 1, but now my training time is astronomical.
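That assertion is typically raised by GPT-2-style models, whose tokenizers ship without a padding token. A common workaround, sketched here under the assumption that the tokenizer and model are Hugging Face transformers objects and that reusing the end-of-sequence token for padding is acceptable for the task, is to set the padding token on both. `ensure_pad_token` is a hypothetical helper name.

```python
def ensure_pad_token(tokenizer, model):
    """Reuse the EOS token as the padding token if none is defined.

    Hypothetical helper; assumes the objects expose pad_token / eos_token
    (tokenizer) and config.pad_token_id (model), as transformers objects do.
    """
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    if model.config.pad_token_id is None:
        model.config.pad_token_id = tokenizer.pad_token_id
    return tokenizer, model
```

Call it once after loading the tokenizer and model and before tokenizing the dataset, so that batches larger than 1 can be padded to a common length.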
No, training on CPU is not the way to go. Even an i7, i9, or whatever consumer CPU you have won’t cut it. I encourage you to try and get the GPU working. However, that is not something that we can help with here, as that’s not specifically a transformers issue. What I can say, though, is that I’ve had some environments where it was difficult to get TensorFlow running. In that regard, PyTorch is easier to get running IMO because its GPU builds come with the CUDA libraries included (which is also why its file size is much larger). You can try that instead, if you want.
I’m a little confused by what you mean, as this finetuning method comes from the PyTorch section of the tutorial. I have only followed PyTorch tutorials up to this point and I do not have any TF prefixes in my code… I am very slow and it is possible that I have missed something basic, but I have torch installed. The warning that I get is:
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-05-02 16:14:57.304314: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
I see that this is a TensorFlow issue, but how can I use PyTorch instead? I have uninstalled transformers and reinstalled transformers[torch], but I still receive the TensorFlow warning.
I am very confused by all this and I do not have a very strong coding background, so any guidance is appreciated. I really would like to figure this out, as even when limiting the dataset per the tutorial, I am looking at a 55-hour training time.
Edit: I have removed TensorFlow and reinstalled transformers, and that has stopped the warning, but my training times are still the same, leading me to believe that my CPU is still being used.
I am looking further into this to verify that torch is using my GPU, but you’re right, the scope has changed and is no longer appropriate for this forum. Thank you for your help!
The issue is likely that pip install transformers[torch] effectively does pip install transformers torch under the hood. Depending on your environment (Windows, Mac, Linux) and Python version, this may default to the CPU-only version of PyTorch. To install the right version, with GPU support, go to the installation selector on the PyTorch website (the “Get Started” page), select the right options for your system, and for “Compute platform” make sure you select a CUDA version and not CPU. Then run the command that is displayed. (You can leave out torchvision and torchaudio.)
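To check which build ended up installed, a short script along these lines can help. It is only a sketch; `describe_torch_device` is a hypothetical helper name, while the torch calls it uses are standard.

```python
def describe_torch_device():
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only build of PyTorch is installed.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


print(describe_torch_device())
```

If `built_with_cuda` is None, the CPU-only build is installed and reinstalling with the CUDA command is the next step. If it shows a version but `cuda_available` is False, the problem is more likely the GPU driver.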
I have solved this issue, but come up against another. I am currently testing an example dataset to see if I can replicate the issue. Basically everything works, except that training always fails with a CUDA out-of-memory error reporting 0 bytes free. I have run the example code successfully, but cannot use a different dataset.
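For out-of-memory errors like this, the usual levers are a smaller per-device batch size combined with gradient accumulation, plus truncating inputs to a shorter maximum length when tokenizing. A minimal sketch of the first part, assuming the Trainer API from the tutorial is used; the values, the `test_trainer` directory name, and the helper names are illustrative assumptions, not recommendations from this thread.

```python
# Illustrative values only; tune them for the GPU at hand.
training_kwargs = {
    "output_dir": "test_trainer",        # assumed directory name
    "per_device_train_batch_size": 2,    # small batches need less GPU memory
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 8,    # accumulate gradients over 8 batches
}


def effective_batch_size(kwargs, n_devices=1):
    """Number of examples contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * n_devices
    )


def build_training_args(kwargs=training_kwargs):
    # Imported lazily so the arithmetic above runs without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(**kwargs)


print(effective_batch_size(training_kwargs))  # 2 * 8 * 1 = 16
```

Gradient accumulation keeps the effective batch size at 16 while only 2 examples sit on the GPU at a time, so memory use drops without changing the optimization much. The result of `build_training_args()` would be passed to `Trainer` as `args`.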