Train Roberta from scratch for custom dataset

clementine · March 11, 2021, 11:30pm

Hey there, I am training RoBERTa from scratch on protein sequences. To this end, I built a tokenizer for protein sequences, which works much like a character-level tokenizer. I then saved the tokenizer and used it with the RoBERTa model, following this tutorial: How to train a new language model from scratch using Transformers and Tokenizers.
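
For readers unfamiliar with the setup, here is a minimal sketch of what a character-level tokenizer for protein sequences might look like. It is an illustration only, not the poster's actual code: the 20-letter amino-acid alphabet, the special-token list and ordering, and the names `VOCAB` and `encode` are all assumptions.

```python
from typing import List

# Hypothetical sketch: a character-level tokenizer for protein sequences.
# The alphabet, special tokens, and names below are illustrative assumptions.
SPECIAL_TOKENS = ["<s>", "<pad>", "</s>", "<unk>", "<mask>"]
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues

# Special tokens take the first ids, residues follow.
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + AMINO_ACIDS)}


def encode(sequence: str) -> List[int]:
    """Map each residue to an id and wrap the result in <s> ... </s>."""
    unk = VOCAB["<unk>"]
    ids = [VOCAB.get(ch, unk) for ch in sequence.upper()]
    return [VOCAB["<s>"]] + ids + [VOCAB["</s>"]]


ids = encode("MKTAYIAK")
vocab_size = len(VOCAB)

# Every id must be a valid row index into the model's embedding table.
assert max(ids) < vocab_size
```

With a vocabulary this small, the model configuration's `vocab_size` has to be at least `len(VOCAB)` so that every token id has an embedding row.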

The code runs fine on CPU but fails on GPU with the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

Thanks in advance for any thoughts on this issue!

LidorPrototype · May 2, 2023, 4:08am

Hello, have you found a solution for this?
