I am running into the same problem.
I tried setting the environment variable with os.environ['CUDA_LAUNCH_BLOCKING'] = "1", but it didn't help.
I am fine-tuning a BERT model on my own dataset for intent classification.
RuntimeError Traceback (most recent call last)
<ipython-input-20-fa5ddd935c58> in <cell line: 4>()
2 import os
3 os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
----> 4 trainer.train()
4 frames
/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py in empty_cache()
131 """
132 if is_initialized():
--> 133 torch._C._cuda_emptyCache()
134
135
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Also, I tried running it on CPU on Colab, but there I was getting IndexError: Target 2 is out of bounds.
Basically I am stuck in a loop of three errors.
Here's my code, could anyone help me with this?
from transformers import AutoTokenizer, AutoModelForSequenceClassification, BertForSequenceClassification
# cartesinus/xlm-r-base-amazon-massive-intent-label_smoothing
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("textattack/bert-base-uncased-yelp-polarity")
model = AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-yelp-polarity")
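From what I understand, the textattack/bert-base-uncased-yelp-polarity checkpoint has a 2-class head, which might explain the "Target 2 is out of bounds" error if my dataset has more than two intent classes. I am not sure this is the right fix, but I assume I would need to reload it with the correct number of labels, roughly like this (num_intent_labels below is just a placeholder for my dataset's class count):

num_intent_labels = 3  # placeholder: replace with the actual number of intent classes in my dataset
model = AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-yelp-polarity",
    num_labels=num_intent_labels,
    ignore_mismatched_sizes=True,  # drop the old 2-class head and initialize a new one
)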
# Define training arguments
from transformers import Trainer, TrainingArguments
from torch import nn
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    output_dir='./results',       # directory where checkpoints and logs will be saved
    num_train_epochs=3,
    evaluation_strategy="steps",
    eval_steps=100,               # number of steps between evaluations on the validation set
    save_steps=100,               # number of steps between checkpoint saves
    load_best_model_at_end=True,
    push_to_hub=False,            # set to True if you want to push the model to the Hugging Face Model Hub
)
from transformers import EvalPrediction
def custom_accuracy(p: EvalPrediction):
    # Extract predictions and label_ids
    predictions = p.predictions.argmax(axis=1)
    label_ids = p.label_ids
    # Calculate accuracy
    correct = (predictions == label_ids).sum()
    total = len(predictions)
    accuracy = correct / total
    # Return accuracy as a dictionary
    return {"accuracy": accuracy}
# Define a metric for evaluation (e.g., accuracy)
from datasets import load_metric
metric = load_metric("accuracy")  # note: not actually used below, since compute_metrics is custom_accuracy
# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=validation_dataset,
    data_collator=None,           # you can use your own data collator if needed
    compute_metrics=custom_accuracy,
)
# Fine-tune the model on your dataset
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
trainer.train()
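One thing I'm unsure about: I set CUDA_LAUNCH_BLOCKING in the same cell as trainer.train(), after torch has already been imported and the model moved to the GPU. My understanding is that it only takes effect if it is set before CUDA is initialized, so presumably it would need to go at the very top of the notebook, something like:

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"  # must be set before CUDA is initialized to have any effect

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification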
And here's what my dataset looks like, i.e. the training data:
Dataset({
    features: ['label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 492
})
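To sanity-check whether this is a label-count mismatch, I assume a quick diagnostic like the following would show if my labels go beyond what the model head expects (just a sketch):

labels_in_data = sorted(set(train_dataset['label']))
print("labels in training data:", labels_in_data)
print("labels the model expects:", model.config.num_labels)
# "Target 2 is out of bounds" would mean a label of 2 while the head only has classes 0 and 1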