!pip install --upgrade pip
!pip install transformers datasets pandas openpyxl accelerate
from transformers import Trainer, TrainingArguments
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import DataCollatorForSeq2Seq
tokenizer = AutoTokenizer.from_pretrained("skt/kogpt2-base-v2")
tokenizer.pad_token = tokenizer.eos_token  # KoGPT2 has no pad token; reuse EOS
model = AutoModelForCausalLM.from_pretrained("skt/kogpt2-base-v2")
model.resize_token_embeddings(len(tokenizer))
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, return_tensors="pt")
training_args = TrainingArguments(
    output_dir='./outputs',
    logging_dir='./logs',
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    logging_steps=50,
    save_steps=50,
    save_total_limit=2,
)
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=dataset,  # tokenized dataset, prepared earlier (not shown)
    data_collator=data_collator,
)
trainer.train()
I set up this training code on a RunPod instance with 2× L40 GPUs. It works fine on a single L40, but when I run the same code on both L40s it doesn't train. GPU memory usage and utilization look normal, yet the model weights just don't update.
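To make the "weights don't update" symptom concrete, here is a minimal sketch of how one can verify it: snapshot the parameters before `trainer.train()` and compare them afterwards. The `weights_changed` helper and the toy `torch.nn.Linear` model are illustrative stand-ins (not part of `transformers`); the same check applies to the KoGPT2 model by replacing the toy model and the manual optimizer step with the real `trainer.train()` call.

```python
import torch

def weights_changed(before, model):
    """True if any parameter differs from the snapshot taken before training."""
    return any(
        not torch.equal(before[name], param.detach().cpu())
        for name, param in model.named_parameters()
    )

# Toy stand-in for the KoGPT2 model; the check works for any nn.Module.
model = torch.nn.Linear(4, 2)
before = {n: p.detach().cpu().clone() for n, p in model.named_parameters()}

# ... run trainer.train() here; simulated with one manual SGD step ...
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

print(weights_changed(before, model))  # a healthy training run prints True
```

On the 2-GPU run this check returns False for me even though both GPUs show memory allocated and nonzero utilization.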