I’m switching to PyTorch 2.0.1 and want to compile the model to improve training time. There are two approaches to model compilation (the transformers API and the torch API), and neither works as expected.
Transformers API
Training becomes dramatically slower (10-30x on an A10G GPU). Maybe it’s because of dynamic input shapes, since torch.compile specializes on input shapes by default and recompiles for every new sequence length (the inputs arguably should be padded to a fixed length anyway; see the padding sketch after the code).
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label
)

training_args = TrainingArguments(
    output_dir="./temp",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=5e-5,
    num_train_epochs=3,
    torch_compile=True,  # let the Trainer call torch.compile on the model
    optim="adamw_torch_fused",
    logging_steps=1,
    logging_strategy="steps",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
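If the slowdown really comes from varying sequence lengths triggering recompilation, padding every batch to one fixed length should keep a single compiled graph. A minimal sketch of that preprocessing, assuming the raw text lives in a "text" column and 512 is an appropriate maximum length (both are assumptions on my side):

```python
# Pad/truncate every example to the same length so the compiled model
# only ever sees one input shape.
def tokenize(batch):
    return tokenizer(
        batch["text"],          # hypothetical column name
        padding="max_length",   # pad to max_length, not per batch
        truncation=True,
        max_length=512,         # illustrative; use your model's real limit
    )

dataset = dataset.map(tokenize, batched=True)
```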
PyTorch API
This should solve the dynamic input shape issue (dynamic=True asks the compiler to handle varying shapes), but it refuses to run at all and throws an error.
import torch
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label
)
# Compile up front instead of passing torch_compile=True to the Trainer
model = torch.compile(model, dynamic=True, fullgraph=True)

training_args = TrainingArguments(
    output_dir="./temp",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=5e-5,
    num_train_epochs=3,
    optim="adamw_torch_fused",
    logging_steps=1,
    logging_strategy="steps",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
Here is the error:
/usr/local/lib/python3.10/dist-packages/transformers/trainer_pt_utils.py in get_model_param_count(model, trainable_only)
1051 return p.numel()
1052
-> 1053 return sum(numel(p) for p in model.parameters() if not trainable_only or p.requires_grad)
1054
1055
AttributeError: 'function' object has no attribute 'parameters'
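From the traceback it looks like the Trainer is handed something that is no longer a plain nn.Module, so model.parameters() blows up. One workaround worth trying (a sketch, not a confirmed fix): compile only the forward method, so the object given to the Trainer stays a regular nn.Module while the forward pass still runs through the compiler. I’ve also dropped fullgraph=True here, since transformers models often contain graph breaks that fullgraph would turn into hard errors:

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label
)

# Compile only the forward method; `model` itself stays an nn.Module,
# so Trainer internals such as model.parameters() keep working.
model.forward = torch.compile(model.forward, dynamic=True)

# The rest of the Trainer setup is unchanged from the snippets above.
```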
Here is the corresponding Colab notebook.
How can I use model compilation to actually speed up training?
P.S. Cross-posted from Stack Overflow.