How to compile and fine-tune a PyTorch-based transformer model?

I’m switching to PyTorch 2.0.1 and want to compile the model to improve training time. There are two ways to compile it - through the torch API or through the transformers API - and neither of them works as expected.

Transformers API

Training becomes drastically slower (10-30 times, A10G GPU). My guess is that the slowdown comes from dynamic input shapes, which probably should be padded to a fixed length anyway (see the padding sketch after the code below).

from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label
)

training_args = TrainingArguments(
    output_dir="./temp",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=5e-5,
    num_train_epochs=3,
    torch_compile=True,          # let the Trainer call torch.compile on the model
    optim="adamw_torch_fused",   # fused AdamW implementation
    logging_steps=1,
    logging_strategy="steps",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
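
If dynamic shapes really are the problem, I assume the fix is to pad every example to a fixed length at tokenization time, so the compiled model only ever sees one input shape. Here is a rough sketch of what I mean (the tokenize function, the "text" column and max_length=256 are placeholders for my preprocessing, not anything prescribed by the docs):

def tokenize(batch):
    # Pad to a fixed max_length instead of to the longest sequence in the batch,
    # so every forward pass sees the same static shape.
    return tokenizer(
        batch["text"],
        padding="max_length",
        truncation=True,
        max_length=256,
    )

dataset = dataset.map(tokenize, batched=True)

Is fixed-length padding like this the right way to get static shapes for torch.compile, or is there a Trainer/data collator setting that handles it?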

PyTorch API

Compiling with dynamic=True should have solved the dynamic input shape issue, but this version refuses to run at all and throws an error.

import torch
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label
)
# Compile with dynamic shapes enabled and no graph breaks allowed
model = torch.compile(model, dynamic=True, fullgraph=True)

training_args = TrainingArguments(
    output_dir="./temp",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=5e-5,
    num_train_epochs=3,
    optim="adamw_torch_fused",
    logging_steps=1,
    logging_strategy="steps",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()

Here is the error:

/usr/local/lib/python3.10/dist-packages/transformers/trainer_pt_utils.py in get_model_param_count(model, trainable_only)
   1051             return p.numel()
   1052 
-> 1053     return sum(numel(p) for p in model.parameters() if not trainable_only or p.requires_grad)
   1054 
   1055 

AttributeError: 'function' object has no attribute 'parameters'

Here is the corresponding Colab.
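
One workaround I’m considering (I’m not sure it’s the intended usage) is to compile only the forward method, so the object handed to the Trainer is still a plain nn.Module and get_model_param_count can call .parameters() on it:

model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label
)
# Keep the original nn.Module and swap in a compiled forward.
# Dropping fullgraph=True here is a guess on my part, since the HF forward
# may contain graph breaks that fullgraph would turn into hard errors.
model.forward = torch.compile(model.forward, dynamic=True)

Would that still give the compilation speedup, or does it defeat the purpose?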

How can I utilize model compilation to speed up the process?

P.S. Crossposting from SO