How to prune a transformer?

Hi, I am trying to reduce the memory footprint of my own fine-tuned transformer and speed it up. I came across the pruning tutorial on the Hugging Face site and am referring to the snippet below. The trainer.train() call is missing from the tutorial, so I added it. It ran without error; however, there is no reduction in memory (I used model.get_memory_footprint(), and before and after pruning it was ~439 MB), and the same goes for inference speed. I also tried different pruning configurations (global pruning, different pruning types, different target sparsities), but it did not help. Can someone help me?


# model, tokenizer, dataset, compute_metrics and save_dir are defined earlier in my script
from transformers import (
    AutoModelForSequenceClassification,
    TrainingArguments,
    default_data_collator,
)
from optimum.intel.neural_compressor import INCTrainer
from neural_compressor import WeightPruningConfig

# The configuration detailing the pruning process
pruning_config = WeightPruningConfig(
    pruning_type="magnitude",
    start_step=0,
    end_step=15,
    target_sparsity=0.2,
    pruning_scope="local",
)


trainer = INCTrainer(
    model=model,
    pruning_config=pruning_config,
    args=TrainingArguments(save_dir, num_train_epochs=1.0, do_train=True, do_eval=False),
    train_dataset=dataset["train"].select(range(300)),
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
train_result = trainer.train() # <-- Added by me
trainer.save_model(save_dir) # <-- Added by me
optimized_model = AutoModelForSequenceClassification.from_pretrained(save_dir)
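
For reference, this is roughly how I checked the result (a simplified sketch, not my exact evaluation code; the dummy input and the 100-iteration timing loop are just illustrative):

import time
import torch

# Footprint of the reloaded pruned model; the same call on the model before
# pruning also reported ~439 MB, so nothing changed
print(optimized_model.get_memory_footprint())

# Naive latency check on a single dummy batch
inputs = tokenizer("example input", return_tensors="pt")
optimized_model.eval()
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(100):
        optimized_model(**inputs)
print("avg seconds per forward pass:", (time.perf_counter() - start) / 100)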

Did you find a way to prune the transformer model? I also tried the same tutorial, but I found that the memory footprint is the same with and without pruning.