Hi, I am trying to reduce the memory footprint and speed up inference of my own fine-tuned transformer. I came across the pruning tutorial on the Hugging Face site and am referring to the following snippet. The call to trainer.train() is missing there, so I added it. It ran without error, but memory did not go down: model.get_memory_footprint() reports ~439 MB both before and after pruning. Inference speed is unchanged as well. I also tried different pruning configurations (global pruning, different pruning types and target sparsities), but that did not help. Can someone help me?
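from transformers import AutoModelForSequenceClassification, TrainingArguments, default_data_collator
from optimum.intel.neural_compressor import INCTrainer
from neural_compressor import WeightPruningConfig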
# The configuration detailing the pruning process
pruning_config = WeightPruningConfig(
    pruning_type="magnitude",
    start_step=0,
    end_step=15,
    target_sparsity=0.2,
    pruning_scope="local",
)
trainer = INCTrainer(
    model=model,
    pruning_config=pruning_config,
    args=TrainingArguments(save_dir, num_train_epochs=1.0, do_train=True, do_eval=False),
    train_dataset=dataset["train"].select(range(300)),
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
train_result = trainer.train()  # <-- Added by me
trainer.save_model(save_dir)  # <-- Added by me
optimized_model = AutoModelForSequenceClassification.from_pretrained(save_dir)
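
One way to narrow this down is to check whether the weights actually contain zeros, separately from the reported footprint. The sketch below is not from the tutorial and does not use the optimum or neural_compressor APIs. It uses Python's array module as a toy stand-in for a dense weight tensor, and magnitude_prune, sparsity and dense_bytes are hypothetical helper names. It illustrates that magnitude pruning sets weights to zero while a dense buffer keeps the same byte size; whether that is what explains the unchanged get_memory_footprint() here is an assumption.

```python
from array import array


def magnitude_prune(weights, target_sparsity):
    """Zero the smallest-magnitude fraction of a dense float32 buffer."""
    n_prune = int(len(weights) * target_sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = array("f", weights)  # copy; still a dense buffer
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def dense_bytes(weights):
    """Bytes used by the dense storage, zeros included."""
    return len(weights) * weights.itemsize


# Toy "layer" with ten non-zero weights (hypothetical values).
layer = array("f", [0.9, -0.05, 0.4, -0.7, 0.01, 0.3, -0.2, 0.6, -0.8, 0.1])
pruned_layer = magnitude_prune(layer, target_sparsity=0.2)

print("sparsity before/after:", sparsity(layer), sparsity(pruned_layer))
print("bytes before/after:", dense_bytes(layer), dense_bytes(pruned_layer))
```

In this toy case the sparsity rises from 0 to 0.2 while the byte count is identical before and after, because the zeros are still stored as ordinary floats. The same kind of check on the real model (counting zero entries per parameter) would at least show whether the pruning step took effect.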