Hi, I am trying to reduce the memory footprint and speed up inference for my own fine-tuned transformer. I came across the pruning tutorial on the Hugging Face site and am referring to the snippet below. The `trainer.train()` call is missing from the tutorial, so I added it. It runs without error, but there is no reduction in memory: I measured with `model.get_memory_footprint()` and got ~439 MB both before and after pruning. Inference speed is unchanged as well. I also tried different pruning configurations (global pruning, different pruning types, and different target sparsities), but it did not help. Can someone help me?
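For context, this is roughly how I compared the footprint (a minimal sketch; `my_finetuned_model` is just a placeholder for the path to my checkpoint, and `save_dir` is the same directory used in the snippet below):

```python
from transformers import AutoModelForSequenceClassification

# "my_finetuned_model" stands in for the path to my fine-tuned checkpoint
model = AutoModelForSequenceClassification.from_pretrained("my_finetuned_model")
print(f"Before pruning: {model.get_memory_footprint() / 1e6:.0f} MB")  # ~439 MB

# ... after running the pruning snippet below, reload and measure again:
optimized_model = AutoModelForSequenceClassification.from_pretrained(save_dir)
print(f"After pruning: {optimized_model.get_memory_footprint() / 1e6:.0f} MB")  # still ~439 MB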
from transformers import (
    AutoModelForSequenceClassification,
    TrainingArguments,
    default_data_collator,
)
from optimum.intel.neural_compressor import INCTrainer
from neural_compressor import WeightPruningConfig

# The configuration detailing the pruning process
pruning_config = WeightPruningConfig(
    pruning_type="magnitude",
    start_step=0,
    end_step=15,
    target_sparsity=0.2,
    pruning_scope="local",
)

trainer = INCTrainer(
    model=model,
    pruning_config=pruning_config,
    args=TrainingArguments(save_dir, num_train_epochs=1.0, do_train=True, do_eval=False),
    train_dataset=dataset["train"].select(range(300)),
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)

train_result = trainer.train()  # <-- Added by me
trainer.save_model(save_dir)    # <-- Added by me

optimized_model = AutoModelForSequenceClassification.from_pretrained(save_dir)
```
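And this is the kind of variation I tried for the configuration, e.g. global scope with a different pruning type and a higher target sparsity (the values here are illustrative, not the exact ones I used):

```python
# One of the variants I experimented with: global scope, different
# algorithm, higher sparsity -- still no change in memory or speed
pruning_config = WeightPruningConfig(
    pruning_type="snip_momentum",
    start_step=0,
    end_step=15,
    target_sparsity=0.9,
    pruning_scope="global",
)
```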