Hi, I am trying to reduce the memory use and speed up inference of my own fine-tuned transformer. I came across the pruning tutorial on the Hugging Face site and am referring to the snippet below. The trainer.train() call is missing from it, so I added it. It ran without error, but there is no reduction in memory: model.get_memory_footprint() reports "Model memory footprint: 503695916 bytes" both before and after pruning. Inference speed is also unchanged. I also tried different pruning configurations (global pruning, different pruning types and target sparsities), but it did not help. Can someone help me?
from optimum.intel.neural_compressor import INCTrainer
from neural_compressor import WeightPruningConfig
from transformers import AutoModelForSequenceClassification, TrainingArguments
from transformers.data.data_collator import default_data_collator

# model, processor, train_dataset, eval_dataset and compute_metrics
# are assumed to be defined beforehand (not shown here).
pruning_config = WeightPruningConfig(
    pruning_type="magnitude",
    start_step=0,
    end_step=15,
    target_sparsity=0.2,
    pruning_scope="local",
)

save_dir = "prunedModel"
trainer = INCTrainer(
    model=model,
    pruning_config=pruning_config,
    args=TrainingArguments(
        save_dir,
        max_steps=500,  # takes precedence over num_train_epochs
        num_train_epochs=1.0,
        do_train=True,
        do_eval=True,
        metric_for_best_model="f1",
        greater_is_better=True,
    ),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=processor,
    data_collator=default_data_collator,
)
train_result = trainer.train()  # <-- Added by me
trainer.save_model(save_dir)  # <-- Added by me

# Reload the saved model and report its size
optimized_model = AutoModelForSequenceClassification.from_pretrained(save_dir)
memory_footprint = optimized_model.get_memory_footprint()
print(f"Model memory footprint: {memory_footprint} bytes")
Expected behavior
As per the tutorial, the model should be pruned, so the original model and the pruned model should have different sizes. Instead, both report the same value: Model memory footprint: 503695916 bytes.
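One thing that may be worth checking here (a sketch, not a confirmed diagnosis): magnitude pruning sets the smallest weights to zero, but a dense buffer keeps the same number of elements, so a byte count based on element count would not change even when pruning worked. The toy example below uses the standard-library `array` type as a stand-in for a dense weight tensor. The helper names `magnitude_prune`, `dense_bytes` and `sparsity` are hypothetical and this is not the algorithm used by neural_compressor.

```python
from array import array


def magnitude_prune(weights, target_sparsity):
    """Return a copy with the smallest-magnitude fraction of entries set to zero."""
    n_prune = round(len(weights) * target_sparsity)
    # Indices sorted by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = array(weights.typecode, weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def dense_bytes(weights):
    """Size of the dense buffer: element count times bytes per element."""
    return len(weights) * weights.itemsize


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Toy float32 "layer" with ten weights (made-up values)
weights = array("f", [0.9, -0.05, 0.4, 0.01, -0.7, 0.3, -0.02, 0.6, 0.15, -0.8])
pruned = magnitude_prune(weights, target_sparsity=0.2)

print("bytes before/after:", dense_bytes(weights), dense_bytes(pruned))
print("sparsity before/after:", sparsity(weights), sparsity(pruned))
```

If the same holds for the real model, counting the zero-valued entries in its parameters would show whether the target sparsity was reached, independently of what the memory footprint reports.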
@ArthurZucker @younesbelkada @amyeroberts @sgugger @pacman100 @stas00 @muellerzr @stevhliu @MKhalusova