Pruning a model embedding matrix for memory efficiency

@sshleifer Hi, it's been a while. I actually managed to get everything working correctly, including the tokenizer. Seeing how many hits this post has gotten and how many people have reached out to me since, I recently converted my code into a Python library, which is now hosted on PyPI and supports both BART and T5.

Link to package. You can use the library to trim a model and its tokenizer to your data and then save both as new models. These models can then be reloaded like native HuggingFace models for use again.
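The package's actual API isn't shown in this post, but the core idea behind trimming an embedding matrix to a dataset can be sketched. The function and names below are hypothetical illustrations, not the library's interface: keep only the vocabulary entries that appear in your corpus (plus special tokens), remap their ids, and slice the corresponding rows out of the embedding matrix.

```python
import numpy as np

def prune_embeddings(vocab, embeddings, corpus_tokens,
                     specials=("<pad>", "<s>", "</s>", "<unk>")):
    """Keep only the embedding rows for tokens seen in the corpus.

    vocab:         dict mapping token -> row index in `embeddings`
    embeddings:    (vocab_size, hidden_dim) array of embedding weights
    corpus_tokens: iterable of tokens observed in your data
    specials:      tokens that must survive pruning regardless of the corpus
    """
    seen = set(corpus_tokens)
    # Preserve the original vocab order while dropping unseen tokens.
    keep = [tok for tok in vocab if tok in seen or tok in specials]
    new_vocab = {tok: i for i, tok in enumerate(keep)}
    new_embeddings = embeddings[[vocab[tok] for tok in keep]]
    return new_vocab, new_embeddings

# Toy example: a 7-token vocab trimmed to 2 content tokens + 4 specials.
vocab = {"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3,
         "cat": 4, "dog": 5, "fish": 6}
emb = np.arange(7 * 4, dtype=float).reshape(7, 4)
new_vocab, new_emb = prune_embeddings(vocab, emb, ["cat", "dog"])
# "fish" is dropped; the row for "cat" is preserved under its new id.
```

In a real model you would copy the pruned rows into a smaller `nn.Embedding` (and the tied output projection, if the model shares weights), and rebuild the tokenizer's vocab file with the same token-to-id remapping so the two stay in sync.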

@Bookworm hope this helps you too.
