Significance of block size

Hi!

Reducing the block size seems to be an effective way to fit models in memory. However, I am wondering what consequences this might have for the trained model. Especially if the model is fine-tuned on a smaller block size than used initially for pre-training.


Ping! Anyone? :blush:

I’m sorry I can’t answer your question, but could you please share how you change the block size of a HuggingFace model?

From what I understand, for AutoTrain with SFT fine-tuning for example, the block size sets the maximum attention window size for the input. As such, GPU RAM usage will increase with this setting, similar to inference, but significantly more so because gradients must also be computed. The implication is that if you want to fine-tune for an instruct task and your (Instruction + Context + Answer) exceeds the block size, the sequence gets split up and the model might learn the Instruction, Context, and Answer independently. Probably not what you’re aiming for. You can also reduce GPU RAM usage by decreasing the batch size, using gradient accumulation steps to preserve the effective batch size.
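To illustrate what that splitting looks like, here is a minimal sketch of the common "group texts into fixed blocks" pattern used in causal-LM fine-tuning pipelines. The function name and the toy token IDs are my own placeholders, not AutoTrain internals, and real preprocessing varies by library:

```python
def group_into_blocks(token_ids, block_size):
    """Split concatenated token ids into fixed-size blocks;
    any trailing remainder shorter than block_size is dropped."""
    total = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, total, block_size)]

# Toy example: a 10-token (Instruction + Context + Answer) sequence
# with block_size=4 is cut into two blocks, and the final 2 tokens
# are discarded -- so the Answer can end up severed from its Instruction.
example = list(range(10))
print(group_into_blocks(example, block_size=4))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```

This is why a block size smaller than your typical training example can hurt: the model never attends across the block boundary, so the pieces are effectively trained as separate samples.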