In this GitHub repo example, we can use bloom-7b1, but I don't see an option to train (fine-tune) the existing model. Can it be done?
python run_generation.py \
--model_name_or_path gpt2 \
--use_hpu_graphs \
--use_kv_cache \
--max_new_tokens 100 \
--do_sample \
--prompt "Here is my prompt"
This is the standard script. Can we add do_train here? If this script won't do it, is there another script that can?
Hi @gildesh! The script you linked to performs generation with a model, but it does not support training.
For fine-tuning BLOOM 7B, you would need to use the language modeling example: https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling#gpt-2gpt-and-causal-language-modeling
However, I'm not sure Gaudi1 has enough memory to train this model. You could try using DeepSpeed ZeRO-3; it may work:
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.10.0
python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path bigscience/bloom-7b1 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--deepspeed path_to_my_deepspeed_config
with, for example, this DeepSpeed config: https://github.com/huggingface/optimum-habana/blob/main/examples/summarization/ds_flan_t5_z3_config_bf16.json
If this works, you could increase the batch size to see if you can fit bigger batches. Let me know if you manage to launch a training or if you need any help!
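For reference, a minimal sketch of what such a ZeRO-3 bf16 config might look like (the filename ds_z3_bf16.json and the exact fields are assumptions; the linked ds_flan_t5_z3_config_bf16.json is the authoritative version):
# Write a hypothetical ZeRO-3 bf16 DeepSpeed config to an assumed filename
cat > ds_z3_bf16.json << 'EOF'
{
  "steps_per_print": 64,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": {
    "enabled": true
  },
  "gradient_clipping": 1.0,
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": false,
    "contiguous_gradients": false
  }
}
EOF
You would then pass this file to --deepspeed in the command above.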
Thanks a lot regiss!
I did run into this error
I used this DeepSpeed config:
{
  "steps_per_print": 64,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": {
    "enabled": true
  },
  "gradient_clipping": 1.0,
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": false,
    "reduce_scatter": false,
    "contiguous_gradients": false
  }
}
and this script
python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path bigscience/bloom-7b1 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--deepspeed deep_cnvrg.json
Is it a simple memory issue or something else?
When I ran it with bloom-560m (below is the script for reference),
python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path bigscience/bloom-560m \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--deepspeed deep_cnvrg.json \
--overwrite_output_dir
I got this error.
Maybe it's because the gaudi_config_name is Habana/gpt2 but the model is bloom-560m?
And how do we know how much CPU/memory the run is using? The DL1 instance has 8 HPUs and 768 GB of memory, but in the script we only specify world_size, which is the number of HPUs.
Weird that it fails with BLOOM-560m too. I’ll look into it in the next few days and will let you know what I find.
gaudi_config_name is mainly used to specify the operators to use in bf16 precision, but DeepSpeed manages that itself, so it shouldn't be the issue here.
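If you want to see what such a config contains, you can fetch the file straight from the Hub, for example:
# Dump the raw gaudi_config.json of the Habana/gpt2 repo to see which ops it puts in bf16
curl -s https://huggingface.co/Habana/gpt2/raw/main/gaudi_config.json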
768 GB is the memory of the host (CPU), which you can monitor with top or htop. If you want to monitor the memory of the 8 Gaudi devices, you can run hl-smi.
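For example, assuming the Habana drivers are installed (hl-smi ships with them), you could monitor both like this:
# Host (CPU) memory usage
htop
# Gaudi (HPU) memory usage, refreshed every second
watch -n 1 hl-smi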
Thanks regiss!
@gildesh There was indeed a bug in the custom modeling of BLOOM. The fix has just been merged into the main branch of Optimum Habana; you can install the repo with
pip install git+https://github.com/huggingface/optimum-habana.git
to get it.
Note that BLOOM 7B is too big to fit on Gaudi1 devices even with DeepSpeed ZeRO-3, so it will fail with a memory allocation error. On Gaudi2, on the other hand, it works well. For BLOOM-560m, I'm not sure you need DeepSpeed at all since it should fit on Gaudi1 devices (unless you would like to save memory to fit bigger batches).
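For instance, a single-card run without DeepSpeed could look roughly like this (a sketch reusing only flags already shown in this thread; not a verified command):
# Hypothetical single-HPU fine-tuning of BLOOM-560m, no DeepSpeed and no gaudi_spawn.py
python run_clm.py \
--model_name_or_path bigscience/bloom-560m \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--gradient_checkpointing \
--throughput_warmup_steps 3 \
--overwrite_output_dir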
@gildesh We added an example showing how to fine-tune BLOOM-7b1 on Gaudi1 with DeepSpeed ZeRO-3 here: optimum-habana/examples/language-modeling at main · huggingface/optimum-habana · GitHub