Can BLOOM-7b1 be fine-tuned using Gaudi1?

In this GitHub repo example

we can use bloom-7b1, but I don’t see an option to train (fine-tune) the existing model.
Can it be done?

python run_generation.py \
--model_name_or_path gpt2 \
--use_hpu_graphs \
--use_kv_cache \
--max_new_tokens 100 \
--do_sample \
--prompt "Here is my prompt"

This is the standard script. Can we add --do_train here?

If this script won’t do it, can I get any script that can?

Hi @gildesh! The script you linked to lets you perform generation with a model, but it does not support training.

For fine-tuning BLOOM-7b1, you would need to use the language-modeling example: https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling#gpt-2gpt-and-causal-language-modeling
However, I'm not sure Gaudi1 has enough memory to train this model. You could try DeepSpeed ZeRO-3; it may work:

  1. Install DeepSpeed with
    pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.10.0
    
  2. Then run:
    python ../gaudi_spawn.py \
    --world_size 8 --use_deepspeed run_clm.py \
    --model_name_or_path bigscience/bloom-7b1 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm \
    --gaudi_config_name Habana/gpt2 \
    --use_habana \
    --use_lazy_mode \
    --use_hpu_graphs_for_inference \
    --gradient_checkpointing \
    --use_cache False \
    --throughput_warmup_steps 3 \
    --deepspeed path_to_my_deepspeed_config
    
    with for example this DeepSpeed config: https://github.com/huggingface/optimum-habana/blob/main/examples/summarization/ds_flan_t5_z3_config_bf16.json (a rough sketch of what such a ZeRO-3 config contains is shown below)
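
For reference, a ZeRO-3 config in the spirit of that file could look roughly like this (illustrative sketch only; the linked JSON is the authoritative version and its exact fields and values may differ):

{
  "steps_per_print": 64,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": {
    "enabled": true
  },
  "gradient_clipping": 1.0,
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": false,
    "contiguous_gradients": false,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}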

If this works, you could increase the batch size to see if you can fit bigger batches. Let me know if you manage to launch a training run or if you need any help 🙂

Thanks a lot regiss!
I did run into this error

I used this DeepSpeed config:
{
  "steps_per_print": 64,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": {
    "enabled": true
  },
  "gradient_clipping": 1.0,
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": false,
    "reduce_scatter": false,
    "contiguous_gradients": false
  }
}

and this script

python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path bigscience/bloom-7b1 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--deepspeed deep_cnvrg.json

Is it a simple memory issue or something else?

When I ran it with bloom-560m (below is the script for reference)

python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path bigscience/bloom-560m \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--deepspeed deep_cnvrg.json \
--overwrite_output_dir

I got this error

Maybe it's because the gaudi_config_name is Habana/gpt2 but the model is bloom-560m?

Also, how do we know how much CPU/memory it is using? The DL1 instance has 8 HPUs and 768 GB of memory, but in the script we only specify world_size, which is the number of HPUs.

Weird that it fails with BLOOM-560m too. I’ll look into it in the next few days and will let you know what I find.

gaudi_config_name is mainly used to specify the operators to use in bf16 precision, but DeepSpeed manages that itself, so it shouldn't be the issue here.

768 GB is the memory of the host (CPU), which you can monitor with top or htop. If you want to monitor the memory of the 8 Gaudi devices, you can run hl-smi.
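
For example (a minimal sketch, assuming hl-smi is on your PATH, as it is in the Habana/SynapseAI containers):

# Host (CPU) memory, interactive view
htop        # or: top

# Gaudi device (HPU) utilization and memory, refreshed every second
watch -n 1 hl-smi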

Thanks regiss!

@gildesh There was indeed a bug in the custom modeling of BLOOM. The fix has just been merged into the main branch of Optimum Habana, so you can install the library from the repo with

pip install git+https://github.com/huggingface/optimum-habana.git

to have it.

Note that BLOOM-7b1 is too big to fit on Gaudi1 devices even with DeepSpeed ZeRO-3, so it will fail with a memory allocation error. It works well on Gaudi2, on the other hand. For BLOOM-560m, I'm not sure you need DeepSpeed at all since it should fit on Gaudi1 devices (unless you would like to save memory to fit bigger batches); a possible launch command without DeepSpeed is sketched below.
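
For instance, here is a minimal sketch of a multi-card launch without DeepSpeed, reusing the flags from your command above and replacing --use_deepspeed with gaudi_spawn.py's --use_mpi mode (the exact flags you need may differ):

python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_clm.py \
--model_name_or_path bigscience/bloom-560m \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--overwrite_output_dir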

Hello regiss

I tried again

but this time got another error

@gildesh We added an example showing how to fine-tune BLOOM-7b1 on Gaudi1 with DeepSpeed ZeRO-3 here: optimum-habana/examples/language-modeling at main · huggingface/optimum-habana · GitHub
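
For reference, launching that example should look roughly like the DeepSpeed command earlier in this thread, with a ZeRO-3 config passed via --deepspeed (a sketch only, with a placeholder config path; the README linked above has the exact, up-to-date command and config):

python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path bigscience/bloom-7b1 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--deepspeed path_to_a_zero3_config.json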
