Can BLOOM-7b1 be fine-tuned using Gaudi1?

In this GitHub repo example

we can use bloom-7b1, but I don’t see an option to train (fine-tune) the existing model.
Can it be done?

python run_generation.py \
--model_name_or_path gpt2 \
--use_hpu_graphs \
--use_kv_cache \
--max_new_tokens 100 \
--do_sample \
--prompt "Here is my prompt"

This is the standard script. Can we add --do_train here?

If this script won’t do it, can I get any script that can?

Hi @gildesh! The script you linked to lets you perform generation with a model, but it does not support training.

For fine-tuning BLOOM-7b1, you would need to use the language-modeling example: https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling#gpt-2gpt-and-causal-language-modeling
However, I'm not sure Gaudi1 has enough memory to train this model. You could try DeepSpeed ZeRO-3; it may work:

  1. Install DeepSpeed with
    pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.10.0
    
  2. Then run:
    python ../gaudi_spawn.py \
    --world_size 8 --use_deepspeed run_clm.py \
    --model_name_or_path bigscience/bloom-7b1 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm \
    --gaudi_config_name Habana/gpt2 \
    --use_habana \
    --use_lazy_mode \
    --use_hpu_graphs_for_inference \
    --gradient_checkpointing \
    --use_cache False \
    --throughput_warmup_steps 3 \
    --deepspeed path_to_my_deepspeed_config
    
    with for example this DeepSpeed config: https://github.com/huggingface/optimum-habana/blob/main/examples/summarization/ds_flan_t5_z3_config_bf16.json (a rough sketch of what such a ZeRO-3 config contains is shown below)
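
For reference, a ZeRO-3 config in the spirit of that file could look roughly like this (illustrative sketch only; the linked JSON is the authoritative version and its exact fields and values may differ):

{
  "steps_per_print": 64,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": {
    "enabled": true
  },
  "gradient_clipping": 1.0,
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": false,
    "contiguous_gradients": false,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}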

If this works, you could increase the batch size to see if you can fit bigger batches. Let me know if you manage to launch a training run or if you need any help 🙂

Thanks a lot regiss!
I did run into this error

I used this DeepSpeed config:
{
  "steps_per_print": 64,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": {
    "enabled": true
  },
  "gradient_clipping": 1.0,
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": false,
    "reduce_scatter": false,
    "contiguous_gradients": false
  }
}

and this script

python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path bigscience/bloom-7b1 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--deepspeed deep_cnvrg.json

Is it a simple memory issue or something else?

When I ran it with bloom-560m (below is the script for reference)

python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path bigscience/bloom-560m \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--deepspeed deep_cnvrg.json \
--overwrite_output_dir

I got this error

Maybe it's because the gaudi_config_name is Habana/gpt2 but the model is bloom-560m?

Also, how do we know how much CPU/memory it is using? The DL1 instance has 8 HPUs and 768 GB of memory, but in the script we only specify world_size, which is the number of HPUs.

Weird that it fails with BLOOM-560m too. I’ll look into it in the next few days and will let you know what I find.

gaudi_config_name is mainly used to specify the operators to use in bf16 precision, but DeepSpeed manages that itself, so it shouldn't be the issue here.

768 GB is the memory of the host (CPU), which you can monitor with top or htop. If you want to monitor the memory of the 8 Gaudi devices, you can run hl-smi.
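
For example (a minimal sketch, assuming hl-smi is on your PATH, as it is in the Habana/SynapseAI containers):

# Host (CPU) memory, interactive view
htop        # or: top

# Gaudi device (HPU) utilization and memory, refreshed every second
watch -n 1 hl-smi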

Thanks regiss!

@gildesh There was indeed a bug in the custom modeling of BLOOM. The fix has just been merged into the main branch of Optimum Habana, so you can install the library from the repo with

pip install git+https://github.com/huggingface/optimum-habana.git

to have it.

Note that BLOOM-7b1 is too big to fit on Gaudi1 devices even with DeepSpeed ZeRO-3, so it will fail with a memory allocation error. It works well on Gaudi2, on the other hand. For BLOOM-560m, I'm not sure you need DeepSpeed at all since it should fit on Gaudi1 devices (unless you would like to save memory to fit bigger batches); a possible launch command without DeepSpeed is sketched below.
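
For instance, here is a minimal sketch of a multi-card launch without DeepSpeed, reusing the flags from your command above and replacing --use_deepspeed with gaudi_spawn.py's --use_mpi mode (the exact flags you need may differ):

python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_clm.py \
--model_name_or_path bigscience/bloom-560m \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--overwrite_output_dir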

Hello regiss

I tried again

but this time got another error

@gildesh We added an example showing how to fine-tune BLOOM-7b1 on Gaudi1 with DeepSpeed ZeRO-3 here: optimum-habana/examples/language-modeling at main · huggingface/optimum-habana · GitHub
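
For reference, launching that example should look roughly like the DeepSpeed command earlier in this thread, with a ZeRO-3 config passed via --deepspeed (a sketch only, with a placeholder config path; the README linked above has the exact, up-to-date command and config):

python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path bigscience/bloom-7b1 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--gradient_checkpointing \
--use_cache False \
--throughput_warmup_steps 3 \
--deepspeed path_to_a_zero3_config.json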
