I’m fine-tuning the Falcon-40B model on a single node with multiple GPUs. Training finished without errors, and I can load the resulting checkpoint with AutoModelForCausalLM.from_pretrained. However, the model only generates unreadable gibberish.
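For reference, the loading and generation code is roughly the following (a minimal sketch only; the checkpoint path, dtype, and generation arguments are placeholders and may differ from the actual script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: the fine-tuned checkpoint directory
checkpoint_dir = "./cv_botfalcon_40b_base_rag_friendly_multigpu_w_group_texts_block_size_512_per_device_batch_size_2_v_ak"

tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint_dir,
    torch_dtype=torch.bfloat16,  # assumed dtype
    device_map="auto",           # shard the 40B weights across the available GPUs
)

prompt = "Use the context below to answer the user's question.\nContext: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```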
Here is the printout:
Use the context below to answer the user’s question.
Context: Much of the time, when we get training about communication, we are told how to say things. What I want to focus on in this segment is actually more on your listening skills-- how you make yourself available and open and receptive to hear what others have to say. And one of the things that makes effective listening challenging is that we often are half-listening, and the other half is already thinking about what I’m going to say, because I should say something in return. That, I’m going to challenge for you. Why is effective listening so important in relationships? Relationships are both. It is the talking, but it is even more so the listening. And it is the listener and the quality of our listening that will actually shape what the speaker will say and how they will say it. We think that the other person just said this because that’s what they say, but no. What I’m saying to you is influenced by how I experience your listening to what I’m saying. And your listening to what I’m saying is shaping what I’m going to say next. So listening is anything but passive. It is actually very active and very powerful in shaping the conversation, the communication, and thus the relationship.
Another observation: when loading the model, it printed this warning:
Some weights of the model checkpoint at ./cv_botfalcon_40b_base_rag_friendly_multigpu_w_group_texts_block_size_512_per_device_batch_size_2_v_ak were not used when initializing FalconForCausalLM: ['_flat_param']
This IS expected if you are initializing FalconForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing FalconForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of FalconForCausalLM were not initialized from the model checkpoint at ./cv_botfalcon_40b_base_rag_friendly_multigpu_w_group_texts_block_size_512_per_device_batch_size_2_v_ak and are newly initialized: ['transformer.ln_f.bias', 'lm_head.weight', 'transformer.word_embeddings.weight', 'transformer.ln_f.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
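If I read the warning correctly, the checkpoint on disk seems to contain only FSDP's flattened parameter (`_flat_param`) rather than the per-layer Falcon weights, so the word embeddings, final layer norm, and lm_head get randomly re-initialized at load time, which would explain the gibberish. A quick way to check what was actually saved (a sketch; file names depend on whether the checkpoint is sharded and whether it was saved as .bin or safetensors):

```python
import json, os, torch

checkpoint_dir = "./cv_botfalcon_40b_base_rag_friendly_multigpu_w_group_texts_block_size_512_per_device_batch_size_2_v_ak"
index_path = os.path.join(checkpoint_dir, "pytorch_model.bin.index.json")

if os.path.exists(index_path):
    # Sharded checkpoint: the index maps every saved weight name to its shard file
    with open(index_path) as f:
        keys = sorted(json.load(f)["weight_map"])
else:
    # Single-file checkpoint (this loads the weights into CPU RAM)
    state_dict = torch.load(os.path.join(checkpoint_dir, "pytorch_model.bin"), map_location="cpu")
    keys = sorted(state_dict)

print(len(keys), "saved keys")
print(keys[:10])  # expect transformer.h.0.* etc., not just "_flat_param"
```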
When loading the tokenizer, it prints out:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
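As a related sanity check (a sketch, reusing the tokenizer and model loaded above), the tokenizer vocabulary size should match the number of rows in the model's input embedding matrix; otherwise the added special tokens point at untrained rows:

```python
# The two numbers below should match; training scripts usually call
# model.resize_token_embeddings(len(tokenizer)) after adding special tokens.
print(len(tokenizer))                                # vocab size incl. added special tokens
print(model.get_input_embeddings().weight.shape[0])  # rows in the embedding matrix
```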
The FSDP model-saving logic changed slightly around accelerate 0.22. If you check the index file where your model is saved, you will find some layers missing.
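For what it's worth, the pattern documented for accelerate + FSDP is to gather a full (consolidated) state dict before calling save_pretrained. A sketch, assuming `accelerator`, `model`, `tokenizer`, and `output_dir` are the objects already defined in the training script:

```python
# Gather the unflattened FSDP state dict and save it from the main process only.
accelerator.wait_for_everyone()
state_dict = accelerator.get_state_dict(model)  # full state dict, materialized on rank 0
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
    output_dir,
    is_main_process=accelerator.is_main_process,
    save_function=accelerator.save,
    state_dict=state_dict,
)
if accelerator.is_main_process:
    tokenizer.save_pretrained(output_dir)
```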