Finetuning a 4-bit model

Hi, I'm trying to finetune a LLaMA-7B repo, a quantized 4-bit version, but when the training starts the loss suddenly drops to 0. The model I'm trying to finetune is "LinkSoul/Chinese-Llama-2-7b-4bit". This is my training script:

import pandas as pd
import torch
import wandb
from torch.utils.data import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForLanguageModeling

wandb.init(project="mi-proyecto-alpaca")

class ParquetDataset(Dataset):
    def __init__(self, tokenizer, file_paths, block_size):
        self.tokenizer = tokenizer
        self.block_size = block_size
        self.inputs = []
        self.targets = []

        for file_path in file_paths:
            try:
                df = pd.read_parquet(file_path)
            except Exception as e:
                print(f"Error reading file {file_path}: {e}")
                continue

            self.inputs.extend(df['instruction'].tolist())
            self.targets.extend(df['output'].tolist())

        # Tokenize the inputs and the outputs
        self.tokenized_inputs = tokenizer(self.inputs, truncation=True, padding='max_length', max_length=block_size)
        self.tokenized_targets = tokenizer(self.targets, truncation=True, padding='max_length', max_length=block_size)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, i):
        return {'input_ids': torch.tensor(self.tokenized_inputs['input_ids'][i]).long(),
                'attention_mask': torch.tensor(self.tokenized_inputs['attention_mask'][i]).long(),
                'labels': torch.tensor(self.tokenized_targets['input_ids'][i]).long()}

from transformers import GPTQConfig # Importing GPTQConfig for quantization

def main():
    model_name_or_path = "./alpacashort"
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

    # Setting up GPTQConfig for 4-bit quantization
    gptq_config = GPTQConfig(bits=4, disable_exllama=True)  # Add other parameters as needed

    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        load_in_4bit=True,
        device_map="auto",  # Setting device_map to "auto"
        quantization_config=gptq_config  # Adding quantization config
    )

    train_file_paths = ["./alpacadataset/train-00000-of-00001-6ef3991c06080e14.parquet"]
    train_dataset = ParquetDataset(tokenizer, train_file_paths, block_size=128)

    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    training_args = TrainingArguments(
        output_dir="./output",
        overwrite_output_dir=True,
        num_train_epochs=1,
        per_device_train_batch_size=32,  # Specify batch size
        save_steps=10_000,
        save_total_limit=2,
        pad_token_id=tokenizer.pad_token_id  # Specify pad_token_id
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )

    trainer.train()

if __name__ == "__main__":
    main()

And this is the training output:

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=True.
This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in ⚠️⚠️[`T5Tokenize`] Fix T5 family tokenizers⚠️⚠️ by ArthurZucker · Pull Request #24565 · huggingface/transformers · GitHub
Loading checkpoint shards: 100%|██████████████████| 2/2[00:45<00:00, 22.84s/it]
0%| | 0/4334 [00:00<?, ?it/s]
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
/home/gus/.local/lib/python3.8/site-packages/bitsandbytes/nn/modules.py:224: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). This will lead to slow inference or training speed.
warnings.warn(f'Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). This will lead to slow inference or training speed.')

{'loss': 1.7572, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 6e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 1.2e-05, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.02}
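
The bitsandbytes warning above makes me think the compute dtype might be part of the problem, so I was considering loading with an explicit BitsAndBytesConfig instead of mixing load_in_4bit with the GPTQConfig. This is only a sketch of what I understand the warning to be asking for, and it assumes the checkpoint can actually be loaded through bitsandbytes rather than GPTQ:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch only: load in 4-bit through bitsandbytes with an explicit fp16
# compute dtype, so Linear4bit does not fall back to float32 as in the
# warning above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "./alpacashort",  # same local path as in the script above
    quantization_config=bnb_config,
    device_map="auto",
)

I'm not sure whether this is even compatible with a GPTQ-exported checkpoint, so I may be off base here.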

Any clue about what I'm doing wrong?
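
I was also wondering whether I need to attach LoRA adapters with peft before handing the model to the Trainer, since as far as I understand the frozen 4-bit base weights can't receive gradient updates directly. This is just a sketch of what I had in mind; it assumes peft is installed, and the target_modules names are my guess at the LLaMA attention projections:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Sketch only: add trainable LoRA adapters on top of the frozen quantized
# base model, so gradients flow into the small adapter weights instead.
# `model` is the quantized model loaded earlier in main().
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumption: LLaMA attention projection names
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report only a small fraction of params as trainable

If that is the right direction, I would then pass the wrapped model to the Trainer as before.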

Hey, do you mind sharing your notebook?