How to fine-tune the merlinite-7b model in Python

I am new to LLM programming in Python, and I am trying to fine-tune the instructlab/merlinite-7b-lab model on my Mac M1. My goal is to teach the model about a new music composer, Xenobi Amilen, that I have invented.

The text about this composer is here.

Using the new ilab CLI from Red Hat, I created this training set for the model. It is a JSONL file with 100 question/answer pairs about the invented composer.

I wrote this Python script to train the model. I tested all the parts related to the tokenizer and the datasets, and they seem to work. However, the final training run fails with this error:

RuntimeError: Placeholder storage has not been allocated on MPS device!
  0%|          | 0/75 [00:00<?, ?it/s]                                                                                                                                        

I found a lot of articles about this error on Google and also on Stack Overflow, like this one, for example. The problem seems to be that, in addition to the model, I also have to send the input tensors to mps, but it’s not clear to me how to change my code to do that.
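From what I read, the fix looks something like the sketch below (model and batch here are just placeholders, not my real script), but I don’t see how to apply it with the Trainer API:

import torch

# Both the model and every input tensor must live on the same mps device.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)      # placeholder model
batch = {"x": torch.randn(8, 4).to(device)}   # placeholder inputs
print(model(batch["x"]).device)               # prints mps:0 on Apple Silicon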

I tried several fixes but had no luck. Can anyone help?

No one in this forum can help?

Hi,

Have you looked at this thread? RuntimeError: Placeholder storage has not been allocated on MPS device!

Yes, I have, but it hasn’t worked for me. As you can see, I have the flag enabled.

Yes, but further down in the thread they also mention passing no_cuda=True. Could you try that?
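Something along these lines (output_dir is just a placeholder):

from transformers import TrainingArguments

# no_cuda steers the Trainer away from CUDA; newer transformers
# versions rename this flag to use_cpu.
training_args = TrainingArguments(
    output_dir="out",
    no_cuda=True,
)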

Hi @nielsr

I added the option as you suggested, but now I get this error:

Traceback (most recent call last):
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/main.py", line 71, in <module>
    training_args = TrainingArguments(
                    ^^^^^^^^^^^^^^^^^^
  File "<string>", line 129, in __init__
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/venv/lib/python3.12/site-packages/transformers/training_args.py", line 1693, in __post_init__
    self.device
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/venv/lib/python3.12/site-packages/transformers/training_args.py", line 2171, in device
    return self._setup_devices
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 60, in __get__
    cached = self.fget(obj)
             ^^^^^^^^^^^^^^
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/venv/lib/python3.12/site-packages/transformers/training_args.py", line 2133, in _setup_devices
    raise ValueError(
ValueError: Either you do not have an MPS-enabled device on this machine or MacOS version is not 12.3+ or current PyTorch install was not built with MPS enabled.

I have macOS Sonoma 14.5, and MPS is enabled because this test works fine.
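It was more or less the usual availability check:

import torch

print(torch.backends.mps.is_available())  # prints True on my machine
print(torch.backends.mps.is_built())      # True when the build ships MPS support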

The only thing I don’t know is whether my PyTorch build has MPS enabled, and I don’t know how to verify that. On macOS I installed PyTorch with pip (it is a recent nightly build):

torch==2.5.0.dev20240704

I also used the stable release in my tests, but I didn’t try it with the option you suggested. What do you suggest I do?

To verify PyTorch MPS support I ran this test.
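It is more or less the standard snippet from the PyTorch MPS docs:

import torch

if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)  # allocate a tensor on the Apple GPU
    print(x)
else:
    print("MPS device not found.")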

The result was:

tensor([1.], device='mps:0')

So I think everything is OK.

FYI, no_cuda has been deprecated and replaced by use_cpu, which is already False by default. So I don’t think the flag is the issue; I think it is something related to PyTorch.
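If you want to be explicit about it, the newer flag looks like this (again, output_dir is just a placeholder):

from transformers import TrainingArguments

# use_cpu replaces the deprecated no_cuda; leaving it False lets the
# Trainer pick an accelerator such as MPS when one is available.
training_args = TrainingArguments(output_dir="out", use_cpu=False)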

Hi @sasadangelo - I’m a member of the InstructLab project. I am sorry I missed this post when you initially made it.

I looked at your repo, and I don’t see the knowledge YAML or Markdown files you would need to train using the InstructLab methodology. Do you have those available for me to take a look at?

I’m also wondering why you used a custom script instead of ilab train to train the model. Were you just interested in the datagen component and not the full workflow?