How to fine-tune the merlinite-7b model in Python

I am new to LLM programming in Python, and I am trying to fine-tune the instructlab/merlinite-7b-lab model on my Mac M1. My goal is to teach the model about a new music composer, Xenobi Amilen, that I have invented.

The text about this composer is here.

Using the new ilab CLI from Red Hat, I created this training set for the model. It is a JSONL file with 100 question/answer pairs about the invented composer.

I wrote this Python script to train the model. I tested all the parts related to the tokenizer and the datasets, and they seem to work. However, the final training run fails with this error:

RuntimeError: Placeholder storage has not been allocated on MPS device!
  0%|          | 0/75 [00:00<?, ?it/s]                                                                                                                                        

I found a lot of articles about this error on Google and also on Stack Overflow, like this one, for example. The problem seems to be that, in addition to the model, I also have to send the input tensors to mps, but it’s not clear to me how to change my code to do that.
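From what I read, the fix looks something like the sketch below (model and batch here are just placeholders, not my real script), but I don’t see how to apply it with the Trainer API:

import torch

# Both the model and every input tensor must live on the same mps device.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)      # placeholder model
batch = {"x": torch.randn(8, 4).to(device)}   # placeholder inputs
print(model(batch["x"]).device)               # prints mps:0 on Apple Silicon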

I tried several fixes but had no luck. Can anyone help?

No one in this forum can help?

Hi,

Have you looked at this thread? RuntimeError: Placeholder storage has not been allocated on MPS device!

Yes, I have, but it hasn’t worked for me. As you can see, I have the flag enabled.

Yes, but further down in the thread they also mention passing no_cuda=True. Could you try that?
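Something along these lines (output_dir is just a placeholder):

from transformers import TrainingArguments

# no_cuda steers the Trainer away from CUDA; newer transformers
# versions rename this flag to use_cpu.
training_args = TrainingArguments(
    output_dir="out",
    no_cuda=True,
)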

Hi @nielsr

I added the option as you suggested, but now I get this error:

Traceback (most recent call last):
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/main.py", line 71, in <module>
    training_args = TrainingArguments(
                    ^^^^^^^^^^^^^^^^^^
  File "<string>", line 129, in __init__
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/venv/lib/python3.12/site-packages/transformers/training_args.py", line 1693, in __post_init__
    self.device
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/venv/lib/python3.12/site-packages/transformers/training_args.py", line 2171, in device
    return self._setup_devices
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 60, in __get__
    cached = self.fget(obj)
             ^^^^^^^^^^^^^^
  File "/Users/sasadangelo/github.com/sasadangelo/llm-train/venv/lib/python3.12/site-packages/transformers/training_args.py", line 2133, in _setup_devices
    raise ValueError(
ValueError: Either you do not have an MPS-enabled device on this machine or MacOS version is not 12.3+ or current PyTorch install was not built with MPS enabled.

I have macOS Sonoma 14.5, and MPS is enabled because this test works fine.
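It was more or less the usual availability check:

import torch

print(torch.backends.mps.is_available())  # prints True on my machine
print(torch.backends.mps.is_built())      # True when the build ships MPS support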

The only thing I don’t know is whether my PyTorch build has MPS enabled, and I don’t know how to verify that. On macOS I installed PyTorch with pip (it is a recent nightly build):

torch==2.5.0.dev20240704

I also used the stable release in my tests, but I didn’t try it with the option you suggested. What do you suggest I do?

To verify PyTorch MPS support I ran this test.
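It is more or less the standard snippet from the PyTorch MPS docs:

import torch

if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)  # allocate a tensor on the Apple GPU
    print(x)
else:
    print("MPS device not found.")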

The result was:

tensor([1.], device='mps:0')

So I think everything is OK.

FYI, no_cuda has been deprecated and replaced by use_cpu, which is already False by default. So I don’t think the flag is the issue; I think it is something related to PyTorch.
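If you want to be explicit about it, the newer flag looks like this (again, output_dir is just a placeholder):

from transformers import TrainingArguments

# use_cpu replaces the deprecated no_cuda; leaving it False lets the
# Trainer pick an accelerator such as MPS when one is available.
training_args = TrainingArguments(output_dir="out", use_cpu=False)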

Hi @sasadangelo - I’m a member of the InstructLab project. I am sorry I missed this post when you initially made it.

I looked at your repo, and I don’t see the knowledge YAML or Markdown files you would need to train using the InstructLab methodology. Do you have those available for me to take a look at?

I’m also wondering why you used a custom script instead of ilab train to train the model. Were you just interested in the datagen component and not the full workflow?