I’m fine-tuning the Falcon-40B model on a single node with multiple GPUs. Training finished without errors, and I can load the resulting checkpoint with AutoModelForCausalLM.from_pretrained. However, the model only generates unreadable gibberish.
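For reference, the loading and generation code is roughly the following (a minimal sketch only; the checkpoint path, dtype, and generation arguments are placeholders and may differ from the actual script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: the fine-tuned checkpoint directory
checkpoint_dir = "./cv_botfalcon_40b_base_rag_friendly_multigpu_w_group_texts_block_size_512_per_device_batch_size_2_v_ak"

tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint_dir,
    torch_dtype=torch.bfloat16,  # assumed dtype
    device_map="auto",           # shard the 40B weights across the available GPUs
)

prompt = "Use the context below to answer the user's question.\nContext: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```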
Here is the printout:
Use the context below to answer the user’s question.
Context: Much of the time, when we get training about communication, we are told how to say things. What I want to focus on in this segment is actually more on your listening skills-- how you make yourself available and open and receptive to hear what others have to say. And one of the things that makes effective listening challenging is that we often are half-listening, and the other half is already thinking about what I’m going to say, because I should say something in return. That, I’m going to challenge for you. Why is effective listening so important in relationships? Relationships are both. It is the talking, but it is even more so the listening. And it is the listener and the quality of our listening that will actually shape what the speaker will say and how they will say it. We think that the other person just said this because that’s what they say, but no. What I’m saying to you is influenced by how I experience your listening to what I’m saying. And your listening to what I’m saying is shaping what I’m going to say next. So listening is anything but passive. It is actually very active and very powerful in shaping the conversation, the communication, and thus the relationship.
Another observation: when loading the model, it printed this warning:
Some weights of the model checkpoint at ./cv_botfalcon_40b_base_rag_friendly_multigpu_w_group_texts_block_size_512_per_device_batch_size_2_v_ak were not used when initializing FalconForCausalLM: ['_flat_param']
This IS expected if you are initializing FalconForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing FalconForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of FalconForCausalLM were not initialized from the model checkpoint at ./cv_botfalcon_40b_base_rag_friendly_multigpu_w_group_texts_block_size_512_per_device_batch_size_2_v_ak and are newly initialized: ['transformer.ln_f.bias', 'lm_head.weight', 'transformer.word_embeddings.weight', 'transformer.ln_f.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
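If I read the warning correctly, the checkpoint on disk seems to contain only FSDP's flattened parameter (`_flat_param`) rather than the per-layer Falcon weights, so the word embeddings, final layer norm, and lm_head get randomly re-initialized at load time, which would explain the gibberish. A quick way to check what was actually saved (a sketch; file names depend on whether the checkpoint is sharded and whether it was saved as .bin or safetensors):

```python
import json, os, torch

checkpoint_dir = "./cv_botfalcon_40b_base_rag_friendly_multigpu_w_group_texts_block_size_512_per_device_batch_size_2_v_ak"
index_path = os.path.join(checkpoint_dir, "pytorch_model.bin.index.json")

if os.path.exists(index_path):
    # Sharded checkpoint: the index maps every saved weight name to its shard file
    with open(index_path) as f:
        keys = sorted(json.load(f)["weight_map"])
else:
    # Single-file checkpoint (this loads the weights into CPU RAM)
    state_dict = torch.load(os.path.join(checkpoint_dir, "pytorch_model.bin"), map_location="cpu")
    keys = sorted(state_dict)

print(len(keys), "saved keys")
print(keys[:10])  # expect transformer.h.0.* etc., not just "_flat_param"
```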
When loading the tokenizer, it prints out:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
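As a related sanity check (a sketch, reusing the tokenizer and model loaded above), the tokenizer vocabulary size should match the number of rows in the model's input embedding matrix; otherwise the added special tokens point at untrained rows:

```python
# The two numbers below should match; training scripts usually call
# model.resize_token_embeddings(len(tokenizer)) after adding special tokens.
print(len(tokenizer))                                # vocab size incl. added special tokens
print(model.get_input_embeddings().weight.shape[0])  # rows in the embedding matrix
```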
The FSDP model-saving logic changed slightly around accelerate 0.22. If you check the index file where your model is saved, you will find some layers missing.
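For what it's worth, the pattern documented for accelerate + FSDP is to gather a full (consolidated) state dict before calling save_pretrained. A sketch, assuming `accelerator`, `model`, `tokenizer`, and `output_dir` are the objects already defined in the training script:

```python
# Gather the unflattened FSDP state dict and save it from the main process only.
accelerator.wait_for_everyone()
state_dict = accelerator.get_state_dict(model)  # full state dict, materialized on rank 0
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
    output_dir,
    is_main_process=accelerator.is_main_process,
    save_function=accelerator.save,
    state_dict=state_dict,
)
if accelerator.is_main_process:
    tokenizer.save_pretrained(output_dir)
```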