I am trying to learn how to create a model from scratch (not something that already exists in Hugging Face) and put it on the Hugging Face Hub. The simple model I made looks like this:
import torch.nn as nn
from transformers import PreTrainedModel, PretrainedConfig
from transformers.modeling_outputs import CausalLMOutputWithCrossAttentions

class EmbeddingModel(PreTrainedModel):
    def __init__(self, config, loss_fct=nn.CrossEntropyLoss()) -> None:
        super().__init__(config)
        self.loss_fct = loss_fct
        self.embed_dim = config.hidden_size
        self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
        self.lm_head = nn.Linear(self.embed_dim, config.vocab_size, bias=False)

    def forward(self, input_ids, labels=None, attention_mask=None):
        token_embeddings = self.wte(input_ids)      # (B, T) -> (B, T, E)
        lm_logits = self.lm_head(token_embeddings)  # (B, T, E) -> (B, T, V)
        # Cross-entropy loss for next-token prediction
        loss = None
        if labels is not None:
            shift_logits = lm_logits[..., :-1, :].contiguous()
            shift_labels = labels[..., 1:].contiguous()
            loss = self.loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
        # Return in a format Hugging Face trainers understand
        return CausalLMOutputWithCrossAttentions(
            loss=loss,
            logits=lm_logits,
        )
I can create and save the model as:
run_name = 'my_run'
output_dir = 'my_dir'
config = PretrainedConfig(
    name_or_path=run_name,
    vocab_size=vocab_size,
    hidden_size=768 // 2,
    num_hidden_layers=0,
    n_ctx=context_length,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    is_decoder=True,
    model_type='my_model',
)
model = EmbeddingModel(config)
model.save_pretrained(output_dir)
-----
Configuration saved in my_dir/config.json
Model weights saved in my_dir/model.safetensors
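For reference, exercising the model directly works as expected; a minimal sketch of the forward call (dummy ids drawn below vocab_size, with labels reused so the shifted next-token loss path runs):

import torch

dummy_ids = torch.randint(0, vocab_size, (2, 8))  # toy (batch, seq_len) batch
out = model(input_ids=dummy_ids, labels=dummy_ids)
print(out.loss, out.logits.shape)  # scalar loss, torch.Size([2, 8, vocab_size])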
Now I am trying to figure out how to read the model back in later. Trying something like:
model2 = EmbeddingModel(config)
model2.from_pretrained(output_dir)
gives the error
'NoneType' object has no attribute 'from_pretrained'
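I am guessing this is because from_pretrained is a classmethod that looks up cls.config_class, which EmbeddingModel never sets, so there is nothing to load the config with. If that is right, calling it on the class and passing the config object explicitly should sidestep the lookup; a sketch of what I mean (not verified):

model2 = EmbeddingModel.from_pretrained(output_dir, config=config)  # pass config explicitly to avoid the cls.config_class lookup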
Trying:
model2 = AutoModel.from_pretrained(output_dir)
gives
Unrecognized model in my_dir. Should have a `model_type` key in its config.json
Now I did set model_type in my config above, but it isn't showing up in the JSON file. If I put it in by hand, it says it does not recognize the 'my_model' architecture.
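My best guess is that the Auto classes only resolve custom types that have been registered, and that model_type only ends up in config.json when it is a class attribute of a PretrainedConfig subclass rather than an __init__ kwarg. A hedged sketch of what I think that would look like (MyConfig is a hypothetical config class, not something I have working):

from transformers import AutoConfig, AutoModel, PretrainedConfig

class MyConfig(PretrainedConfig):
    model_type = 'my_model'  # as a class attribute, this should be serialized into config.json

EmbeddingModel.config_class = MyConfig  # assumption: the model class has to point at its config class
AutoConfig.register('my_model', MyConfig)
AutoModel.register(MyConfig, EmbeddingModel)

Is something along those lines the intended path?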
The funny thing is I can get the trainer to read it back in after training for a while:
trainer.train(resume_from_checkpoint=True)
model2 = trainer.model
But I would like to load the model without this Trainer hack. How do you load a pretrained model you have made from scratch?