I am starting (before fine-tuning) with:
import torch.nn as nn
from transformers import BartForConditionalGeneration

class Exp(nn.Module):
    def __init__(self, config):
        super(Exp, self).__init__()
        # `config` is the checkpoint name or local path handed to from_pretrained
        self.bart = BartForConditionalGeneration.from_pretrained(config)

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        decoder_input_ids=None,
        decoder_attention_mask=None,
        head_mask=None,
        decoder_head_mask=None,
        encoder_outputs=None,
        past_key_values=None,
        inputs_embeds=None,
        decoder_inputs_embeds=None,
        labels=None,
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
    ):
        # Note: only a subset of the accepted arguments is forwarded to BART
        output = self.bart(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels,
            decoder_input_ids=decoder_input_ids,
            encoder_outputs=encoder_outputs,
        )
        return output
Loading the pre-trained model before fine-tuning:
model = Exp('facebook/bart-base')
Since the model is trained on multiple GPUs, I wrap it with model = torch.nn.DataParallel(model).
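One thing worth noting with DataParallel: when labels are passed in, each replica computes its own loss, so the gathered loss is typically a per-GPU vector rather than a scalar and needs reducing before backward. A minimal training-step sketch (the batch_* tensors are placeholder names, and the exact gathered output type depends on the transformers version):

import torch

model = torch.nn.DataParallel(model)  # replicate across all visible GPUs

# batch_* are placeholders for whatever the DataLoader yields
output = model(input_ids=batch_input_ids,
               attention_mask=batch_attention_mask,
               labels=batch_labels)

# DataParallel gathers one loss per replica; .mean() reduces it to a scalar
# (and is a no-op on a single GPU)
loss = output["loss"] if isinstance(output, dict) else output[0]
loss = loss.mean()
loss.backward()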
My saving method is then:
# Only save the underlying model itself, not the DataParallel wrapper
model_to_save = model.module if hasattr(model, 'module') else model
model_to_save.bart.save_pretrained(args.output_dir)
tokenizer.save_vocabulary(args.output_dir)
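As a side note (an addition, not part of the saving code above): save_vocabulary only writes the vocabulary files, while tokenizer.save_pretrained also writes the tokenizer configuration, so the tokenizer can later be reloaded from the same directory as the model:

from transformers import BartTokenizer

# writes the vocab files plus tokenizer_config.json / special_tokens_map.json
tokenizer.save_pretrained(args.output_dir)

# later, reload everything from the fine-tuning output directory
tokenizer = BartTokenizer.from_pretrained(args.output_dir)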
So, when loading the trained model, I follow the same procedure:
model = Exp(args.output_dir)
where args.output_dir is the directory containing my pytorch_model.bin file. I am loading the fine-tuned model, not initializing the pre-trained model again, right?
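To double-check this: from_pretrained accepts a local directory as well as a hub name, so passing args.output_dir should load the weights saved above. A quick sanity check (a sketch; the comparison against the hub checkpoint is not part of my training code):

import torch

finetuned = Exp(args.output_dir)        # reads pytorch_model.bin from output_dir
pretrained = Exp('facebook/bart-base')  # fresh pre-trained weights for comparison

p_ft = next(finetuned.bart.parameters())
p_pt = next(pretrained.bart.parameters())
print(torch.equal(p_ft, p_pt))  # expected: False once fine-tuning updated the weights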