Hello all,
I have been stuck on the following for a few days and would really appreciate some help.
I am working on an abstractive summarisation project and am trying to fine-tune BART on my custom dataset. I used the fine-tuning script provided by Hugging Face as follows:
python run_summarization.py \
--model_name_or_path facebook/bart-base \
--do_train \
--do_eval \
--do_predict \
--train_file {train_path} \
--validation_file {validation_path} \
--test_file {test_path} \
--text_column full_text \
--summary_column summary \
--output_dir training \
--overwrite_output_dir \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--num_train_epochs=3 \
--predict_with_generate \
--save_steps=500 \
--logging_first_step=True \
--logging_steps=500 \
--eval_steps=500
I get promising results from the predictions produced by this training script: the summaries in the generated_predictions.txt file it outputs are about 100 words each. However, when I load the checkpoint of the fine-tuned model to generate further predictions, I get terrible results:
from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained('./training')
tokenizer = BartTokenizer.from_pretrained('./training')

max_input_length = 1024
max_target_length = 128

text = {example text of around 500 words}
model_inputs = tokenizer(text, max_length=max_input_length,
                         truncation=True, return_tensors='pt')
pred = model.generate(model_inputs['input_ids'])
print(tokenizer.decode(pred[0]))
Instead of a full summary, I get an incomplete sentence of only 10-15 words.
I am really confused: BART has been fine-tuned, yet I get completely different results relative to the predictions made by the training script. My question is: how can I load any checkpoint and generate the same results as the script?
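My current suspicion is a generation-length limit: if generate() falls back to a small default maximum length when none is passed explicitly (an assumption on my part, I have not verified this in the transformers source), then a hard cut-off would produce exactly the kind of 10-15 word fragment I am seeing. A toy sketch of what I mean, not the real generate() logic:

```python
def length_limited_decode(tokens, max_length=20):
    # Toy stand-in for length-limited generation: decoding simply stops
    # once max_length tokens have been emitted, mid-sentence or not.
    return tokens[:max_length]

# A hypothetical ~100-token summary like the ones the training script produces.
good_summary = [f"tok{i}" for i in range(100)]

# What I suspect my standalone call is doing to it.
short_summary = length_limited_decode(good_summary)
print(len(short_summary))  # 20
```

If that guess is right, I would expect passing max_target_length through to generation to help, but I don't know which argument the training script actually uses internally.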