BART question: it seems that pretraining does not work for a small model?

What is your question?

My task is to generate keywords from sentences.

I pretrain a text-generation model: I mask tokens in the sentences and predict the whole sentences.

Pretraining batch_size = 8 and step = 1000000
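
Roughly, the masking step looks like this (a minimal sketch only; the mask token id and masking probability below are illustrative assumptions, not the actual preprocessing code):

```python
import torch

def mask_tokens(input_ids, mask_token_id, mask_prob=0.15):
    """Randomly replace tokens with a mask token; the target is the whole sentence.
    mask_token_id and mask_prob are assumptions -- not taken from the actual code."""
    masked = input_ids.clone()
    targets = input_ids.clone()                      # predict all original tokens
    mask = torch.rand(input_ids.shape) < mask_prob   # positions to mask out
    masked[mask] = mask_token_id
    return masked, targets

# Toy batch with the batch size from above (8 sequences of 32 token ids)
batch = torch.randint(low=5, high=1000, size=(8, 32))
masked_inputs, targets = mask_tokens(batch, mask_token_id=4)
```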

I haven’t observed any improvement from pretraining: the BLEU score is 10.5 without pretraining and 9.5 with pretraining.

Code

I took the Python code from

hidden_size = 512
num_encoder_layers = 3
num_decoder_layers = 3
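
For comparison, an equivalently sized model in transformers would be configured roughly like this (the attention-head and feed-forward sizes are not listed above, so those values are assumptions):

```python
from transformers import BartConfig, BartForConditionalGeneration

# Roughly equivalent configuration in HuggingFace transformers.
# Heads, FFN size and vocab size are not given above, so they are assumptions.
config = BartConfig(
    d_model=512,                # hidden_size = 512
    encoder_layers=3,           # num_encoder_layers = 3
    decoder_layers=3,           # num_decoder_layers = 3
    encoder_attention_heads=8,
    decoder_attention_heads=8,
    encoder_ffn_dim=2048,
    decoder_ffn_dim=2048,
)
model = BartForConditionalGeneration(config)
```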

Discussion

The task is to generate keywords from sentences, and the keywords may not appear in the sentences themselves.
So feeding masked sentences and predicting the whole sentences may not benefit the keyword generation task, since that objective has little relation to it.
Am I right? Is that the reason pretraining does not improve the BLEU score?

Thank you very much.

With all due respect, you are asking a question on a forum dedicated to a specific library, transformers by HuggingFace, but the question does not involve that library. In fact, you are using a completely different library. I am not sure this is the right place for such questions. @sgugger

I have changed the tag.

On the research part of the forum, we welcome any general questions, though of course we would prefer you to use our models :wink:
@sshleifer might have some answer as he is the Bart person on the team.

Definitely possible; there could also be a bug in your code. I don’t have enough familiarity with your task to know what results to expect.

Thank you. I am also using your models.

1. I pad the input tokens with zeros when packing multiple sentences. The positions of the output tokens should be exactly the same as the input tokens, which means I should keep the padding zeros in the output tokens (see the sketch after this list).

2. The pretraining time should be longer.
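
On point 1, keeping the padding zeros in the targets is usually only safe if the loss ignores those positions; a minimal sketch of that (assuming pad id 0 and a standard cross-entropy loss, not the actual training code):

```python
import torch
import torch.nn as nn

pad_id = 0                                            # the zero padding described in point 1
criterion = nn.CrossEntropyLoss(ignore_index=pad_id)  # padded positions excluded from the loss

vocab_size = 1000                                # illustrative only
logits = torch.randn(8, 32, vocab_size)          # (batch, seq_len, vocab) -- dummy model output
targets = torch.randint(0, vocab_size, (8, 32))  # zero-padded target token ids
loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
```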