Has anyone finetuned bart-base
on the XSUM or CNN summarization tasks and is willing to report the ROUGE score they got?
I just got 15.5 for XSUM, which feels low since bart-large can get to 22-ish.
@sshleifer, could it be due to the adjust_logits
issue? Just a guess, but as I posted there, after modifying adjust_logits_during_generation
the BLEU-4 score for my model went from 13.09 to 19.14 for bart-base.
@sshleifer could you also try using bos
as decoder_start_token_id
and modifying adjust_logits_during_generation
to return the logits as-is instead of forcing bos?
If you also get a bump in the ROUGE score, we can confirm the issue. Thanks!
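For anyone following along, here is a minimal toy sketch of the change being proposed. The class and method names below are stand-ins for illustration only: in transformers the real method lives on BartForConditionalGeneration and its exact signature varies across library versions, so treat this as the pattern rather than a drop-in patch.

```python
class BartLikeModel:
    """Stand-in for the original behavior: force BOS at the first decoding step."""

    bos_token_id = 0

    def adjust_logits_during_generation(self, logits, cur_len):
        if cur_len == 1:
            # Original behavior: mask everything except the BOS token.
            masked = [float("-inf")] * len(logits)
            masked[self.bos_token_id] = 0.0
            return masked
        return logits


class PatchedModel(BartLikeModel):
    """Proposed fix: return the logits unchanged instead of forcing BOS."""

    def adjust_logits_during_generation(self, logits, cur_len):
        return logits
```

With the patched version, the first-step distribution is left intact, which (combined with setting decoder_start_token_id to the bos token id when calling generate) is the experiment being suggested above.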
A possible suggestion that would save on re-training: check the perplexity values and compare them to the paper.
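In case it helps with that comparison: perplexity is just the exponential of the mean per-token cross-entropy loss (a standard identity, not something specific to this thread), so it can be read off an eval loss without any re-training.

```python
import math


def perplexity(mean_cross_entropy_loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), with the loss in nats."""
    return math.exp(mean_cross_entropy_loss)


# e.g. an eval loss of about 2.0 nats/token corresponds to perplexity ~7.4
```

Note this assumes the eval loss uses the natural log (as PyTorch's cross-entropy does); if your numbers are in bits, use 2 as the base instead.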
I got 16.6 ROUGE-2 on XSUM in 3 epochs / 6 hrs.
bart-base doesn’t seem to be good then; in my other seq2seq experiments, t5-small performed similar to or better than bart-base.
Made a Google Doc to aggregate experiment results. Please add any interesting results!
How can I change adjust_logits_during_generation? Thanks
By editing the code!
Can you provide an example? I looked at the source code of adjust_logits_during_generation
and it directly returns the logits.
In the future, run git grep adjust_logits_during_generation
thanks