Inconsistent Model/Pipeline Behavior using AutoModel/Pipeline/BartForConditionalGeneration

I'm using code 99% provided by Hugging Face, which is the main source of confusion. I am attempting summarization of medical scientific documents. I am on transformers version 4.2.0.

My code comes from 3 locations, and for the most part, is unmodified.
https://huggingface.co/transformers/model_doc/bart.html#bartforconditionalgeneration

from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
inputs = tokenizer([text], max_length=1024, return_tensors='pt')
# Generate Summary
summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=150, min_length = 40, early_stopping=True)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])

https://huggingface.co/transformers/task_summary.html

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

#model = AutoModelWithLMHead.from_pretrained("")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
# T5 uses a max_length of 512 so we cut the article to 512 tokens.
inputs = tokenizer.encode(abstract, return_tensors="pt", max_length=512)
outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
print(tokenizer.decode(outputs[0]))

The main change here is that I swapped AutoModelWithLMHead for AutoModelForSeq2SeqLM, as recommended by the warning I get when I run it.

Pipelines - transformers 4.3.0 documentation

This is the third method I'm currently using to run the code.

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large", tokenizer="facebook/bart-large", framework="pt")
summary = summarizer(text, min_length=40, max_length=150, length_penalty = 2.0, num_beams = 4, early_stopping = True)
print(summary)

I'll summarize some results below.
1 is pipeline, 2 is AutoModelForSeq2SeqLM, 3 is BartForConditionalGeneration.

When using facebook/bart-large, 1 and 3 give the same results. However, the second one, AutoModel, gives different results, despite the documentation indicating (to me) that AutoModelForSeq2SeqLM should, in this case, be identical to BartForConditionalGeneration.

Results get stranger when using facebook/bart-large-xsum, which gives the same results for 2 and 3, while 1 comes back with a result that's nowhere to be found in the original text.

Using facebook/bart-large-cnn, all 3 results are the same.

I haven't tested more than this. I don't know if this is just major user error, or something for GitHub.
Please let me know.
Input text is from a medical abstract, located below.

text = "Oesophageal squamous cell carcinoma (ESCC) is an aggressive malignancy and a leading cause of cancer-related death worldwide. Lack of effective early diagnosis strategies and ensuing complications from tumour metastasis account for the majority of ESCC death. Thus, identification of key molecular targets involved in ESCC carcinogenesis and progression is crucial for ESCC prognosis. In this study, four pairs of ESCC tissues were used for mRNA sequencing to determine differentially expressed genes (DEGs). 347 genes were found to be upregulated whereas 255 genes downregulated. By screening DEGs plus bioinformatics analyses such as KEGG, PPI and IPA, we found that there were independent interactions between KRT family members. KRT17 upregulation was confirmed in ESCC and its relationship with clinicopathological features were analysed. KRT17 was significantly associated with ESCC histological grade, lymph node and distant metastasis, TNM stage and five-year survival rate. Upregulation of KRT17 promoted ESCC cell growth, migration, and lung metastasis. Mechanistically, we found that KRT17-promoted ESCC cell growth and migration was accompanied by activation of AKT signalling and induction of EMT. These findings suggested that KRT17 is significantly related to malignant progression and poor prognosis of ESCC patients, and it may serve as a new biological target for ESCC therapy. SIGNIFICANCE: Oesophageal cancer is one of the leading causes of cancer mortality worldwide and oesophageal squamous cell carcinoma (ESCC) is the major histological type of oesophageal cancer in Eastern Asia. However, the molecular basis for the development and progression of ESCC remains largely unknown. In this study, RNA sequencing was used to establish the whole-transcriptome profile in ESCC tissues versus the adjacent non-cancer tissues and the results were bioinformatically analysed to predict the roles of the identified differentially expressed genes. 
We found that upregulation of KRT17 was significantly associated with advanced clinical stage, lymph node and distant metastasis, TNM stage and poor clinical outcome. Keratin 17 (KRT17) upregulation in ESCC cells not only promoted cell proliferation but also increased invasion and metastasis accompanied with AKT activation and epithelial-mesenchymal transition (EMT). These data suggested that KRT17 played an important role in ESCC development and progression and may serve as a prognostic biomarker and therapeutic target in ESCC. "

EDIT1: I originally forgot "length_penalty = 2.0" in BartForConditionalGeneration (3). However, adding it had no effect on anything.

You are using different tokenization in the two examples; in particular, the max_length is different. Maybe that's the reason?

I don't believe that this is the issue, because when I print out the encoded inputs (print(inputs)), the numbers are the same, and the length is far below 512.
That said, let me test this and get back on it!
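Comparing printed tensors by eye is easy to get wrong, since long tensors are abbreviated when printed. A minimal sketch of a stricter check is below. The helper name `first_difference` and the toy ID lists are made up for illustration; in practice the lists would come from something like `inputs["input_ids"][0].tolist()` in each snippet.

```python
from itertools import zip_longest


def first_difference(ids_a, ids_b):
    """Return (index, a_value, b_value) at the first mismatch, or None if identical.

    A missing value (one list is shorter) is reported as None.
    """
    for i, (a, b) in enumerate(zip_longest(ids_a, ids_b)):
        if a != b:
            return i, a, b
    return None


# Toy ID lists standing in for the real encodings (hypothetical values).
same = first_difference([0, 100, 200, 2], [0, 100, 200, 2])
cut = first_difference([0, 100, 200, 300, 2], [0, 100, 200, 2])

print(same)  # None
print(cut)   # (3, 300, 2)
```

A mismatch near the end of the sequences, with one list ending early, would point toward truncation rather than a different vocabulary or tokenizer.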

EDIT1: Tokenization length made a difference, and was the cause of differences while using bart-large.
EDIT2: However, XSum is still hallucinating strange results. CNN has consistent behavior.
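To illustrate why the max_length setting alone can change the model input, here is a toy sketch. It is not the tokenizer's real implementation: it assumes a single sequence wrapped in BART's BOS (0) and EOS (2) IDs, assumes the 520-token length reported later in this thread includes those two special tokens, and uses invented content IDs.

```python
def truncate_single_sequence(ids, max_length, bos_id=0, eos_id=2):
    """Toy model of truncating one sequence while keeping its BOS/EOS tokens."""
    if len(ids) <= max_length:
        return list(ids)
    content = ids[1:-1]                 # strip BOS and EOS
    kept = content[: max_length - 2]    # leave room for the two special tokens
    return [bos_id] + kept + [eos_id]


# 518 invented content IDs plus BOS and EOS: 520 IDs in total.
toy_ids = [0] + list(range(1000, 1518)) + [2]

at_512 = truncate_single_sequence(toy_ids, max_length=512)
at_1024 = truncate_single_sequence(toy_ids, max_length=1024)

print(len(at_512), len(at_1024))    # 512 520
print(len(toy_ids) - len(at_512))   # 8 IDs dropped at max_length=512
```

Under these assumptions the 512 limit silently drops the last few content tokens while the 1024 limit leaves the input untouched, so the two snippets feed the model different inputs even though both encodings start with the same numbers.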

So this does confirm that AutoModelForSeq2SeqLM works exactly like BartForConditionalGeneration, and the difference there was only user error.

I guess the remaining question is just... does anybody have insight into how pipeline actually works? I have no idea what it's doing behind the scenes, or why it's giving me different results here with XSum.
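One candidate explanation, offered as an assumption to check against the installed transformers source rather than as a confirmed cause: a pipeline may copy the entry for its task from `config.task_specific_params` into the model config when it is constructed, while a model loaded directly with `from_pretrained` keeps the plain config values. Any generation argument not passed explicitly would then fall back to different defaults. The sketch below models that precedence with plain dicts. The function name and every config value are hypothetical and are not taken from the real facebook/bart-large-xsum config.

```python
def resolve_generation_settings(config, task, call_kwargs, apply_task_params):
    """Merge settings in assumed precedence order: config < task params < call kwargs."""
    settings = {k: v for k, v in config.items() if k != "task_specific_params"}
    if apply_task_params:
        task_params = (config.get("task_specific_params") or {}).get(task, {})
        settings.update(task_params)
    settings.update(call_kwargs)
    return settings


# Hypothetical config values, invented for illustration only.
toy_config = {
    "num_beams": 1,
    "no_repeat_ngram_size": 0,
    "task_specific_params": {
        "summarization": {"num_beams": 6, "no_repeat_ngram_size": 3}
    },
}
call_kwargs = {"num_beams": 4, "min_length": 40, "max_length": 150}

via_pipeline = resolve_generation_settings(toy_config, "summarization", call_kwargs, True)
direct = resolve_generation_settings(toy_config, "summarization", call_kwargs, False)

differing = sorted(k for k in via_pipeline if via_pipeline.get(k) != direct.get(k))
print(differing)  # ['no_repeat_ngram_size']
```

If this is what is happening, printing `summarizer.model.config` next to the config of the directly loaded model, and then passing every differing value explicitly to both calls, should make the outputs line up.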

So I've dug through this, but still don't have a better idea.
Tokenization is defaulting to a length of 520. I printed out what the actual tokenizer is.

PreTrainedTokenizerFast(name_or_path='facebook/bart-large-xsum', vocab_size=50265, model_max_len=1024, is_fast=True, padding_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=False)})

The tokenized results are different from those of BartForConditionalGeneration... so I'm not sure overall what's happening.