I realized that the issue was that I wasn't using beam search decoding or setting a maximum length while generating. The code should be:
from transformers import pipeline # type: ignore
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer # type: ignore
import torch
checkpoint = "sshleifer/distilbart-cnn-12-6"
revision = "a4f8f3e"
summarizer_input = """
John Jeremy Thorpe (29 April 1929 – 4 December 2014) was a British politician who served as the Member of Parliament for North Devon from 1959 to 1979, and as leader of the Liberal Party from 1967 to 1976. In May 1979, he was tried at the Old Bailey on charges of conspiracy and incitement to murder his ex-boyfriend Norman Scott, a former model. Thorpe was acquitted on all charges, but the case, and the furore surrounding it, ended his political career.
Thorpe was the son and grandson of Conservative MPs, but decided to align with the small and ailing Liberal Party. After reading Law at Oxford University he became one of the Liberals' brightest stars in the 1950s. He entered Parliament at the age of 30, rapidly made his mark, and was elected party leader in 1967. After an uncertain start during which the party lost ground, Thorpe capitalised on the growing unpopularity of the Conservative and Labour parties to lead the Liberals through a period of electoral success. This culminated in the general election of February 1974, when the party won 6 million votes out of some 31 million cast. Under the first-past-the-post electoral system this gave them only 14 seats, but in a hung parliament, no party having an overall majority, Thorpe was in a strong position. He was offered a cabinet post by the Conservative prime minister, Edward Heath, if he would bring the Liberals into a coalition. His price for such a deal, reform of the electoral system, was rejected by Heath, who resigned in favour of a minority Labour government.
"""
summarizer = pipeline("summarization", model=checkpoint, revision=revision, device_map=device)
result = summarizer(summarizer_input, min_length=4*10, max_length=4*15)[0]['summary_text'] # lengths are in tokens, not words
print(result)
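
# Aside: if I'm reading the checkpoint's config correctly, the pipeline should
# already be using beam search, since these CNN-distilled BART checkpoints ship
# generation defaults (num_beams, min_length, max_length) in their config.json.
# A quick sketch to inspect what the pipeline falls back to when not overridden:
from transformers import AutoConfig # type: ignore
config = AutoConfig.from_pretrained(checkpoint, revision=revision)
for name in ("num_beams", "min_length", "max_length"):
    print(name, getattr(config, name, None)) # getattr in case an attribute is unset
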
model = AutoModelForSeq2SeqLM.from_pretrained(pretrained_model_name_or_path=checkpoint, revision=revision, device_map=device) # don't shadow the pipeline above
tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision=revision)
# Tokenize, run the model's generate() with beam search, then decode the token IDs back to text
inputs = tokenizer(summarizer_input, max_length=1024, truncation=True, padding=True, return_tensors="pt") # the article is the source text; text_target is only for labels
inputs = inputs.to(device)
outputs = model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"], num_beams=4, min_length=4*10, max_length=4*15)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
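
To see how much the decoding strategy matters here, this is a minimal sketch comparing greedy decoding against beam search on the same inputs; it just reuses the model, tokenizer, and inputs from above:

# Greedy decoding: num_beams=1 keeps only the single most likely token at each step.
greedy = model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"],
                        num_beams=1, do_sample=False, min_length=4*10, max_length=4*15)
# Beam search: num_beams=4 keeps the four best partial sequences at each step.
beam = model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"],
                      num_beams=4, min_length=4*10, max_length=4*15)
for label, out in (("greedy:", greedy), ("beam:  ", beam)):
    print(label, tokenizer.batch_decode(out, skip_special_tokens=True)[0])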