'Write like Yoda' - Best model for implicitly learning style changes from paired sentences

A Star Wars example, to help explain my goal.

Imagine trying to train a transformer model to talk like Yoda. We all know what that means: Yoda inverts the usual sentence structure. But how would you train a model to do that, given sentence pairings (regular and Yoda), but WITHOUT telling it that Yoda is the one saying the second column of each pairing? The model has to come up with the ‘text change rules’ on its own.

…so in my case…

I’ve been using t5-small, but I want to improve.

Here is a non-Yoda example of my situation. Column A is a list of original sentences. Column B is a list of stylistically changed versions of those sentences.

Example:
column_actual
My name is George.
I love cheese.
I am cool.

column_changed
George, that be me. I am that.
Cheese is my biggest love cheese cheese cheese
How cool is me? Cool is that.

…so as you can see, the topics are all the same and the language is similar, but there is a consistent style change. I have a dataset like this that I’m working with, but the code below does not seem to get me where I want to be.
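For completeness, here is a minimal sketch of how a dataset like this could be constructed (assuming the two columns are named 'actual' and 'sensational' to match the code below; in reality I load mine from a file):

from datasets import Dataset

# Toy version of the paired data above; swap in your real columns
dataset = Dataset.from_dict({
    "actual": [
        "My name is George.",
        "I love cheese.",
        "I am cool.",
    ],
    "sensational": [
        "George, that be me. I am that.",
        "Cheese is my biggest love cheese cheese cheese",
        "How cool is me? Cool is that.",
    ],
})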

import torch
from datasets import DatasetDict
from transformers import (
    T5Tokenizer,
    T5ForConditionalGeneration,
    TrainingArguments,
    Trainer,
)

# Split the dataset into training and validation (10% held out)
train_test_split = dataset.train_test_split(test_size=0.1)
dataset_dict = DatasetDict({
    'train': train_test_split['train'],
    'validation': train_test_split['test']
})

# Tokenization
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Set the model to use CPU
device = torch.device("cpu")
model.to(device)

def tokenize_function(examples):
    # Encode inputs and labels using tokenizer
    encoding = tokenizer(
        ["imitate style, tone, and capitalization: " + text for text in examples['actual']],
        max_length=128,
        truncation=True,
        padding="max_length",
        return_tensors="pt"
    )
    # NOTE: 'actual' (above) and 'sensational' (here) must match the dataset's column names
    labels = tokenizer(
        examples['sensational'],
        max_length=128,
        truncation=True,
        padding="max_length",
        return_tensors="pt"
    ).input_ids

    # Replace padding token IDs in labels with -100 to ignore them in the loss calculation
    labels[labels == tokenizer.pad_token_id] = -100
    encoding['labels'] = labels
    return encoding

tokenized_datasets = dataset_dict.map(tokenize_function, batched=True, remove_columns=dataset_dict['train'].column_names)
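# Optional sanity check: padding positions in 'labels' should now be -100
# so that they are ignored by the cross-entropy loss
sample = tokenized_datasets['train'][0]
print(sample['labels'][-10:])  # the padded tail should be all -100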

training_args = TrainingArguments(
    output_dir="./results_imitate",
    evaluation_strategy="epoch",
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=10,
    save_strategy="steps",
    save_steps=500,  # Save a checkpoint every 500 steps
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

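# My SaveModelCallback isn't shown above; a minimal stand-in that just saves
# the model at the end of each epoch would look like this (hypothetical
# sketch; the real callback can do whatever bookkeeping you need):
from transformers import TrainerCallback

class SaveModelCallback(TrainerCallback):
    def on_epoch_end(self, args, state, control, **kwargs):
        # The Trainer passes the model to callback hooks via kwargs
        kwargs["model"].save_pretrained(f"{args.output_dir}/epoch_{int(state.epoch)}")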

# Initialize Trainer with the callback
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    tokenizer=tokenizer,
    callbacks=[SaveModelCallback()]  # Add the callback here
)
# Train the model
trainer.train()

# Function to generate sensationalized text
def sensationalize_text(text):
    inputs = tokenizer("imitate style, tone, and capitalization: " + text, return_tensors="pt", max_length=128, truncation=True)
    inputs = inputs.to(device)
    outputs = model.generate(**inputs, max_length=50, num_beams=5, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
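# Example usage of the generation function:
print(sensationalize_text("My name is George."))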

…so that is the way I’m looking at it.

Questions

  1. Is there a better model to use for this than t5-small?
  2. Can anyone point me to instances where this approach has worked for others? I ask because I think the way I am telling the T5 tokenizer to ‘imitate style, tone, and capitalization’ of the original text may be incorrect.

Thank you for any help, or advice you might have!
The Lord Always Delivers!