T5-small trained on a small dataset not inferring anything

Requirement: train a Hugging Face model to extract parts from the user input. The relevant parts to extract are 'entity', 'intention', and 'attributes'.

After interpreting the user input, the model should reply with a JSON object with the following structure:

type Object = {
  entity: string; // mandatory string property
  intention: 'retrieve' | 'create'; // mandatory property that can only be 'retrieve' or 'create'
  attributes?: any; // optional property of any type
};
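For example, an input like "Create a project called Apollo" should yield something along these lines (the values here are illustrative only):

{
  "entity": "project",
  "intention": "create",
  "attributes": { "name": "Apollo" }
}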

I trained a t5-small model on a very small dataset (about 50 examples) that looks like this:

{
    "data": [
        {
            "input": "Give me all projects with a duration of 5 days or more",
            "output": "{'intention': 'retrieve', 'entity': 'projects', 'columns': [ 'Name', 'Id', 'Duration']}"
        },
        {
            "input": "List projects",
            "output": "{'intention': 'retrieve', 'entity': 'projects', 'columns': [ 'Name', 'Id']}"
        },
        {
            "input": "Show me tasks",
            "output": "{'intention': 'retrieve', 'entity': 'tasks', 'columns': [ 'Name', 'Id']}"
        },
        ...
    ]
}

When I test it with an input that exactly matches one from the training dataset (or the evaluation dataset, for that matter), e.g. "Give me all projects with a duration of 5 days or more", the prediction is either empty, a series of dashes, a repetition of the input, or a translation of it into a random language.
This is the relevant bit of code:

from transformers import T5Tokenizer, T5ForConditionalGeneration, Trainer, TrainingArguments
from datasets import load_dataset

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

train_dataset = load_dataset("json", data_files={"train": "train.json"}, field="data")
eval_dataset = load_dataset("json", data_files={"eval": "eval.json"}, field="data")

def preprocess_function(examples):
    # Prefix each input with the task instruction, then tokenize inputs and targets
    input_texts = [f"extract parameters: {inp}" for inp in examples['input']]
    target_texts = examples['output']
    inputs = tokenizer(input_texts, truncation=True, padding="max_length", max_length=512)
    targets = tokenizer(target_texts, truncation=True, padding="max_length", max_length=512)
    # Use the tokenized targets as labels
    inputs["labels"] = targets["input_ids"]
    return inputs

train_dataset = train_dataset.map(preprocess_function, batched=True)
eval_dataset = eval_dataset.map(preprocess_function, batched=True)
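One way to sanity-check the preprocessing is to decode a processed example back to text (a hypothetical debugging snippet, not part of the training script):

sample = train_dataset["train"][0]
# Both decodes should round-trip to the original prefixed input and target strings
print(tokenizer.decode(sample["input_ids"], skip_special_tokens=True))
print(tokenizer.decode(sample["labels"], skip_special_tokens=True))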

from transformers.trainer_utils import IntervalStrategy

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy=IntervalStrategy.EPOCH, # Set evaluation strategy to epoch
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    save_strategy=IntervalStrategy.EPOCH, # Set save strategy to epoch
    load_best_model_at_end=True, # Load the best model when training ends
)
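The rest is the standard Trainer wiring (a sketch, since I didn't paste it above; note that load_dataset returns a DatasetDict, so the splits are indexed by the keys used in data_files):

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset["train"],
    eval_dataset=eval_dataset["eval"],
)
trainer.train()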

I know my dataset is too small, but I expected the model to at least reproduce the training output when given the exact training input, which makes me believe there is something else wrong.
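
In case it matters, inference looks roughly like this (a minimal sketch; generation uses default parameters):

# Apply the same task prefix that was used during training
input_text = "extract parameters: Give me all projects with a duration of 5 days or more"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_length=512)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))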