The following columns in the training set don't have a corresponding argument

Hello,

I am trying to fine-tune a T5 model on the XSum dataset, but I am getting the following message. Should I just rename the columns “summary”, “document” and “id” to something else, or is it caused by something else?

The following columns in the training set don’t have a corresponding argument in T5ForConditionalGeneration.forward and have been ignored: summary, document, id.

Many thanks!


Hi,

The Trainer automatically ignores columns in your dataset that aren’t used by the model (this is controlled by the remove_unused_columns argument of TrainingArguments, which defaults to True). T5, for instance, expects input_ids, attention_mask, labels, etc., but not “summary”, “document” or “id”. As long as input_ids etc. are in your dataset, it’s fine.
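As a minimal sketch of how those model inputs are usually produced from XSum's text columns, a batched preprocessing function could look like the one below. It assumes a Hugging Face-style tokenizer that returns a dict with "input_ids" and accepts `text_target=` for the target side; the `"summarize: "` prefix is the common T5 convention, and `preprocess` and the length limits are illustrative choices, not values from this thread.

```python
def preprocess(batch, tokenizer, max_source_length=512, max_target_length=64):
    """Turn a batch of XSum rows (a dict of lists) into T5 inputs.

    `tokenizer` is assumed to behave like a Hugging Face tokenizer:
    calling it returns a dict with "input_ids" (and "attention_mask"),
    and `text_target=` tokenizes the target side.
    """
    # T5 is usually prompted with a task prefix for summarization.
    inputs = ["summarize: " + doc for doc in batch["document"]]
    model_inputs = tokenizer(inputs, max_length=max_source_length, truncation=True)

    # Tokenize the reference summaries and store their token ids as labels.
    targets = tokenizer(
        text_target=batch["summary"], max_length=max_target_length, truncation=True
    )
    model_inputs["labels"] = targets["input_ids"]
    return model_inputs
```

With a real tokenizer this would typically be applied as `raw_datasets.map(lambda b: preprocess(b, tokenizer), batched=True)`, where `raw_datasets` stands for the loaded XSum `DatasetDict`. The original "document", "summary" and "id" columns stay in the dataset, and those are exactly the ones the Trainer reports as ignored.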

The warning is just telling you that those columns aren’t used.
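The check behind that warning can be sketched in plain Python by comparing the dataset's column names with the parameter names of the model's `forward` method. This is a simplified illustration, not the Trainer's actual code: `t5_like_forward` and `split_columns` are hypothetical stand-ins, and the `extra_keep` default mirrors the fact that the Trainer also keeps label columns such as `label` and `label_ids`.

```python
import inspect


def t5_like_forward(input_ids=None, attention_mask=None, decoder_input_ids=None, labels=None):
    """Hypothetical stand-in for T5ForConditionalGeneration.forward (argument list abridged)."""


def split_columns(forward_fn, column_names, extra_keep=("label", "label_ids")):
    """Return (kept, ignored) column lists based on forward_fn's parameter names."""
    accepted = set(inspect.signature(forward_fn).parameters) | set(extra_keep)
    kept = [c for c in column_names if c in accepted]
    ignored = [c for c in column_names if c not in accepted]
    return kept, ignored


columns = ["document", "summary", "id", "input_ids", "attention_mask", "labels"]
kept, ignored = split_columns(t5_like_forward, columns)
print("kept:", kept)        # kept: ['input_ids', 'attention_mask', 'labels']
print("ignored:", ignored)  # ignored: ['document', 'summary', 'id']
```

The same inspection can be pointed at a real `model.forward`. Passing `remove_unused_columns=False` to `TrainingArguments` keeps every column instead, but the data collator then has to cope with the extra string columns.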



No, it’s not working for me, please help!
```python
import pandas as pd
from datasets import Dataset, DatasetDict, load_dataset

# `tokenizer` is assumed to have been created earlier,
# e.g. with AutoTokenizer.from_pretrained(...)

# Load the original dataset
data = load_dataset("jcordon5/cybersecurity-rules")

# Convert the training split to a pandas DataFrame
df = data["train"].to_pandas()

# Keep only the required columns: 'instruction' and 'output'
df = df[["instruction", "output"]]

# Convert the DataFrame back into a Dataset
train_dataset = Dataset.from_pandas(df, preserve_index=False)

# Create a DatasetDict
data = DatasetDict({"train": train_dataset})

# Verify the structure of the new dataset dictionary
print(data)

# Tokenize the instructions (adds input_ids and attention_mask columns)
data = data.map(
    lambda samples: tokenizer(
        samples["instruction"],
        padding="max_length",
        truncation=True,
    ),
    batched=True,
)
print(data)

# Formatting function (ensure full output is included)
def formatting_func(example):
    text = f"Output: {example['output']}"  # Return entire output
    return [text]
```
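If the intent is for each training text to contain both the instruction and its response, the formatting function would usually combine the two fields; the one above only uses `output`. A minimal sketch, assuming the function receives a batch (a dict of lists) and must return a list of strings, which is how some versions of TRL's `SFTTrainer` call `formatting_func` (newer versions may pass single examples instead). The name `formatting_func_pairs` and the prompt template are hypothetical.

```python
def formatting_func_pairs(batch):
    """Build one training string per row from a batch (a dict of lists).

    The "### Instruction / ### Response" template is a hypothetical choice,
    not something the dataset or library prescribes.
    """
    texts = []
    for instruction, output in zip(batch["instruction"], batch["output"]):
        texts.append(f"### Instruction:\n{instruction}\n\n### Response:\n{output}")
    return texts


batch = {"instruction": ["Write a rule"], "output": ["alert tcp any any"]}
print(formatting_func_pairs(batch)[0])
```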
