Decicoder finetune error: understanding naive_attention_prefill

I’m attempting to fine-tune a Decicoder model on my own data and encountering an error.
My issue is that I can't find any documentation on the naive_attention_prefill attribute the error refers to, so I don't actually understand how to fix it.

import datasets
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorWithPadding, TrainingArguments, Trainer, AutoConfig

checkpoint = "Deci/DeciCoder-1b" 
device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# Using dummy data since just trying to get the architecture to work
train_list = [{'text': """;lkj;kjlkj;lkj;kjasdfasdfasdfadsf"""},
            {'text': """asdfasdfafds;lj;lkj;lkj;salkjfd;akdsjf;asdf"""}]

test_list = [{'text': "asdfasdfadsfasdfasdfasdfasdfasdfadsf"},
            {'text': """etrpoiqetrpqetqpoiewr.nn,n.,n.n.asdfasdqewr"""}]

dataset_prepped = datasets.DatasetDict({
    'train': datasets.Dataset.from_list(train_list),
    'test': datasets.Dataset.from_list(test_list),
    }
)

def tokenize_function(example):
    return tokenizer(example['text'], padding="max_length", truncation=True)

tokenized_dataset = dataset_prepped.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
training_args = TrainingArguments("test-trainer")

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    data_collator=data_collator,
    tokenizer=tokenizer
)

trainer.train()

Error:

ValueError: For support of custom attention masks please set naive_attention_prefill to True in the config

I did attempt to follow the error message's suggestion:

config = AutoConfig.from_pretrained(checkpoint, naive_attention_prefill = True)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True, config = config).to(device)

However, I get the following error:

ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has  and you passed . Fix one of those so they match!

It seems that the original model wasn't built with naive_attention_prefill set to True. I would like to resolve this, but also to understand where this attribute is documented - I haven't been able to find it anywhere.
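For context, my understanding (which may be wrong) is that transformers config objects store unrecognized keyword arguments as plain attributes, which is the mechanism an override like naive_attention_prefill=True would rely on. My guess is that the config_class mismatch comes from AutoConfig.from_pretrained being called without trust_remote_code=True, so it returns a stock config class rather than DeciCoder's custom one. The runnable part of the sketch below only demonstrates the kwargs-as-attributes behavior; the commented-out fix is an untested assumption:

```python
from transformers import PretrainedConfig

# PretrainedConfig stores unrecognized keyword arguments as plain attributes,
# so an override like naive_attention_prefill=True can be injected through
# the config even though it isn't a documented field.
cfg = PretrainedConfig(naive_attention_prefill=True)
print(cfg.naive_attention_prefill)  # True

# Untested assumption for the actual fix: pass trust_remote_code=True to
# AutoConfig as well, so it loads the checkpoint's custom config class,
# then hand that config to the model:
#
# config = AutoConfig.from_pretrained(
#     checkpoint, trust_remote_code=True, naive_attention_prefill=True
# )
# model = AutoModelForCausalLM.from_pretrained(
#     checkpoint, trust_remote_code=True, config=config
# ).to(device)
```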


Try naive_attention_prefil=True (single "l")

instead of naive_attention_prefill=True