RepositoryNotFoundError: 404 Client Error

Hi everyone, I am new to NLP and to Hugging Face. I am working on a text summarization project and trying to fine-tune a model. Below is the code I wrote, but I am getting an error that I have not been able to solve. Any leads would be appreciated.

from huggingface_hub import notebook_login

notebook_login()

I entered my Hugging Face token here…

from transformers import Seq2SeqTrainingArguments

batch_size = 8
num_train_epochs = 8
# Show the training loss with every epoch
logging_steps = len(tokenized_datasets["article"]) // batch_size
model_name = model_checkpoint.split("/")[-1]

args = Seq2SeqTrainingArguments(
    output_dir="https://huggingface.co/username/mT5",
    evaluation_strategy="epoch",
    learning_rate=5.6e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=num_train_epochs,
    predict_with_generate=True,
    logging_steps=logging_steps,
    push_to_hub=True
)

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Decode generated summaries into text
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    # Decode reference summaries into text
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # ROUGE expects a newline after each sentence
    decoded_preds = ["\n".join(sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(sent_tokenize(label.strip())) for label in decoded_labels]
    # Compute ROUGE scores
    result = rouge.compute(
        predictions=decoded_preds, references=decoded_labels, use_stemmer=True
    )
    # Extract the median scores
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
    return {k: round(v, 4) for k, v in result.items()}

from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

from transformers import Seq2SeqTrainer

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_datasets["article"],
    eval_dataset=test_tokenized_datasets["article"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

Here is the final error I am getting…

RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-64f004f3-5195d9d41d468e89023d924f;a7ed83d8-a09f-494e-ae69-8623b7517abb)

Repository Not Found for url: https://huggingface.co/api/models/mT5.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.

One issue I found is in the output_dir argument of Seq2SeqTrainingArguments: it should be a local path, not a remote one. You cannot pass a Hub URL here.

The output directory is where the model checkpoints and other training artifacts are saved.
See the docs for TrainingArguments.
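To make the local-versus-remote distinction concrete, here is a minimal sketch that splits the URL from the question into a local folder name for output_dir and a repo id for hub_model_id (a separate TrainingArguments parameter for naming the Hub repo). The helper name hub_training_kwargs is made up for illustration, and "username" is just the placeholder from the post.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def hub_training_kwargs(target: str) -> dict:
    """Split a Hub URL (or a plain path) into output_dir and hub_model_id.

    Hypothetical helper for illustration; not part of transformers.
    """
    parsed = urlparse(target)
    if parsed.scheme in ("http", "https"):
        # "https://huggingface.co/username/mT5" -> repo id "username/mT5"
        repo_id = parsed.path.strip("/")
        # Use the last component as the local checkpoint folder: "mT5"
        local_dir = PurePosixPath(repo_id).name
        return {"output_dir": local_dir, "hub_model_id": repo_id, "push_to_hub": True}
    # Already a local path: keep it and do not set hub_model_id.
    return {"output_dir": target, "push_to_hub": True}


kwargs = hub_training_kwargs("https://huggingface.co/username/mT5")
print(kwargs)
```

The resulting dictionary could then be unpacked into the call from the question, for example Seq2SeqTrainingArguments(**kwargs, evaluation_strategy="epoch", ...), with the remaining arguments unchanged.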

Thanks for the suggestion. Along with the fix you mentioned, there was a second issue: the access token had only read access, whereas pushing to the Hub requires write access.
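For anyone who wants to check the token before training, below is a rough sketch. It assumes the dictionary returned by HfApi().whoami() carries the token role under auth → accessToken → role; that layout is my assumption and may differ between huggingface_hub versions or for fine-grained tokens. The function names are made up for illustration.

```python
def token_role(whoami_response: dict) -> str:
    """Return the access-token role from a whoami()-style dict, or "unknown".

    Assumes the role lives at ["auth"]["accessToken"]["role"].
    """
    return (
        whoami_response.get("auth", {})
        .get("accessToken", {})
        .get("role", "unknown")
    )


def can_push(role: str) -> bool:
    # Only a classic "write" token is treated as sufficient here; fine-grained
    # tokens carry per-repo permissions that this simple check cannot judge.
    return role == "write"


def current_token_role() -> str:
    # Needs network access and a saved login, so it is not called automatically.
    from huggingface_hub import HfApi

    return token_role(HfApi().whoami())


# Hand-built sample standing in for a real whoami() response.
sample = {"name": "username", "auth": {"accessToken": {"role": "read"}}}
print(token_role(sample), can_push(token_role(sample)))
```

In a notebook, calling can_push(current_token_role()) after notebook_login() gives a quick signal; if it returns False, creating a new token with write access in the account settings and logging in again is the usual remedy.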
