What is the meaning of: "ValueError: No gradients provided for any variable"?

Hi, I'm following the Hugging Face course on Question Answering.

I built my own dataset, all the features are present, and I get the exact same results up until fitting the model.
There I get the above error.
After some research, it seems this is caused by not having the columns in the correct order.

The tokenizer outputs the columns in a different order, and I changed mine to match, but neither the order from the course nor the order from the tokenizer seems to work.

Can someone think of another issue?
I don't use the Data Collator, as it's deprecated now.
token_type_ids is commented out because the tokenizer does not return it.
I'm using "distilbert-base-cased-distilled-squad" because I just want to experiment, and it seems like the fastest (smallest) model.

tf_train_dataset = train_dataset.to_tf_dataset(
    columns=[
        "attention_mask",
        "end_positions",
        "input_ids",
        "start_positions",
        #"token_type_ids",
    ],
    shuffle=True,
    batch_size=4,
)

Thank you very much!

I get the same error with the model from the tutorial.

Pinging @Rocketknight1 here who’s the TensorFlow expert on the course :slight_smile:

1 Like

Hey Lewis, thanks!
I’m actually doing your book and the course in parallel. :hugs:

Edit:
I did all the same preprocessing, but this time switched out the model and used PyTorch to train, and it works, no error.
This is weird; I'd still love some insight into this error.

1 Like

Hi @ollibolli, this is a good question! We’re thinking about a refactor of the internals of some of our TF models to make it a bit clearer, because this is one of the most common issues people encounter.

I don’t -think- the order of the columns should matter. Instead, what’s happening here is that you compiled the model with a Keras loss, but you’re passing the labels in the input dictionary. This is explained in more detail in the HF course here: How to ask for help - Hugging Face Course

If you search in that file for the “No gradients provided…” error you’ll see what it is, and how to fix it. If you have any other issues, or you don’t think the course notes do a good job explaining the problem, feel free to let me know!

3 Likes

Hi Rocketknight, thank you very much. I went over that part, but probably not carefully enough.
I’ll go over it again! Cheers, Oli

Hi @ollibolli, that's probably our fault for not making it intuitive enough! The key idea is that in Keras, the loss is usually computed by a Keras loss function, which you pass to compile() in the loss argument. If you do that, then you need to pass the labels in the label_cols argument; if they aren't there, Keras won't be able to see your labels, won't know what to feed the loss function, and will complain that there are no gradients (because it couldn't compute the loss).

With Hugging Face models, though, you can also just skip the loss argument to compile() entirely. If you do this, the model will compute the loss internally (this is really helpful in some cases, because the loss may be quite complex to specify as a Keras loss). When you do this, the labels should be in the input dictionary (as they are in your code), so that the model can see them.

tl;dr Do one of two things:

  1. Pass a loss argument to compile() + put labels in label_cols
  2. Don’t pass a loss argument to compile() + put labels in columns

We’re well aware that this can be unintuitive, though, and we’re working on a way to make sure the labels ‘just work’ in both cases without these fiddly details.
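The distinction can be sketched with a toy Keras model (a minimal sketch, assuming only TensorFlow is installed; the tiny Dense model and MSE loss here are stand-ins for the real QA model, not part of the original thread):

```python
# Minimal sketch of option 1: a Keras loss is passed to compile(), so the
# dataset must yield (inputs, labels) pairs for Keras to see the labels.
# With Hugging Face's to_tf_dataset, this corresponds to listing the label
# columns in label_cols rather than burying them in the input dict.
import numpy as np
import tensorflow as tf

features = np.random.rand(8, 4).astype("float32")
labels = np.random.rand(8, 1).astype("float32")

# Dataset yields (inputs, labels) tuples, so Keras can feed the loss.
ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
history = model.fit(ds, epochs=1, verbose=0)
print("loss:", history.history["loss"][0])

# If the dataset instead yielded only an input dict with the labels hidden
# inside it (option 2's data layout) while a Keras loss was still set, Keras
# would have nothing to feed the loss, and training fails with
# "ValueError: No gradients provided for any variable" (or a related
# loss/label error, depending on the version). Option 2 only works with
# models that compute their loss internally, like the Hugging Face TF models.
```

The same data layout rule applies to the real QA model: pick one of the two combinations from the tl;dr and keep the loss setting and the label placement consistent with each other.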

1 Like

Oh thanks a lot.
Not at all! There is always room for improvement, though in general I found the course super helpful.
I’m doing that in tandem with Lewis’ book and a bit of Kaggle.
I find it super intuitive; just a week ago I didn't know much, and I feel pretty comfortable already.

Thanks for taking the time. I'll try your steps and report back.

Hi, I tried both versions, but I haven't had much luck with the TF version of my code.
I think I'm not clear on what the labels are.

I thought they would be start_positions and end_positions, but with option 2 (no loss in compile() + labels in columns) I first get a complaint that it needs input_ids; after adding that, I still get the same error.

Option one seems more promising, but:

tf_train_dataset = train_dataset.to_tf_dataset(
    columns=[
        "attention_mask",
        "end_positions",
        "input_ids",
        "start_positions",
        "token_type_ids",
    ],
    label_cols=[
        'start_positions', 
        'end_positions'
    ],
    shuffle=True,
    batch_size=16,
)

gives a shape mismatch with several losses I've tried.
Also, the course passes no loss but also seems to just have all the columns.

Is it possible that this has something to do with data_collator not being used?

Thanks a lot, and sorry for being a bit of a pest here. Weirdly, the PyTorch version works like a charm.

Hm, this is interesting! Would you be willing to share your whole script so I can try to reproduce it here? You can just give me a couple of rows of your dataset rather than the whole thing, or just any data with the right shape so the script runs.

I'll see if I can make a small sample of it (shouldn't be too hard) and then a Colab? Would that work? Thanks a lot!

Any short script that reproduces the problem you’re getting is perfect, and Colab is fine!

1 Like