Hi,
I’ve been following the SageMaker demo (available here) for training a binary sentiment classifier on the IMDB dataset.
I ran the notebook as provided to get a feel for how to use Hugging Face with SageMaker. It runs successfully and the model trains. However, when I tested the resulting model, I found that the output (both label and score) was identical regardless of the input sentence. For example:
>>> sentiment_input = {"inputs": "This is the best movie I have ever watched. It is amazing!"}
>>> print(predictor.predict(sentiment_input))
[{'label': 'LABEL_0', 'score': 0.9999932050704956}]
>>> sentiment_input = {"inputs": "This is the worst movie I have ever watched. It is terrible!"}
>>> print(predictor.predict(sentiment_input))
[{'label': 'LABEL_0', 'score': 0.9999932050704956}]
I’ve looked through the code, but I can’t find what might cause this behaviour. The only explanations I can think of are that the dataset isn’t being loaded properly (e.g. the classifier is only training on one label) or that something is going wrong with the tokenization. However, I haven’t made any changes to the notebook or the training script provided, so both seem unlikely.
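To make the one-label hypothesis concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the training labels can be pulled out as a plain list of integers. The `labels` list and both helper functions below are hypothetical stand-ins for illustration, not something taken from the notebook or training script.

```python
from collections import Counter


def label_distribution(labels):
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


def is_single_class(labels):
    """Return True if every example has the same label (or there are none)."""
    return len(set(labels)) <= 1


# Hypothetical stand-in data; in practice this would be the label column
# of the training split actually passed to the training job.
labels = [0, 0, 1, 1, 0, 1]
print(label_distribution(labels))
print(is_single_class(labels))
```

If `is_single_class` came back `True` on the real training labels, that would point at the data loading rather than the tokenization.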
Any help would be much appreciated! Thanks in advance.