Fine-Tune for MultiClass or MultiLabel-MultiClass

Hi @lewtun,
Wow, thanks for including this feature in the library.

I tried it in the Colab notebook you attached in this thread. As for loading the model with problem_type="multi_label_classification", I just changed this line in the notebook as you described, and it works fine.

num_labels=6
#model = BertForMultilabelSequenceClassification.from_pretrained(model_ckpt, num_labels=num_labels).to('cuda')
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=num_labels, problem_type="multi_label_classification").to('cuda')
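
As a quick check (a sketch of my own, not from the notebook), both settings should now show up on the model config:

# hypothetical check that the config picked up both arguments
print(model.config.problem_type)  # "multi_label_classification"
print(model.config.num_labels)    # 6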

But it’s throwing an error in both the evaluate and train methods.

# sanity check that we can run evaluation
trainer.evaluate()

Output:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-83-ce49fe2d24ca> in <module>()
      1 # sanity check that we can run evaluation
----> 2 trainer.evaluate()

8 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
   2827         raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
   2828 
-> 2829     return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
   2830 
   2831 

RuntimeError: result type Float can't be cast to the desired output type Long

I was checking BCEWithLogitsLoss (invoked when problem_type="multi_label_classification"), and the problem seems to be that the input labels are of int type while the logits are of float type.

What is the best approach to solve this: should I change the type of the input label indicator ids? Or is the error due to something else and I am misinterpreting it?

Thanks,
Loganathan


I think you need your labels to be floats for this loss.
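
For illustration, a minimal sketch (not from the notebook) that reproduces the mismatch and the fix directly with the loss:

import torch

logits = torch.randn(2, 6)                                   # float logits, num_labels=6
labels = torch.tensor([[1, 0, 0, 1, 0, 0],
                       [0, 1, 1, 0, 0, 0]])                  # integer (Long) targets
# torch.nn.BCEWithLogitsLoss()(logits, labels)               # raises the RuntimeError above
loss = torch.nn.BCEWithLogitsLoss()(logits, labels.float())  # fine once targets are floats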


sylvain beat me to it :slight_smile:

i was curious how problem_type really worked, so i cooked up a quick 'n dirty notebook which shows how the type conversion can be made: transformers_multilabel-text-classification-with-problem-type.ipynb - Google Drive
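
For readers without access to the notebook, here is a minimal sketch (assuming the dataset already has a multi-hot "labels" column) of one way to do the cast with datasets.map:

import numpy as np

def cast_labels_to_float(batch):
    # BCEWithLogitsLoss expects float targets, so cast the multi-hot vectors
    batch["labels"] = np.array(batch["labels"], dtype=np.float32).tolist()
    return batch

# dataset = dataset.map(cast_labels_to_float, batched=True)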


@sgugger: Thanks for the suggestion, it worked with float labels.

@lewtun: Thanks again for this new notebook (also using the direct dataset approach), it works perfectly. Two notebooks with two different dataset-loading approaches, couldn’t have asked for more.


great to hear that it worked!


Hi,

Great thread! I learned a lot just from reading it. I have a follow-up question, hope that is OK.

Does the TFAutoModelForSequenceClassification.from_pretrained() method also have the problem_type argument? I couldn’t find it in the documentation (Auto Classes — transformers 4.1.1 documentation). Or should I implement the loss and activation function for multi-label myself, similar to what dikster99 did in the notebooks?

Thanks,

Ayala

Hi,

Thanks very much for this thread. Very helpful. I have a question about the loss function you used for the multi-label classifier. Why didn’t you use tf.keras.losses.BinaryCrossentropy()?
I can see you tried it in the notebooks but ended up using the custom loss you created (please see below)

#
# https://stackoverflow.com/questions/52125924/why-does-sigmoid-crossentropy-of-keras-tensorflow-have-low-precision
# keras with custom loss function
def customLoss(target, output):
    # if not from_logits:
    #     # transform back to logits
    #     _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
    #     output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    #     output = tf.log(output / (1 - output))
    output = tf.math.log(output / (1 - output))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

optimizer = tf.keras.optimizers.Adam(lr=1e-3)
model.compile(
    loss=customLoss,
    optimizer=optimizer,
    metrics=['accuracy']
    )

#loss=tf.keras.losses.BinaryCrossentropy()
#optimizer = tf.keras.optimizers.Adam()  #lr=1e-3,  lr=0.001, decay=1e-6
#model.compile(
#    loss=loss,
#    optimizer=optimizer,
#    metrics=['accuracy']
#    )

Thanks!
Ayala

Hi, how can I make this work for multiclass classification?
I keep getting errors


hey @ayalaall happy to hear you found this thread useful :slight_smile:

since the tensorflow codebase has recently been revamped to make keras a first-class citizen (and TFTrainer is now deprecated), much of what i wrote above is out of date now.

my suggestion would be to check out the new tensorflow examples here: transformers/examples/tensorflow/text-classification at master · huggingface/transformers · GitHub


Hi @lewtun,

Thanks very much for the reference!
I started working on a version of my own for multi-label text classification using Hugging Face transformers and the example @dikster99 published in the previous posts of this thread.

I have a few questions about building a multi-label classification model in TensorFlow, hope that is OK.
I first get the transformer model using

transformer_model = TFAutoModelForSequenceClassification.from_pretrained(
    ml_params.transformer_model_name, config=config
)

I then build the keras model using

bert = transformer_model.layers[0]
input_ids = tf.keras.layers.Input(shape=(input_dim,), name='input_ids', dtype='int32')
attention_mask = tf.keras.layers.Input(shape=(input_dim,), name='attention_mask', dtype='int32')
inputs = {'input_ids': input_ids, 'attention_mask': attention_mask}
# https://github.com/huggingface/transformers/issues/7540
bert_model = bert(input_ids, attention_mask)[1]
X = tf.keras.layers.Dropout(transformer_model.config.hidden_dropout_prob, name='pooled_output')(bert_model)
X = tf.keras.layers.Dense(units=num_labels, activation='sigmoid', name='dense')(X)
model = tf.keras.Model(inputs=inputs, outputs=X)
model.summary()

My main question is regarding bert_model = bert(input_ids, attention_mask)[1]. This takes the CLS representation passed through a linear layer and a Tanh activation function. As far as I understand, this is different from the original implementation of the BERT model as described in the original paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.
Why was this pooler layer added in TFAutoModelForSequenceClassification?

My second question is regarding the loss function. Currently I’m using tf.keras.losses.BinaryCrossentropy(from_logits=False). I noticed that in PyTorch examples such as this one they used torch.nn.BCEWithLogitsLoss(), and I wasn’t able to find any parallel in TensorFlow. I would be happy to get any input/reference about that, or do you think using tf.keras.losses.BinaryCrossentropy(from_logits=False) should be OK?

Any help will be greatly appreciated.
Thanks,
Ayala

hey @ayalaall i’ll let our resident tensorflow expert (@Rocketknight1) chime in, but here’s my 2 cents:

i think this was done because it closely follows the original implementation by the google team (see e.g. here). if i understand correctly, they do this to handle the sentence-pair prediction task during pretraining.

i think your idea to use tf.keras.losses.BinaryCrossentropy makes a lot of sense! as a sanity check, i would test it out on a small sample of data to check that overfitting produces the correct predictions on the training set :slight_smile:
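
(A minimal sketch of that sanity check, assuming the Keras model with the sigmoid head built above; small_inputs and small_labels are placeholders for a handful of tokenized examples and their multi-hot float labels:)

import tensorflow as tf

model.compile(
    # the Dense head already applies a sigmoid, so keep from_logits=False
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    metrics=[tf.keras.metrics.BinaryAccuracy()],
)

# overfit a tiny sample and check the predictions saturate towards the labels
# model.fit(small_inputs, small_labels, epochs=50)
# print(model.predict(small_inputs))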


Thanks very much for the answers!

Hi, I used this approach to modify the train.py code in this git repo to train the model on the go_emotions dataset. However, I get the following error when I try to fit the model:

2021-09-11 09:21:02,657 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2021-09-11 09:21:02,688 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2021-09-11 09:21:08,914 sagemaker_pytorch_container.training INFO     Invoking user training script.
2021-09-11 09:21:09,260 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "test": "/opt/ml/input/data/test",
        "train": "/opt/ml/input/data/train"
    },
    "current_host": "algo-1",
    "framework_module": "sagemaker_pytorch_container.training:main",
    "hosts": [
        "algo-1"
    ],
    "hyperparameters": {
        "train_batch_size": 32,
        "model_name": "distilbert-base-uncased",
        "epochs": 1
    },
    "input_config_dir": "/opt/ml/input/config",
    "input_data_config": {
        "test": {
            "TrainingInputMode": "File",
            "S3DistributionType": "FullyReplicated",
            "RecordWrapperType": "None"
        },
        "train": {
            "TrainingInputMode": "File",
            "S3DistributionType": "FullyReplicated",
            "RecordWrapperType": "None"
        }
    },
    "input_dir": "/opt/ml/input",
    "is_master": true,
    "job_name": "ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374",
    "log_level": 20,
    "master_hostname": "algo-1",
    "model_dir": "/opt/ml/model",
    "module_dir": "s3://sagemaker-ap-southeast-1-178538799605/ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374/source/sourcedir.tar.gz",
    "module_name": "train",
    "network_interface_name": "eth0",
    "num_cpus": 4,
    "num_gpus": 1,
    "output_data_dir": "/opt/ml/output/data",
    "output_dir": "/opt/ml/output",
    "output_intermediate_dir": "/opt/ml/output/intermediate",
    "resource_config": {
        "current_host": "algo-1",
        "hosts": [
            "algo-1"
        ],
        "network_interface_name": "eth0"
    },
    "user_entry_point": "train.py"
}

Environment variables:

SM_HOSTS=["algo-1"]
SM_NETWORK_INTERFACE_NAME=eth0
SM_HPS={"epochs":1,"model_name":"distilbert-base-uncased","train_batch_size":32}
SM_USER_ENTRY_POINT=train.py
SM_FRAMEWORK_PARAMS={}
SM_RESOURCE_CONFIG={"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"}
SM_INPUT_DATA_CONFIG={"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}}
SM_OUTPUT_DATA_DIR=/opt/ml/output/data
SM_CHANNELS=["test","train"]
SM_CURRENT_HOST=algo-1
SM_MODULE_NAME=train
SM_LOG_LEVEL=20
SM_FRAMEWORK_MODULE=sagemaker_pytorch_container.training:main
SM_INPUT_DIR=/opt/ml/input
SM_INPUT_CONFIG_DIR=/opt/ml/input/config
SM_OUTPUT_DIR=/opt/ml/output
SM_NUM_CPUS=4
SM_NUM_GPUS=1
SM_MODEL_DIR=/opt/ml/model
SM_MODULE_DIR=s3://sagemaker-ap-southeast-1-178538799605/ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374/source/sourcedir.tar.gz
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1","framework_module":"sagemaker_pytorch_container.training:main","hosts":["algo-1"],"hyperparameters":{"epochs":1,"model_name":"distilbert-base-uncased","train_batch_size":32},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-ap-southeast-1-178538799605/ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374/source/sourcedir.tar.gz","module_name":"train","network_interface_name":"eth0","num_cpus":4,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"},"user_entry_point":"train.py"}
SM_USER_ARGS=["--epochs","1","--model_name","distilbert-base-uncased","--train_batch_size","32"]
SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
SM_CHANNEL_TEST=/opt/ml/input/data/test
SM_CHANNEL_TRAIN=/opt/ml/input/data/train
SM_HP_TRAIN_BATCH_SIZE=32
SM_HP_MODEL_NAME=distilbert-base-uncased
SM_HP_EPOCHS=1
PYTHONPATH=/opt/ml/code:/opt/conda/bin:/opt/conda/lib/python36.zip:/opt/conda/lib/python3.6:/opt/conda/lib/python3.6/lib-dynload:/opt/conda/lib/python3.6/site-packages

Invoking script with the following command:

/opt/conda/bin/python3.6 train.py --epochs 1 --model_name distilbert-base-uncased --train_batch_size 32


2021-09-11 09:21:12,708 - __main__ - INFO -  loaded train_dataset length is: 100
2021-09-11 09:21:12,709 - __main__ - INFO -  loaded test_dataset length is: 5427
2021-09-11 09:21:13,603 - filelock - INFO - Lock 140033116071920 acquired on /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333.lock
2021-09-11 09:21:14,456 - filelock - INFO - Lock 140033116071920 released on /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333.lock
2021-09-11 09:21:15,317 - filelock - INFO - Lock 140032915610704 acquired on /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a.lock
2021-09-11 09:21:20,460 - filelock - INFO - Lock 140032915610704 released on /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a.lock
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'classifier.bias', 'classifier.weight', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2021-09-11 09:21:22,834 - filelock - INFO - Lock 140032911509432 acquired on /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-09-11 09:21:24,522 - filelock - INFO - Lock 140032911509432 released on /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-09-11 09:21:25,373 - filelock - INFO - Lock 140032915671416 acquired on /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-09-11 09:21:27,294 - filelock - INFO - Lock 140032915671416 released on /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-09-11 09:21:29,862 - filelock - INFO - Lock 140032915672088 acquired on /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
2021-09-11 09:21:30,714 - filelock - INFO - Lock 140032915672088 released on /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
[2021-09-11 09:21:34.048 algo-1:26 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2021-09-11 09:21:34.144 algo-1:26 INFO profiler_config_parser.py:102] User has disabled profiler.
[2021-09-11 09:21:34.144 algo-1:26 INFO json_config.py:91] Creating hook from json_config at /opt/ml/input/config/debughookconfig.json.
[2021-09-11 09:21:34.144 algo-1:26 INFO hook.py:201] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.
[2021-09-11 09:21:34.145 algo-1:26 INFO hook.py:255] Saving to /opt/ml/output/tensors
[2021-09-11 09:21:34.146 algo-1:26 INFO state_store.py:77] The checkpoint config file /opt/ml/input/config/checkpointconfig.json does not exist.

2021-09-11 09:21:44 Uploading - Uploading generated training model
[repeated model/tokenizer download progress bars and initialization warnings omitted]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 699, in convert_to_tensors
    tensor = as_tensor(value)
ValueError: expected sequence of length 1 at dim 1 (got 2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 98, in <module>
    trainer.train()
  File "/opt/conda/lib/python3.6/site-packages/transformers/trainer.py", line 1246, in train
    for step, inputs in enumerate(epoch_iterator):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 444, in __next__
    (data, worker_id) = self._next_data()
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 526, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/opt/conda/lib/python3.6/site-packages/transformers/data/data_collator.py", line 123, in __call__
    return_tensors="pt",
  File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 2680, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 204, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 716, in convert_to_tensors
    "Unable to create tensor, you should probably activate truncation and/or padding "
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.

2021-09-11 09:21:34,866 sagemaker-training-toolkit ERROR    ExecuteUserScriptError:
Command "/opt/conda/bin/python3.6 train.py --epochs 1 --model_name distilbert-base-uncased --train_batch_size 32"

2021-09-11 09:22:44 Failed - Training job failed
---------------------------------------------------------------------------
UnexpectedStatusException                 Traceback (most recent call last)
<ipython-input-18-b69b5ed45b37> in <module>
      3                          experiment_config = {
      4                         "ExperimentName": experiment_name,
----> 5                         "TrialName" : trial.trial_name}
      6                          )

/opt/conda/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
    684         self.jobs.append(self.latest_training_job)
    685         if wait:
--> 686             self.latest_training_job.wait(logs=logs)
    687 
    688     def _compilation_job_name(self):

/opt/conda/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
   1629         # If logs are requested, call logs_for_jobs.
   1630         if logs != "None":
-> 1631             self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
   1632         else:
   1633             self.sagemaker_session.wait_for_job(self.job_name)

/opt/conda/lib/python3.6/site-packages/sagemaker/session.py in logs_for_job(self, job_name, wait, poll, log_type)
   3674 
   3675         if wait:
-> 3676             self._check_job_status(job_name, description, "TrainingJobStatus")
   3677             if dot:
   3678                 print()

/opt/conda/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
   3234                 ),
   3235                 allowed_statuses=["Completed", "Stopped"],
-> 3236                 actual_status=status,
   3237             )
   3238 

UnexpectedStatusException: Error for Training job ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
Command "/opt/conda/bin/python3.6 train.py --epochs 1 --model_name distilbert-base-uncased --train_batch_size 32"


Kindly help me with how I can edit the git repo to perform multiclass text classification on go_emotions.

cc @philschmid who is the SageMaker expert :slight_smile:


That’s not a SageMaker issue. It looks like you still have a small issue with your train.py.
P.S. Your link “git repo” is not working. Have you built your custom version on top of the 01 or 02 notebook?

I am running the following repo: notebooks/sagemaker/01_getting_started_pytorch at master · huggingface/notebooks · GitHub
But I am trying to change the dataset to go_emotions and perform multi-label classification.
For that purpose, I’ve made the changes below:

model = AutoModelForSequenceClassification.from_pretrained(args.model_name, num_labels=28, 
                                                               problem_type="multi_label_classification")
tokenizer = AutoTokenizer.from_pretrained(args.model_name, 
                               problem_type="multi_label_classification")

at lines 57 and 58 of script/train.py in the git repo.

In the mentioned repo, how do I find the version?

Lines 57 and 58 of train.py take the model name argument, which can be any encoder model supported by Hugging Face, like BERT, DistilBERT or RoBERTa. You can pass the model name when running the script, e.g. python train.py --model_name="bert-base-uncased". For more models, check the model page: Models - Hugging Face

@Baenka the training script doesn’t include tokenization. It expects an already preprocessed dataset. Can you share how you have processed your dataset?

I am using ‘01_getting_started_pytorch’ as it was on Aug 12.
Changes made to train.py to train the model on the go_emotions dataset:

model = AutoModelForSequenceClassification.from_pretrained(args.model_name, num_labels=28, 
                                                               problem_type="multi_label_classification")
tokenizer = AutoTokenizer.from_pretrained(args.model_name, 
                               problem_type="multi_label_classification")

Go_emotions data downloading and preprocessing at notebook:

from datasets import load_dataset
from transformers import AutoTokenizer
# tokenizer used in preprocessing
# tokenizer_name = 'distilbert-base-uncased'
tokenizer_name = "distilbert-base-uncased"
# dataset used
dataset_name = 'go_emotions'

# load dataset
dataset = load_dataset(dataset_name, 'simplified')

# download tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# tokenizer helper function
def tokenize(batch):
    return tokenizer(batch['text'], 
#                      padding='max_length', 
                     padding=True,
                     truncation=True)

train_dataset, test_dataset = load_dataset(dataset_name, 'simplified', split=['train', 'test'], )
test_dataset = test_dataset
train_dataset = train_dataset.select(range(100)) 

train_dataset = train_dataset.map(tokenize)
test_dataset = test_dataset.map(tokenize)

train_dataset =  train_dataset.remove_columns(["id"])
train_dataset.set_format('torch', columns=["input_ids", 'attention_mask', 'labels'])
test_dataset = test_dataset.remove_columns(["id"])
test_dataset.set_format('torch', columns=["input_ids", 'attention_mask', 'labels'])

# s3 key prefix for the data
s3_prefix = 'datasets/ge_100'

import botocore
from datasets.filesystems import S3FileSystem

s3 = S3FileSystem()  

training_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/train'
test_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/test'

# save train_dataset to s3
train_dataset.save_to_disk(training_input_path,fs=s3)
test_dataset.save_to_disk(test_input_path,fs=s3)

import botocore
from datasets.filesystems import S3FileSystem

s3 = S3FileSystem()  
training_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/train'
test_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/test'
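
Following the earlier advice in this thread that BCEWithLogitsLoss needs float targets, here is a minimal sketch (not part of the original notebook) of how the variable-length go_emotions label lists could be turned into fixed-length multi-hot float vectors before setting the torch format:

import numpy as np

NUM_LABELS = 28  # matches num_labels=28 passed to the model

def to_multi_hot(batch):
    # go_emotions stores labels as variable-length lists of class ids;
    # BCEWithLogitsLoss expects a fixed-length float vector per example
    multi_hot = np.zeros((len(batch["labels"]), NUM_LABELS), dtype=np.float32)
    for i, ids in enumerate(batch["labels"]):
        multi_hot[i, ids] = 1.0
    batch["labels"] = multi_hot.tolist()
    return batch

# train_dataset = train_dataset.map(to_multi_hot, batched=True)
# test_dataset = test_dataset.map(to_multi_hot, batched=True)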
