I’m trying to reproduce the CodeT5 fine-tuning results (GitHub - salesforce/CodeT5: Code for CodeT5: a new code-aware pre-trained encoder-decoder model.)
The script being used is:
python3 /home/ubuntu/CodeT5/run_gen.py \
--task summarize \
--sub_task python \
--summary_dir /home/ubuntu/CodeT5/summary \
--cache_path /home/ubuntu/CodeT5/cache \
--data_dir /home/ubuntu/CodeT5/data \
--res_dir /home/ubuntu/CodeT5/res \
--output_dir /home/ubuntu/CodeT5/output \
--save_last_checkpoints \
--always_save_model \
--do_eval_bleu \
--model_name_or_path='Salesforce/codet5-base-multi-sum' \
--tokenizer_name='Salesforce/codet5-base-multi-sum' \
--train_filename /home/ubuntu/CodeT5/data/summarize/python/train.jsonl \
--dev_filename /home/ubuntu/CodeT5/data/summarize/python/valid.jsonl \
--test_filename /home/ubuntu/CodeT5/data/summarize/python/test.jsonl \
--do_train \
--do_eval \
--do_test \
--save_steps=500 \
--log_steps=100 \
--local_rank=-1
Running it leads to the following error:
Traceback (most recent call last):
File "/home/ubuntu/CodeT5/run_gen.py", line 387, in <module>
main()
File "/home/ubuntu/CodeT5/run_gen.py", line 234, in main
outputs = model(input_ids=source_ids, attention_mask=source_mask,
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1561, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 998, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 639, in forward
self_attention_outputs = self.layer[0](
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 546, in forward
attention_output = self.SelfAttention(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 483, in forward
scores = torch.matmul(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
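Since device-side asserts are reported asynchronously, one thing I tried in order to get a trustworthy stack trace is forcing synchronous kernel launches before torch initializes CUDA (a standard PyTorch debugging switch, set here via the environment):

```python
import os

# Force synchronous CUDA kernel launches so the Python stack trace
# points at the op that actually failed. This must be set before
# torch initializes CUDA (i.e. before the first CUDA call).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Equivalently, `CUDA_LAUNCH_BLOCKING=1 python3 run_gen.py ...` on the command line.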
When run with an extra --no_cuda flag, the training script produces this error:
Traceback (most recent call last):
File "/home/ubuntu/CodeT5/run_gen.py", line 394, in <module>
main()
File "/home/ubuntu/CodeT5/run_gen.py", line 241, in main
outputs = model(input_ids=source_ids, attention_mask=source_mask,
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1561, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 898, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2044, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
My guess is that something fishy is going on with the source_ids, but I haven’t been able to figure it out. A simple test shows that source_ids has shape torch.Size([8, 64]), and target_ids has the same shape, torch.Size([8, 64]).
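The CPU traceback points at F.embedding, which (as far as I understand) usually fails like this when some token id falls outside the embedding table, i.e. id >= vocab size. A minimal, torch-free sketch of the check I plan to run on a batch; the 32100 vocab size is what I believe codet5-base uses, so treat it as an assumption:

```python
def find_out_of_range_ids(batch, vocab_size):
    """Return (row, col, token_id) triples whose id falls outside [0, vocab_size)."""
    return [
        (i, j, tok)
        for i, row in enumerate(batch)
        for j, tok in enumerate(row)
        if not 0 <= tok < vocab_size
    ]

# Toy batch standing in for source_ids (shape [2, 4] here instead of [8, 64]):
batch = [[5, 17, 32099, 0], [3, 32100, 7, 1]]
print(find_out_of_range_ids(batch, vocab_size=32100))  # → [(1, 1, 32100)]
```

On the real tensors the equivalent one-liner would compare `source_ids.max().item()` against `model.get_input_embeddings().num_embeddings` — if the max id is larger, the tokenizer and model vocabularies don’t match.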
I wonder:
- whether (and how) this can be debugged and fixed?
- whether a Trainer can be adapted to this fine-tuning task? The main things I expect the Trainer to provide are DeepSpeed support and wandb reporting integration.
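For reference, here’s the sort of Seq2SeqTrainer setup I had in mind — a rough, untested sketch: train_dataset/eval_dataset are placeholders for tokenized versions of the jsonl files, and ds_config.json is a hypothetical DeepSpeed config file:

```python
from transformers import (
    AutoTokenizer,
    T5ForConditionalGeneration,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base-multi-sum")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base-multi-sum")

args = Seq2SeqTrainingArguments(
    output_dir="/home/ubuntu/CodeT5/output",
    do_train=True,
    do_eval=True,
    save_steps=500,
    logging_steps=100,
    predict_with_generate=True,
    report_to=["wandb"],          # wandb reporting integration
    deepspeed="ds_config.json",   # hypothetical DeepSpeed config file
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # placeholder: tokenized train.jsonl
    eval_dataset=eval_dataset,    # placeholder: tokenized valid.jsonl
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```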
UPD: an example summarization script seems to work fine. Perhaps I’ll just have to tweak it a bit so that it can summarize code instead of natural language.