COVID-19 - TPU v3-1024 - T5-11B: TensorFlow to PyTorch conversion failed

We are training a large-scale T5-11B model on a TPU v3-1024 pod for a COVID-19 project.
We tried to convert the TensorFlow checkpoint to a PyTorch checkpoint, but the conversion failed.
Could you please help us figure out the problem? This model is important for our COVID-19 research.

:bug: Bug

Information

Model I am using (Bert, XLNet …):
T5

Language I am using the model on (English, Chinese …):
Protein Sequences

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • [ ] my own modified scripts: (give details below)

The task I am working on is:

  • [ ] an official GLUE/SQUaD task: (give the name)
  • [x] my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. The config file:
{
  "architectures": [
    "T5WithLMHeadModel"
  ],
  "d_ff": 65536,
  "d_kv": 128,
  "d_model": 1024,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_heads": 128,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_num_buckets": 32,
  "vocab_size": 128
}
  2. The conversion command:
python convert_t5_original_tf_checkpoint_to_pytorch.py \
  --tf_checkpoint_path xxx/tensorflow \
  --config_file xxx/t5-11b-config.json \
  --pytorch_dump_path xxx/pytorch
  3. The error:
Building PyTorch model from configuration: T5Config {
  "architectures": [
    "T5WithLMHeadModel"
  ],
  "d_ff": 65536,
  "d_kv": 128,
  "d_model": 1024,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_heads": 128,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_num_buckets": 32,
  "vocab_size": 128
}

.....


INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight global_step with shape []
INFO:transformers.modeling_t5:Loading TF weight shared/embedding with shape [128, 1024]
INFO:transformers.modeling_t5:Loading TF weight shared/embedding_slot_vc with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight shared/embedding_slot_vr with shape [128]
INFO:transformers.modeling_t5:Transposing numpy weight of shape (1024, 16384) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'k']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'k']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/k_slot_vc
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/k_slot_vr
INFO:transformers.modeling_t5:Transposing numpy weight of shape (16384, 1024) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'o']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'o']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/o_slot_vc
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/o_slot_vr
INFO:transformers.modeling_t5:Transposing numpy weight of shape (1024, 16384) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'q']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'q']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/q_slot_vc
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/q_slot_vr
INFO:transformers.modeling_t5:Transposing numpy weight of shape (128, 32) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'relative_attention_bias']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'relative_attention_bias']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v
INFO:transformers.modeling_t5:Transposing numpy weight of shape (1024, 16384) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'v']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'v']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/v_slot_vc
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/v_slot_vr
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/rms_norm/scale
Traceback (most recent call last):
  File "xxx/convert_t5_original_tf_checkpoint_to_pytorch.py", line 61, in <module>
    convert_tf_checkpoint_to_pytorch(args.tf_checkpoint_path, args.config_file, args.pytorch_dump_path)
  File "xxx/convert_t5_original_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_$ytorch
    load_tf_weights_in_t5(model, config, tf_checkpoint_path)
  File "xxx/modeling_t5.py", line 102, in load_tf_weights_in_t5
    pointer = getattr(pointer, "weight")
  File "xxx/anaconda3/envs/transformers/lib/python3.7/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'T5LayerSelfAttention' object has no attribute 'weight'
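
A quick consistency check on the shapes: with d_kv = 128 and num_heads = 128, the per-layer projection width is 128 × 128 = 16384, which matches the [1024, 16384] attention weights in the log, so the config appears to match the checkpoint. The failure happens while the converter maps checkpoint variable names onto the PyTorch module tree and ends up asking a T5LayerSelfAttention module for a weight attribute. As a first diagnostic step, it may help to list exactly which variable names the checkpoint contains, in particular how the layer norms (rms_norm/scale in the log) and the optimizer slot variables (_slot_vc, _slot_vr, _slot_v) are stored. Below is a minimal sketch, assuming TensorFlow is installed; xxx/tensorflow is the same placeholder path as in the command above:

import tensorflow as tf

# Placeholder path: the same "xxx/tensorflow" checkpoint used in the conversion command.
ckpt_path = "xxx/tensorflow"

# tf.train.list_variables returns (name, shape) pairs for every variable
# stored in the checkpoint, without loading the tensors themselves.
for name, shape in tf.train.list_variables(ckpt_path):
    # Focus on the norm scales and the optimizer slot variables, which are the
    # names the converter either remaps or is supposed to skip.
    if "rms_norm" in name or "layer_norm" in name or "_slot_" in name:
        print(name, shape)

Whether the crash is ultimately caused by the rms_norm naming or by one of the slot variables not being skipped, the listing should make it easier to see which name the converter trips over.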

Expected behavior

The T5 TensorFlow checkpoint should be converted to a PyTorch model without errors.

Environment info

  • transformers version: 2.11.0
  • Platform: Linux-4.15.0-101-generic-x86_64-with-debian-buster-sid
  • Python version: 3.7.7
  • PyTorch version (GPU?): 1.5.0 (False)
  • Tensorflow version (GPU?): 2.2.0 (False)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Hi, I recently had the same problem and solved it by converting the .h5 file to a TF checkpoint (ckpt); it works fine now. A sketch of what that conversion can look like is below.
Could you share a notebook of your TPU training approach?
Do you use Kaggle, Google Cloud, or Google Colab?
Each platform has its own configuration,
so I could help much better if you shared the training notebook.
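
For reference, roughly what that .h5-to-checkpoint conversion can look like, assuming the weights came from a Keras model saved in HDF5 format (the file names model.h5 and converted/ckpt are illustrative placeholders, not from this issue):

import tensorflow as tf

# Load the Keras model that was saved in HDF5 (.h5) format.
# "model.h5" is an illustrative placeholder file name.
model = tf.keras.models.load_model("model.h5")

# Re-save the weights in TensorFlow checkpoint format:
# save_format="tf" writes ckpt-style index/data files instead of HDF5.
model.save_weights("converted/ckpt", save_format="tf")

Note that this only changes the on-disk format; whether the resulting variable names match what convert_t5_original_tf_checkpoint_to_pytorch.py expects still depends on how the model was defined.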