COVID-19 - TPU v3-1024 - T5-11B: TensorFlow to PyTorch conversion failed

We are training a large-scale T5-11B model on a TPU v3-1024 pod for a COVID-19 project.
We tried to convert the TensorFlow checkpoint to a PyTorch checkpoint, but the conversion failed.
Could you please help us figure out the problem? This model is important for our COVID-19 research.

:bug: Bug

Information

Model I am using (Bert, XLNet …):
T5

Language I am using the model on (English, Chinese …):
Protein Sequences

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • [ ] my own modified scripts: (give details below)

The task I am working on is:

  • [ ] an official GLUE/SQUaD task: (give the name)
  • [x] my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. The config file:
{
  "architectures": [
    "T5WithLMHeadModel"
  ],
  "d_ff": 65536,
  "d_kv": 128,
  "d_model": 1024,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_heads": 128,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_num_buckets": 32,
  "vocab_size": 128
}
  2. The conversion command:
python convert_t5_original_tf_checkpoint_to_pytorch.py \
  --tf_checkpoint_path xxx/tensorflow \
  --config_file xxx/t5-11b-config.json \
  --pytorch_dump_path xxx/pytorch
  3. The error:
Building PyTorch model from configuration: T5Config {
  "architectures": [
    "T5WithLMHeadModel"
  ],
  "d_ff": 65536,
  "d_kv": 128,
  "d_model": 1024,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_heads": 128,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_num_buckets": 32,
  "vocab_size": 128
}

.....


INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_016/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_017/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_018/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_019/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_020/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_021/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_022/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/k with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/k_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/k_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/o with shape [16384, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/o_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/o_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/q with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/q_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/q_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/v with shape [1024, 16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/v_slot_vc with shape [16384]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/SelfAttention/v_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_000/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wi/kernel with shape [1024, 65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wi/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wi/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wo/kernel with shape [65536, 1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wo/kernel_slot_vc with shape [65536]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/DenseReluDense/wo/kernel_slot_vr with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/block_023/layer_001/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/rms_norm/scale with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight encoder/rms_norm/scale_slot_v with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight global_step with shape []
INFO:transformers.modeling_t5:Loading TF weight shared/embedding with shape [128, 1024]
INFO:transformers.modeling_t5:Loading TF weight shared/embedding_slot_vc with shape [1024]
INFO:transformers.modeling_t5:Loading TF weight shared/embedding_slot_vr with shape [128]
INFO:transformers.modeling_t5:Transposing numpy weight of shape (1024, 16384) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'k']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'k']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/k_slot_vc
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/k_slot_vr
INFO:transformers.modeling_t5:Transposing numpy weight of shape (16384, 1024) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'o']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'o']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/o_slot_vc
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/o_slot_vr
INFO:transformers.modeling_t5:Transposing numpy weight of shape (1024, 16384) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'q']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'q']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/q_slot_vc
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/q_slot_vr
INFO:transformers.modeling_t5:Transposing numpy weight of shape (128, 32) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'relative_attention_bias']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'relative_attention_bias']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v
INFO:transformers.modeling_t5:Transposing numpy weight of shape (1024, 16384) for ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'v']
INFO:transformers.modeling_t5:Initialize PyTorch weight ['decoder', 'block_000', 'layer_000', 'SelfAttention', 'v']
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/v_slot_vc
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/SelfAttention/v_slot_vr
INFO:transformers.modeling_t5:Skipping decoder/block_000/layer_000/rms_norm/scale
Traceback (most recent call last):
  File "xxx/convert_t5_original_tf_checkpoint_to_pytorch.py", line 61, in <module>
    convert_tf_checkpoint_to_pytorch(args.tf_checkpoint_path, args.config_file, args.pytorch_dump_path)
  File "xxx/convert_t5_original_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_$ytorch
    load_tf_weights_in_t5(model, config, tf_checkpoint_path)
  File "xxx/modeling_t5.py", line 102, in load_tf_weights_in_t5
    pointer = getattr(pointer, "weight")
  File "xxx/anaconda3/envs/transformers/lib/python3.7/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'T5LayerSelfAttention' object has no attribute 'weight'
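
A quick consistency check on the shapes: with d_kv = 128 and num_heads = 128, the per-layer projection width is 128 × 128 = 16384, which matches the [1024, 16384] attention weights in the log, so the config appears to match the checkpoint. The failure happens while the converter maps checkpoint variable names onto the PyTorch module tree and ends up asking a T5LayerSelfAttention module for a weight attribute. As a first diagnostic step, it may help to list exactly which variable names the checkpoint contains, in particular how the layer norms (rms_norm/scale in the log) and the optimizer slot variables (_slot_vc, _slot_vr, _slot_v) are stored. Below is a minimal sketch, assuming TensorFlow is installed; xxx/tensorflow is the same placeholder path as in the command above:

import tensorflow as tf

# Placeholder path: the same "xxx/tensorflow" checkpoint used in the conversion command.
ckpt_path = "xxx/tensorflow"

# tf.train.list_variables returns (name, shape) pairs for every variable
# stored in the checkpoint, without loading the tensors themselves.
for name, shape in tf.train.list_variables(ckpt_path):
    # Focus on the norm scales and the optimizer slot variables, which are the
    # names the converter either remaps or is supposed to skip.
    if "rms_norm" in name or "layer_norm" in name or "_slot_" in name:
        print(name, shape)

Whether the crash is ultimately caused by the rms_norm naming or by one of the slot variables not being skipped, the listing should make it easier to see which name the converter trips over.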

Expected behavior

The T5 TensorFlow checkpoint should be converted to a PyTorch model without errors.

Environment info

  • transformers version: 2.11.0
  • Platform: Linux-4.15.0-101-generic-x86_64-with-debian-buster-sid
  • Python version: 3.7.7
  • PyTorch version (GPU?): 1.5.0 (False)
  • Tensorflow version (GPU?): 2.2.0 (False)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Hi, I recently had the same problem and solved it by converting the .h5 file to a TF checkpoint (ckpt); it works fine now. A sketch of what that conversion can look like is below.
Could you share a notebook of your TPU training approach?
Do you use Kaggle, Google Cloud, or Google Colab?
Each platform has its own configuration,
so I could help much better if you shared the training notebook.
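
For reference, roughly what that .h5-to-checkpoint conversion can look like, assuming the weights came from a Keras model saved in HDF5 format (the file names model.h5 and converted/ckpt are illustrative placeholders, not from this issue):

import tensorflow as tf

# Load the Keras model that was saved in HDF5 (.h5) format.
# "model.h5" is an illustrative placeholder file name.
model = tf.keras.models.load_model("model.h5")

# Re-save the weights in TensorFlow checkpoint format:
# save_format="tf" writes ckpt-style index/data files instead of HDF5.
model.save_weights("converted/ckpt", save_format="tf")

Note that this only changes the on-disk format; whether the resulting variable names match what convert_t5_original_tf_checkpoint_to_pytorch.py expects still depends on how the model was defined.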