Error message:
[2024-04-25 04:12:22,346] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-25 04:12:24,796] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2024-04-25 04:12:24,796] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2024-04-25 04:12:24,796] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2024-04-25 04:12:24,796] [INFO] [launch.py:163:main] dist_world_size=1
[2024-04-25 04:12:24,797] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2024-04-25 04:12:27,946] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-25 04:12:28,765] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-04-25 04:12:28,765] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-04-25 04:12:28,765] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
[2024-04-25 04:12:30,365] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 12.92B parameters
Traceback (most recent call last):
File "/root/data/yflu/muffin/./muffin/train/debug.py", line 22, in <module>
load()
File "/root/data/yflu/muffin/./muffin/train/debug.py", line 16, in load
model = Beit3LlavaLlamaForCausalLM.from_pretrained(
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2959, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
f(module, *args, **kwargs)
File "/root/data/yflu/muffin/muffin/model/muffin.py", line 311, in __init__
self.model = Beit3LlavaLlamaModel(config, mm_vision_tower=mm_vision_tower)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
f(module, *args, **kwargs)
File "/root/data/yflu/muffin/muffin/model/muffin.py", line 153, in __init__
self.vision_tower = timm.create_model(mm_vision_tower)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/timm/models/factory.py", line 81, in create_model
model = create_fn(pretrained=pretrained, **kwargs)
File "/root/data/yflu/muffin/muffin/model/beit3.py", line 135, in beit3_large_patch16_672
model = BEiT3Wrapper(args, **kwargs)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
f(module, *args, **kwargs)
File "/root/data/yflu/muffin/muffin/model/beit3.py", line 51, in __init__
self.beit3 = BEiT3(args)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
f(module, *args, **kwargs)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/model/BEiT3.py", line 40, in __init__
self.encoder = Encoder(
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
f(module, *args, **kwargs)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/architecture/encoder.py", line 209, in __init__
self.build_encoder_layer(
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/architecture/encoder.py", line 296, in build_encoder_layer
layer = EncoderLayer(
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
f(module, *args, **kwargs)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/architecture/encoder.py", line 30, in __init__
self.self_attn = self.build_self_attention(self.embed_dim, args)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/architecture/encoder.py", line 103, in build_self_attention
return MultiheadAttention(
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
f(module, *args, **kwargs)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/component/multihead_attention.py", line 40, in __init__
self.k_proj = MultiwayWrapper(args, nn.Linear(embed_dim, embed_dim, bias=True))
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/component/multiway_network.py", line 12, in MultiwayWrapper
return MultiwayNetwork(module, dim=dim)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
f(module, *args, **kwargs)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/component/multiway_network.py", line 30, in __init__
self.B.reset_parameters()
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 109, in reset_parameters
fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torch/nn/init.py", line 287, in _calculate_fan_in_and_fan_out
raise ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions")
ValueError: Fan in and fan out can not be computed for tensor with fewer than 2 dimensions
It seems that the multihead attention module of BEiT3 cannot be initialized: when `MultiwayNetwork` calls `self.B.reset_parameters()`, the weight of that linear layer is empty. This is likely a DeepSpeed ZeRO-3 effect — under `zero.Init`, parameters are partitioned into zero-element placeholder tensors as soon as each submodule is constructed, so `_calculate_fan_in_and_fan_out` fails on the already-partitioned weight.
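The failure can be reproduced outside DeepSpeed with a minimal sketch (assuming ZeRO-3 partitioning is indeed what empties the weight — here the empty tensor is simulated by hand):

```python
import torch
import torch.nn as nn

# Under deepspeed.zero.Init, each parameter is partitioned right after its
# module is constructed, leaving a zero-element 1-D placeholder tensor.
lin = nn.Linear(1024, 1024)
lin.weight.data = torch.empty(0)  # simulate the partitioned (empty) weight

# MultiwayNetwork deep-copies the wrapped module and calls reset_parameters(),
# which needs fan-in/fan-out -- impossible for a tensor with fewer than 2 dims.
try:
    lin.reset_parameters()
except ValueError as e:
    print(e)  # Fan in and fan out can not be computed for tensor with fewer than 2 dimensions
```

This matches the traceback: `reset_parameters()` reaches `init._calculate_fan_in_and_fan_out(self.weight)` on a weight that no longer has its `(out_features, in_features)` shape.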