Training with --enable_xformers_memory_efficient_attention fails with the following error when run with accelerate:
File "/anaconda/envs/diffusers-ikin/lib/python3.8/site-packages/xformers/ops/fmha/__init__.py", line 348, in _memory_efficient_attention_forward_requires_grad
inp.validate_inputs()
File "/anaconda/envs/diffusers-ikin/lib/python3.8/site-packages/xformers/ops/fmha/common.py", line 121, in validate_inputs
raise ValueError(
ValueError: Query/Key/Value should either all have the same dtype, or (in the quantized case) Key/Value should have dtype torch.int32
query.dtype: torch.float32
key.dtype : torch.float16
value.dtype: torch.float16
Steps: 0%| | 0/1000 [00:02<?, ?it/s]
[2023-11-25 14:03:17,898] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 5264) of binary: /anaconda/envs/diffusers-ikin/bin/python
Traceback (most recent call last):
File "/anaconda/envs/diffusers-ikin/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/anaconda/envs/diffusers-ikin/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/anaconda/envs/diffusers-ikin/lib/python3.8/site-packages/accelerate/commands/launch.py", line 985, in launch_command
multi_gpu_launcher(args)
File "/anaconda/envs/diffusers-ikin/lib/python3.8/site-packages/accelerate/commands/launch.py", line 654, in multi_gpu_launcher
distrib_run.run(args)
File "/anaconda/envs/diffusers-ikin/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/anaconda/envs/diffusers-ikin/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/anaconda/envs/diffusers-ikin/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py FAILED
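The dtype check that fires here can be reproduced outside the trainer in a few lines (a minimal sketch, assuming a CUDA device and the same xformers build; shapes are arbitrary):

import torch
import xformers.ops as xops

# Query in fp32 while key/value are fp16 -- the same mismatch as in the traceback.
q = torch.randn(1, 16, 8, 64, device="cuda", dtype=torch.float32)
k = torch.randn(1, 16, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 16, 8, 64, device="cuda", dtype=torch.float16)

# Raises ValueError: Query/Key/Value should either all have the same dtype ...
xops.memory_efficient_attention(q, k, v)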
I am running accelerate as follows:
accelerate launch diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
--instance_data_dir={input_dir} \
--output_dir={output_dir} \
--instance_prompt=instance_prompt \
--mixed_precision="fp16" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--checkpointing_steps=500 \
--max_train_steps=1000 \
--seed="0" \
--checkpoints_total_limit=5 \
--enable_xformers_memory_efficient_attention
Without the --enable_xformers_memory_efficient_attention flag, training works fine.
Accelerate config:
{
"compute_environment": "LOCAL_MACHINE",
"debug": false,
"distributed_type": "MULTI_GPU",
"downcast_bf16": false,
"machine_rank": 0,
"main_training_function": "main",
"mixed_precision": "no",
"num_machines": 1,
"num_processes": 2,
"rdzv_backend": "static",
"same_network": false,
"tpu_use_cluster": false,
"tpu_use_sudo": false,
"use_cpu": false
}
Versions:
xformers==0.0.23.dev687
accelerate==0.24.1
torch==2.1.0
torchvision==0.16.1
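A possible stop-gap while this is debugged: force query/key/value to a common dtype before the xformers kernel runs. The sketch below assumes diffusers resolves xformers.ops.memory_efficient_attention at call time; it is not an upstream fix, and upcasting key/value to fp32 costs extra memory:

import xformers.ops as xops

_orig_mea = xops.memory_efficient_attention

def _mea_same_dtype(query, key, value, *args, **kwargs):
    # Cast key/value to the query dtype so all three tensors match.
    return _orig_mea(query, key.to(query.dtype), value.to(query.dtype), *args, **kwargs)

# Apply the patch before the training loop starts, e.g. near the top of
# train_dreambooth_lora_sdxl.py.
xops.memory_efficient_attention = _mea_same_dtype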