We have set up an AWS EC2 instance and initialized the Qwen/Qwen2-VL-7B-Instruct (https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) and mistralai/Pixtral-12B-2409 models from Hugging Face.
When we create an AMI from this VM and launch a new instance from it, the transformers library isn't able to load the pretrained models.
When I delete .cache on the newly created instance, it is able to download and load the models again.
Is this expected behaviour? The instance configuration is exactly the same.
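A quick way to check whether the hub cache baked into the AMI is actually resolvable offline is a small diagnostic like the sketch below (an assumption on my side, not something from the thread); it only asks huggingface_hub to resolve the two repos from the local cache:

```python
# Diagnostic sketch: check whether the hub cache baked into the AMI resolves
# without any network access. Model IDs are the ones from the question above.
from huggingface_hub import snapshot_download

for repo_id in ("Qwen/Qwen2-VL-7B-Instruct", "mistralai/Pixtral-12B-2409"):
    try:
        # local_files_only=True resolves purely from the local hub cache
        path = snapshot_download(repo_id, local_files_only=True)
        print(f"{repo_id}: cache OK at {path}")
    except Exception as exc:
        print(f"{repo_id}: cache not usable on this instance: {exc}")
```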
The symptoms are different, but the problem could be similar to this one:
Hi, I’m trying to train a model using a HuggingFace estimator in SageMaker but I keep getting this error after a few minutes:
[1,15]: File “pyarrow/ipc.pxi”, line 365, in pyarrow.lib._CRecordBatchWriter.write_batch
[1,15]: File “pyarrow/error.pxi”, line 97, in pyarrow.lib.check_status
[1,15]:OSError: [Errno 28] Error writing bytes to file. Detail: [errno 28] No space left on device
[1,15]:
I’m not sure what is triggering this problem because the volume size is high (volume_size=1024)
…
Space doesn't seem to be the issue; I have enough free space on the VM. The moment I delete .cache, it re-downloads the weights and starts working.
Your error is coming from caching the dataset. The datasets library caches the dataset on disk to work with it properly. The default cache_dir is ~/.cache/huggingface/datasets. This directory seems not to be on the mounted EBS volume.
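If that were the cause, pointing the datasets cache at the mounted EBS volume would be the fix. A minimal sketch, with a hypothetical mount path and a placeholder dataset name:

```python
# Minimal sketch (mount path and dataset name are placeholders): keep the
# datasets cache on the large mounted volume instead of the default
# ~/.cache/huggingface/datasets.
import os
os.environ["HF_DATASETS_CACHE"] = "/opt/ml/data/hf_datasets_cache"  # set before importing datasets

from datasets import load_dataset

ds = load_dataset("imdb", cache_dir="/opt/ml/data/hf_datasets_cache")  # or pass cache_dir per call
```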
I think it's not that; it's around here.
I'm not trying to cache a dataset; it's just the model weights, and the AMI already has the weights on its root volume. The primary purpose is to boot the process as fast as possible.
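One way (an assumption on my side, not something from the thread) to make weights baked into an AMI independent of the hub cache metadata is to snapshot them into a plain directory in the image and load from that path at boot:

```python
# Sketch, assuming a hypothetical /opt/models path baked into the AMI.
# Build time: materialise the repo into a plain directory (no cache layout involved).
from huggingface_hub import snapshot_download

snapshot_download("Qwen/Qwen2-VL-7B-Instruct", local_dir="/opt/models/qwen2-vl-7b-instruct")

# Boot time: load from the local path; no Hub lookups or cache resolution needed.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("/opt/models/qwen2-vl-7b-instruct")
# the model class used in the original setup loads from the same local path
```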
Perhaps an unresolved issue?
opened 05:46PM - 21 Feb 24 UTC · closed 08:04AM - 05 Apr 24 UTC
### System Info
I am on Ubuntu, torch = "2.0.0".
The following code always re-downloads the models instead of re-using the cached files:
```python
# `accelerator` and `logger` come from the surrounding training script (not shown).
import torch
from diffusers import (
    AutoencoderKL,
    EulerAncestralDiscreteScheduler,
    StableDiffusionXLAdapterPipeline,
    T2IAdapter,
)

adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to(accelerator.device)

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
logger.info("model loading..")
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
logger.info("model loading...")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
logger.info("model loading....")
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id,
    vae=vae,
    adapter=adapter,
    scheduler=euler_a,
    torch_dtype=torch.float16,
    variant="fp16",
).to(accelerator.device)
pipe.enable_xformers_memory_efficient_attention()
logger.info("model weights loading")
pipe.load_lora_weights(
    "stabilityai/stable-diffusion-xl-base-1.0",
    weight_name="sd_xl_offset_example-lora_1.0.safetensors",
)
```
I also tried with the `cache_dir` param, but got the same result.
(Screenshot from 2024-02-21 attached.)
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
### Reproduction
Same code as in the "System Info" section above.
### Expected behavior
Use the cached models/files and not download any files from the internet.
I wonder if it’s possible to avoid this by manually clearing the cache before execution.
I have seen this issue in the GitHub issues.
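For what it's worth, a minimal sketch of that workaround, assuming the default cache location (adjust if HF_HOME points elsewhere):

```python
# Sketch: clear the hub cache before the process starts, as described above.
# Assumes the default cache location; adjust if HF_HOME is set.
import shutil
from pathlib import Path

hub_cache = Path.home() / ".cache" / "huggingface" / "hub"
if hub_cache.exists():
    shutil.rmtree(hub_cache)
```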
I'm not facing this, but in my case it starts loading the weights and remains stuck at 0% when I spin up a new instance.
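If the hang happens while the library re-validates cached files against the Hub on the fresh instance, forcing offline mode would rule that out (an assumption, not a confirmed fix):

```python
# Sketch: force offline resolution so the baked cache is used without
# contacting the Hub. Set these before importing transformers/huggingface_hub.
import os
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")  # resolves from the local cache only
```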