Hi all, I’m trying to get textual-inversion fine-tuning with diffusers running in a Space.
Training consistently reaches 7 or 8% of the total steps, at which point the Space just stops and restarts with no error message at all.
I started monitoring GPU memory in case that was the cause, but it doesn’t seem to be — it stays flat the whole run.
Any tools or ideas to help debug this kind of scenario? Full logs below.
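For context, the "GPU memory occupied: … MB." lines in the log come from a small helper I call every few steps. A minimal, stdlib-only sketch of that kind of monitoring (shelling out to `nvidia-smi`; the function name is mine, and it returns `None` when no GPU tooling is available) looks like this:

```python
import shutil
import subprocess


def gpu_memory_used_mb(gpu_index=0):
    """Return used GPU memory in MB via nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver/tooling on this machine
    result = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    # Output is a single number like "5980" (MB) per queried GPU.
    return int(result.stdout.strip().splitlines()[0])


used = gpu_memory_used_mb()
if used is not None:
    print(f"GPU memory occupied: {used} MB.")
```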
tvf7t 2023-04-06T22:59:27.259Z
tvf7t 2023-04-06T22:59:27.259Z Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.
tvf7t 2023-04-06T22:59:27.259Z
tvf7t 2023-04-06T22:59:27.350Z
tvf7t 2023-04-06T22:59:27.350Z You can now view your Streamlit app in your browser.
tvf7t 2023-04-06T22:59:27.350Z
tvf7t 2023-04-06T22:59:27.350Z Network URL: http://10.19.71.229:8501
tvf7t 2023-04-06T22:59:27.350Z External URL: http://3.223.72.184:8501
tvf7t 2023-04-06T22:59:27.350Z
tvf7t 2023-04-06T23:01:11.014Z 2023-04-07 01:01:11.013 Name of concept: OUINT4
tvf7t 2023-04-06T23:01:11.014Z 2023-04-07 01:01:11.014 In from_local
tvf7t 2023-04-06T23:01:11.014Z 2023-04-07 01:01:11.014 save_path path: ./training/OUINT4/
tvf7t 2023-04-06T23:01:11.093Z 2023-04-07 01:01:11.093 Setting up model with:
tvf7t 2023-04-06T23:01:11.093Z 2023-04-07 01:01:11.093 Pretrained model name: CompVis/stable-diffusion-v1-4
tvf7t 2023-04-06T23:01:11.093Z 2023-04-07 01:01:11.093 Placeholder token: <OUINT4>
tvf7t 2023-04-06T23:01:11.093Z 2023-04-07 01:01:11.093 Initializer token: shirt
tvf7t 2023-04-06T23:01:11.093Z 2023-04-07 01:01:11.093 Learnable property: object
tvf7t 2023-04-06T23:01:11.093Z 2023-04-07 01:01:11.093 Save to library? False
tvf7t 2023-04-06T23:01:11.093Z 2023-04-07 01:01:11.093 Text encoder? None
tvf7t 2023-04-06T23:01:11.190Z
Downloading (…)tokenizer/vocab.json: 0%| | 0.00/1.06M [00:00<?, ?B/s]
Downloading (…)tokenizer/vocab.json: 100%|██████████| 1.06M/1.06M [00:00<00:00, 90.8MB/s]
tvf7t 2023-04-06T23:01:11.283Z
Downloading (…)tokenizer/merges.txt: 0%| | 0.00/525k [00:00<?, ?B/s]
Downloading (…)tokenizer/merges.txt: 100%|██████████| 525k/525k [00:00<00:00, 77.3MB/s]
tvf7t 2023-04-06T23:01:11.356Z
Downloading (…)cial_tokens_map.json: 0%| | 0.00/472 [00:00<?, ?B/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 472/472 [00:00<00:00, 231kB/s]
tvf7t 2023-04-06T23:01:11.405Z
Downloading (…)okenizer_config.json: 0%| | 0.00/806 [00:00<?, ?B/s]
Downloading (…)okenizer_config.json: 100%|██████████| 806/806 [00:00<00:00, 389kB/s]
tvf7t 2023-04-06T23:01:11.560Z
Downloading (…)_encoder/config.json: 0%| | 0.00/592 [00:00<?, ?B/s]
Downloading (…)_encoder/config.json: 100%|██████████| 592/592 [00:00<00:00, 90.1kB/s]
tvf7t 2023-04-06T23:01:17.891Z
Downloading pytorch_model.bin: 0%| | 0.00/492M [00:00<?, ?B/s]
Downloading pytorch_model.bin: 4%|▍ | 21.0M/492M [00:00<00:05, 82.5MB/s]
Downloading pytorch_model.bin: 6%|▋ | 31.5M/492M [00:00<00:06, 68.3MB/s]
Cutting out some logs to fit within size limit
Downloading (…)on_pytorch_model.bin: 100%|██████████| 335M/335M [00:06<00:00, 48.0MB/s]
tvf7t 2023-04-06T23:01:26.466Z
Downloading (…)main/vae/config.json: 0%| | 0.00/551 [00:00<?, ?B/s]
Downloading (…)main/vae/config.json: 100%|██████████| 551/551 [00:00<00:00, 312kB/s]
tvf7t 2023-04-06T23:02:15.274Z
Downloading (…)on_pytorch_model.bin: 0%| | 0.00/3.44G [00:00<?, ?B/s]
Downloading (…)on_pytorch_model.bin: 1%| | 21.0M/3.44G [00:00<00:38, 88.3MB/s]
Downloading (…)on_pytorch_model.bin: 1%| | 41.9M/3.44G [00:00<00:45, 75.4MB/s]
Downloading (…)on_pytorch_model.bin: 2%|▏ | 62.9M/3.44G [00:00<00:37, 89.8MB/s]
Cutting out some logs to fit within size limit
Downloading (…)on_pytorch_model.bin: 100%|██████████| 3.44G/3.44G [00:54<00:00, 63.0MB/s]
tvf7t 2023-04-06T23:02:21.464Z
Downloading (…)ain/unet/config.json: 0%| | 0.00/743 [00:00<?, ?B/s]
Downloading (…)ain/unet/config.json: 100%|██████████| 743/743 [00:00<00:00, 375kB/s]
tvf7t 2023-04-06T23:02:23.588Z 2023-04-07 01:02:23.588 Resizing token embeddings to 49409
tvf7t 2023-04-06T23:02:23.589Z 2023-04-07 01:02:23.589 text_encoder: CLIPTextModel(
tvf7t 2023-04-06T23:02:23.589Z (text_model): CLIPTextTransformer(
tvf7t 2023-04-06T23:02:23.589Z (embeddings): CLIPTextEmbeddings(
tvf7t 2023-04-06T23:02:23.589Z (token_embedding): Embedding(49408, 768)
tvf7t 2023-04-06T23:02:23.589Z (position_embedding): Embedding(77, 768)
tvf7t 2023-04-06T23:02:23.589Z )
tvf7t 2023-04-06T23:02:23.589Z (encoder): CLIPEncoder(
tvf7t 2023-04-06T23:02:23.589Z (layers): ModuleList(
tvf7t 2023-04-06T23:02:23.589Z (0-11): 12 x CLIPEncoderLayer(
tvf7t 2023-04-06T23:02:23.589Z (self_attn): CLIPAttention(
tvf7t 2023-04-06T23:02:23.589Z (k_proj): Linear(in_features=768, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:23.589Z (v_proj): Linear(in_features=768, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:23.589Z (q_proj): Linear(in_features=768, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:23.589Z (out_proj): Linear(in_features=768, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:23.589Z )
tvf7t 2023-04-06T23:02:23.589Z (layer_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
tvf7t 2023-04-06T23:02:23.589Z (mlp): CLIPMLP(
tvf7t 2023-04-06T23:02:23.589Z (activation_fn): QuickGELUActivation()
tvf7t 2023-04-06T23:02:23.589Z (fc1): Linear(in_features=768, out_features=3072, bias=True)
tvf7t 2023-04-06T23:02:23.589Z (fc2): Linear(in_features=3072, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:23.589Z )
tvf7t 2023-04-06T23:02:23.589Z (layer_norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
tvf7t 2023-04-06T23:02:23.589Z )
tvf7t 2023-04-06T23:02:23.589Z )
tvf7t 2023-04-06T23:02:23.589Z )
tvf7t 2023-04-06T23:02:23.589Z (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
tvf7t 2023-04-06T23:02:23.589Z )
tvf7t 2023-04-06T23:02:23.589Z )
tvf7t 2023-04-06T23:02:23.941Z /home/user/app/app.py:296: DeprecationWarning: LINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use BILINEAR or Resampling.BILINEAR instead.
tvf7t 2023-04-06T23:02:23.941Z "linear": PIL.Image.LINEAR,
tvf7t 2023-04-06T23:02:23.942Z /home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/diffusers/configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
tvf7t 2023-04-06T23:02:23.942Z deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
tvf7t 2023-04-06T23:02:24.010Z
Downloading (…)cheduler_config.json: 0%| | 0.00/313 [00:00<?, ?B/s]
Downloading (…)cheduler_config.json: 100%|██████████| 313/313 [00:00<00:00, 46.6kB/s]
tvf7t 2023-04-06T23:02:24.021Z 2023-04-07 01:02:24.021 Exiting setup_model with text_encoder: CLIPTextModel(
tvf7t 2023-04-06T23:02:24.021Z (text_model): CLIPTextTransformer(
tvf7t 2023-04-06T23:02:24.021Z (embeddings): CLIPTextEmbeddings(
tvf7t 2023-04-06T23:02:24.021Z (token_embedding): Embedding(49409, 768)
tvf7t 2023-04-06T23:02:24.021Z (position_embedding): Embedding(77, 768)
tvf7t 2023-04-06T23:02:24.021Z )
tvf7t 2023-04-06T23:02:24.021Z (encoder): CLIPEncoder(
tvf7t 2023-04-06T23:02:24.021Z (layers): ModuleList(
tvf7t 2023-04-06T23:02:24.021Z (0-11): 12 x CLIPEncoderLayer(
tvf7t 2023-04-06T23:02:24.021Z (self_attn): CLIPAttention(
tvf7t 2023-04-06T23:02:24.021Z (k_proj): Linear(in_features=768, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:24.021Z (v_proj): Linear(in_features=768, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:24.021Z (q_proj): Linear(in_features=768, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:24.021Z (out_proj): Linear(in_features=768, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:24.021Z )
tvf7t 2023-04-06T23:02:24.021Z (layer_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
tvf7t 2023-04-06T23:02:24.021Z (mlp): CLIPMLP(
tvf7t 2023-04-06T23:02:24.021Z (activation_fn): QuickGELUActivation()
tvf7t 2023-04-06T23:02:24.021Z (fc1): Linear(in_features=768, out_features=3072, bias=True)
tvf7t 2023-04-06T23:02:24.021Z (fc2): Linear(in_features=3072, out_features=768, bias=True)
tvf7t 2023-04-06T23:02:24.021Z )
tvf7t 2023-04-06T23:02:24.021Z (layer_norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
tvf7t 2023-04-06T23:02:24.021Z )
tvf7t 2023-04-06T23:02:24.021Z )
tvf7t 2023-04-06T23:02:24.021Z )
tvf7t 2023-04-06T23:02:24.021Z (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
tvf7t 2023-04-06T23:02:24.021Z )
tvf7t 2023-04-06T23:02:24.021Z )
tvf7t 2023-04-06T23:02:24.021Z 2023-04-07 01:02:24.021 ***** Running training *****
tvf7t 2023-04-06T23:02:24.022Z 2023-04-07 01:02:24.021 global_text_encoder: CLIPTextModel(…) — identical model dump to the one above (token_embedding: 49409×768), omitted for length
tvf7t 2023-04-06T23:02:24.023Z Launching training on one GPU.
tvf7t 2023-04-06T23:02:24.025Z Creating dataloader
tvf7t 2023-04-06T23:02:26.667Z 2023-04-07 01:02:26.667 ***** Running training *****
tvf7t 2023-04-06T23:02:26.667Z 2023-04-07 01:02:26.667 Num examples = 400
tvf7t 2023-04-06T23:02:26.667Z 2023-04-07 01:02:26.667 Instantaneous batch size per device = 4
tvf7t 2023-04-06T23:02:26.667Z 2023-04-07 01:02:26.667 Total train batch size (w. parallel, distributed & accumulation) = 4
tvf7t 2023-04-06T23:02:26.667Z 2023-04-07 01:02:26.667 Gradient Accumulation steps = 1
tvf7t 2023-04-06T23:02:26.667Z 2023-04-07 01:02:26.667 Total optimization steps = 2000
tvf7t 2023-04-06T23:02:28.159Z
0%| | 0/2000 [00:00<?, ?it/s]
Steps: 0%| | 0/2000 [00:00<?, ?it/s]/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
tvf7t 2023-04-06T23:02:28.159Z warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
tvf7t 2023-04-06T23:02:34.392Z
Steps: 0%| | 1/2000 [00:02<1:18:52, 2.37s/it]
Steps: 0%| | 1/2000 [00:02<1:18:52, 2.37s/it, loss=0.0737]
Steps: 0%| | 2/2000 [00:03<58:40, 1.76s/it, loss=0.0737]
Steps: 0%| | 2/2000 [00:03<58:40, 1.76s/it, loss=0.148]
Steps: 0%| | 3/2000 [00:05<52:12, 1.57s/it, loss=0.148]
Steps: 0%| | 3/2000 [00:05<52:12, 1.57s/it, loss=0.106]
Steps: 0%| | 4/2000 [00:06<49:07, 1.48s/it, loss=0.106]
Steps: 0%| | 4/2000 [00:06<49:07, 1.48s/it, loss=0.0693]
Steps: 0%| | 5/2000 [00:07<47:30, 1.43s/it, loss=0.0693]2023-04-07 01:02:34.392 Checking Save path sd-concept-output/learned_embeds-step-5.bin
tvf7t 2023-04-06T23:02:34.392Z 2023-04-07 01:02:34.392 Saving embeddings
tvf7t 2023-04-06T23:02:34.392Z GPU memory occupied: 5980 MB.
tvf7t 2023-04-06T23:02:41.150Z
Steps: 0%| | 5/2000 [00:07<47:30, 1.43s/it, loss=0.0995]
Steps: 0%| | 6/2000 [00:09<46:29, 1.40s/it, loss=0.0995]
Steps: 0%| | 6/2000 [00:09<46:29, 1.40s/it, loss=0.17]
Steps: 0%| | 7/2000 [00:10<46:11, 1.39s/it, loss=0.17]
Steps: 0%| | 7/2000 [00:10<46:11, 1.39s/it, loss=0.0928]
Steps: 0%| | 8/2000 [00:11<45:48, 1.38s/it, loss=0.0928]
Steps: 0%| | 8/2000 [00:11<45:48, 1.38s/it, loss=0.153]
Steps: 0%| | 9/2000 [00:13<45:25, 1.37s/it, loss=0.153]
Steps: 0%| | 9/2000 [00:13<45:25, 1.37s/it, loss=0.0219]
Steps: 0%| | 10/2000 [00:14<45:07, 1.36s/it, loss=0.0219]2023-04-07 01:02:41.150 Checking Save path sd-concept-output/learned_embeds-step-10.bin
tvf7t 2023-04-06T23:02:41.150Z 2023-04-07 01:02:41.150 Saving embeddings
tvf7t 2023-04-06T23:02:41.150Z GPU memory occupied: 5980 MB.
tvf7t 2023-04-06T23:02:47.948Z
Steps: 0%| | 10/2000 [00:14<45:07, 1.36s/it, loss=0.0189]
Steps: 1%| | 11/2000 [00:15<45:09, 1.36s/it, loss=0.0189]
Steps: 1%| | 11/2000 [00:15<45:09, 1.36s/it, loss=0.0111]
Steps: 1%| | 12/2000 [00:17<45:08, 1.36s/it, loss=0.0111]
Steps: 1%| | 12/2000 [00:17<45:08, 1.36s/it, loss=0.189]
Steps: 1%| | 13/2000 [00:18<45:02, 1.36s/it, loss=0.189]
Steps: 1%| | 13/2000 [00:18<45:02, 1.36s/it, loss=0.0839]
Steps: 1%| | 14/2000 [00:19<45:02, 1.36s/it, loss=0.0839]
Steps: 1%| | 14/2000 [00:19<45:02, 1.36s/it, loss=0.0343]
Steps: 1%| | 15/2000 [00:21<44:56, 1.36s/it, loss=0.0343]2023-04-07 01:02:47.948 Checking Save path sd-concept-output/learned_embeds-step-15.bin
tvf7t 2023-04-06T23:02:47.948Z 2023-04-07 01:02:47.948 Saving embeddings
tvf7t 2023-04-06T23:02:47.949Z GPU memory occupied: 5980 MB.
tvf7t 2023-04-06T23:02:54.777Z
Steps: 1%| | 15/2000 [00:21<44:56, 1.36s/it, loss=0.13]
Steps: 1%| | 16/2000 [00:22<44:50, 1.36s/it, loss=0.13]
Steps: 1%| | 16/2000 [00:22<44:50, 1.36s/it, loss=0.0751]
Steps: 1%| | 17/2000 [00:24<44:57, 1.36s/it, loss=0.0751]
Steps: 1%| | 17/2000 [00:24<44:57, 1.36s/it, loss=0.107]
Steps: 1%| | 18/2000 [00:25<45:03, 1.36s/it, loss=0.107]
Steps: 1%| | 18/2000 [00:25<45:03, 1.36s/it, loss=0.0659]
Steps: 1%| | 19/2000 [00:26<45:02, 1.36s/it, loss=0.0659]
Steps: 1%| | 19/2000 [00:26<45:02, 1.36s/it, loss=0.22]
Steps: 1%| | 20/2000 [00:28<45:04, 1.37s/it, loss=0.22]2023-04-07 01:02:54.777 Checking Save path sd-concept-output/learned_embeds-step-20.bin
Cutting out some logs to fit within size limit
tvf7t 2023-04-06T23:05:52.912Z 2023-04-07 01:05:52.912 Saving embeddings
tvf7t 2023-04-06T23:05:52.912Z GPU memory occupied: 5980 MB.
tvf7t 2023-04-06T23:06:00.272Z
Steps: 7%|▋ | 145/2000 [03:26<45:31, 1.47s/it, loss=0.13]
Steps: 7%|▋ | 146/2000 [03:27<45:34, 1.47s/it, loss=0.13]
Steps: 7%|▋ | 146/2000 [03:27<45:34, 1.47s/it, loss=0.0306]
Steps: 7%|▋ | 147/2000 [03:29<45:24, 1.47s/it, loss=0.0306]
Steps: 7%|▋ | 147/2000 [03:29<45:24, 1.47s/it, loss=0.0483]
Steps: 7%|▋ | 148/2000 [03:30<45:28, 1.47s/it, loss=0.0483]
Steps: 7%|▋ | 148/2000 [03:30<45:28, 1.47s/it, loss=0.123]
Steps: 7%|▋ | 149/2000 [03:32<45:29, 1.47s/it, loss=0.123]
Steps: 7%|▋ | 149/2000 [03:32<45:29, 1.47s/it, loss=0.136]
Steps: 8%|▊ | 150/2000 [03:33<45:20, 1.47s/it, loss=0.136]2023-04-07 01:06:00.272 Checking Save path sd-concept-output/learned_embeds-step-150.bin
tvf7t 2023-04-06T23:06:00.272Z 2023-04-07 01:06:00.272 Saving embeddings
tvf7t 2023-04-06T23:06:00.272Z GPU memory occupied: 5980 MB.
tvf7t 2023-04-06T23:06:07.642Z
Steps: 8%|▊ | 150/2000 [03:33<45:20, 1.47s/it, loss=0.0693]
Steps: 8%|▊ | 151/2000 [03:35<45:16, 1.47s/it, loss=0.0693]
Steps: 8%|▊ | 151/2000 [03:35<45:16, 1.47s/it, loss=0.116]
Steps: 8%|▊ | 152/2000 [03:36<45:28, 1.48s/it, loss=0.116]
Steps: 8%|▊ | 152/2000 [03:36<45:28, 1.48s/it, loss=0.216]
Steps: 8%|▊ | 153/2000 [03:38<45:24, 1.48s/it, loss=0.216]
Steps: 8%|▊ | 153/2000 [03:38<45:24, 1.48s/it, loss=0.139]
Steps: 8%|▊ | 154/2000 [03:39<45:16, 1.47s/it, loss=0.139]
Steps: 8%|▊ | 154/2000 [03:39<45:16, 1.47s/it, loss=0.0166]
Steps: 8%|▊ | 155/2000 [03:40<45:17, 1.47s/it, loss=0.0166]2023-04-07 01:06:07.642 Checking Save path sd-concept-output/learned_embeds-step-155.bin
tvf7t 2023-04-06T23:06:07.642Z 2023-04-07 01:06:07.642 Saving embeddings
tvf7t 2023-04-06T23:06:07.642Z GPU memory occupied: 5980 MB.
tvf7t 2023-04-06T23:06:08.091Z Stopping...
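For reference, translating the crash point into wall-clock terms (step count and s/it read off the progress bar above; rough back-of-the-envelope, not an exact measurement):

```python
total_steps = 2000        # "Total optimization steps" from the training config
secs_per_step = 1.47      # steady-state s/it from the progress bar
last_logged_step = 155    # last "Saving embeddings" before "Stopping..."

progress = last_logged_step / total_steps
elapsed_min = last_logged_step * secs_per_step / 60

print(f"progress: {progress:.1%}")        # ~7.8% — matches the 7-8% I keep seeing
print(f"elapsed:  {elapsed_min:.1f} min") # ~3.8 minutes into training
```

So it dies a few minutes in, every time, at roughly the same step.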