AI-Toolkit guidance required: training a LoRA for realistic full-body fashion model portraits

Long post, so please bear with me and help a newbie. I will be really thankful to you all. Experts, please don't ignore. This was my first week trying to train a LoRA for Flux-Schnell using the AI-Toolkit GitHub repo on Google Colab.

Use case: Train a LoRA that generates full-body portraits of fashion models with varying faces (no consistent face), varying clothing styles and sizes, and varying body sizes (depending on the prompt at inference time). The generated image should always show the full body (head to feet).

I created the following config file in the main ai-toolkit repo:
job: extension
config:
  name: my_flux_lora_v2
  process:
    - type: sd_trainer
      training_folder: /content/drive/MyDrive/Shared/ai-toolkit/output
      performance_log_every: 200
      device: cuda:0
      network:
        type: lora
        linear: 128
        linear_alpha: 128
      save:
        dtype: float16
        save_every: 500
        max_step_saves_to_keep: 4
        push_to_hub: true
        hf_repo_id: username/flux_lora_model
        hf_private: true
      datasets:
        - folder_path: /content/drive/MyDrive/Shared/dataset
          caption_ext: txt
          caption_dropout_rate: 0.1
          shuffle_tokens: false
          cache_latents_to_disk: true
          resolution: 512
      train:
        batch_size: 2
        steps: 1000
        gradient_accumulation_steps: 8
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true
        noise_scheduler: flowmatch
        optimizer: adamw8bit
        lr: 0.0001
        ema_config:
          use_ema: true
          ema_decay: 0.99
        dtype: bf16
      model:
        name_or_path: black-forest-labs/FLUX.1-schnell
        assistant_lora_path: ostris/FLUX.1-schnell-training-adapter
        is_flux: true
        quantize: true
      sample:
        sampler: flowmatch
        sample_every: 100
        width: 512
        height: 512
        prompts:
          - placed three prompts here.
        neg: ""
        seed: 20
        walk_seed: true
        guidance_scale: 7.5
        sample_steps: 25
meta:
  name: my_first_flux_lora_v2
  version: '1.1'
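
To put the training settings above in perspective, here is a rough sketch of how much the run sees each image. It assumes that one counted step is a single optimizer update over batch_size * gradient_accumulation_steps images, which I have not verified against the AI-Toolkit source. The function name `training_exposure` is just an illustrative helper, and the image count of 20 is the size of my dataset.

```python
def training_exposure(batch_size, gradient_accumulation_steps, steps, num_images):
    """Return (effective_batch, images_seen, passes_per_image).

    Assumption: each counted step is one optimizer update over
    batch_size * gradient_accumulation_steps images.
    """
    effective_batch = batch_size * gradient_accumulation_steps
    images_seen = effective_batch * steps
    passes_per_image = images_seen / num_images
    return effective_batch, images_seen, passes_per_image


# Values taken from the config; num_images is the dataset size (20 images).
effective_batch, images_seen, passes = training_exposure(
    batch_size=2, gradient_accumulation_steps=8, steps=1000, num_images=20
)
print(f"effective batch size: {effective_batch}")
print(f"images seen in total: {images_seen}")
print(f"approx. passes over each image: {passes:.0f}")
```

If that assumption holds, each of the 20 images is seen many hundreds of times in 1000 steps, which may be relevant when judging whether the run is over- or under-trained.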

My supervisor's requirement is a LoRA that generates the full-body image described above on the very first try. In my validation samples, I was getting some close-up shots as well.

For reference, the dataset consists of 20 full-body images of different fashion models looking straight at the camera (10 male and 10 female). The training images are 512x512.
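
Since the config uses caption_ext: txt, every image in the dataset folder needs a matching .txt caption file. Below is a minimal sketch of a check for that pairing, in case it helps rule out a dataset problem. The helper name `find_caption_problems` and the list of image extensions are my own assumptions, not part of AI-Toolkit.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def find_caption_problems(folder, caption_ext="txt"):
    """Return (images_without_caption, images_with_empty_caption)."""
    missing, empty = [], []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip caption files and anything that is not an image
        caption = image.with_suffix(f".{caption_ext}")
        if not caption.exists():
            missing.append(image.name)
        elif not caption.read_text(encoding="utf-8").strip():
            empty.append(image.name)
    return missing, empty
```

Calling `find_caption_problems("/content/drive/MyDrive/Shared/dataset")` would return two lists of file names; both should be empty if every image has a non-empty caption.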

So, based on my use case and these details, please tell me how to prepare the dataset and set up the configuration so that the trained LoRA reliably meets this goal.
Also, I set quantize to true, as can be seen above, but the trained LoRA was using 40 GB of VRAM while generating images. How can I make it use fewer resources while keeping the speed and quality of the generated images?

Further discussion: I am also tasked with taking this further and creating Flux-Schnell LoRAs for virtual try-on: one LoRA that can effectively swap clothes, and another that changes the pose of the fashion model while keeping the features of the original portrait consistent. What material, tutorials, or guides can I look to for this? Any other guidance for this case would be helpful as well.

Thank you for bearing with me. Looking forward to your guidance.
