How to train FLUX.1 for custom emoji generation — dataset size, script, and deployment?

I’m working on a personal project where I want to generate custom emoji-style images from text prompts — like turning this:

Flying pig:pig: with wings

I’m using black-forest-labs/FLUX.1-dev as the base model. It’s a rectified-flow diffusion transformer in the same family as Stable Diffusion, though at roughly 12B parameters it needs more VRAM rather than less.


What I have:

  • ~25k 512x512 emoji-style images
  • Captions for each (in .txt files)
  • A train.json mapping each image to its caption

```
dataset/
├── images/image_001.png, ...
├── captions/caption_001.txt, ...
└── train.json  # [{ "image": "images/image_001.png", "caption": "captions/caption_001.txt" }, ...]
```
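
Many LoRA training scripts expect a single caption per image rather than a separate mapping file, so the pairs above can be flattened into an ImageFolder-style metadata.jsonl. A minimal sketch, assuming the exact layout shown (the metadata.jsonl name and the text column follow the Hugging Face datasets ImageFolder convention):

```python
import json
from pathlib import Path

# Paths assume the directory tree above -- adjust if yours differs.
root = Path("dataset")
records = json.loads((root / "train.json").read_text())

# Write one JSON object per line with "file_name" plus a caption column,
# which the datasets ImageFolder loader (and scripts built on it) can read.
with (root / "images" / "metadata.jsonl").open("w") as out:
    for rec in records:
        caption = (root / rec["caption"]).read_text().strip()
        file_name = Path(rec["image"]).name  # path relative to images/
        out.write(json.dumps({"file_name": file_name, "text": caption}) + "\n")
```

With that file written, `load_dataset("imagefolder", data_dir="dataset/images", split="train")` yields image/caption pairs directly.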

What I need help with:

  1. How many images is “enough”? Is 25k too much or just fine?
  2. Any working training script for FLUX.1?
    • I tried one (PyTorch + diffusers), but outputs look like noise.
  3. Best training config?
    • Should I freeze VAE/text encoder?
    • Recommended batch size, learning rate, etc.?
  4. How do I export the model to ONNX or TFLite?

Planning to use it in a Flutter app later.
A sample setup, training script, or any other advice that would help a beginner get started would be appreciated.


1. 25k is enough. Basically, more is almost always better.

2. Training scripts for FLUX.1 are well known. If you search for them, you will find a huge amount of existing know-how, so I recommend starting with that search.
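
Before swapping scripts, it can also help to rule out an inference-side problem, since noise-like outputs often come from loading or dtype issues rather than the training itself. A rough sanity-check sketch (the ./flux-emoji-lora path is a placeholder for wherever your LoRA weights ended up):

```python
import torch
from diffusers import FluxPipeline

# Load the base model and confirm it generates sensible images on its own.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps fit the large transformer on one GPU

# Placeholder path: wherever your training run saved its LoRA weights.
pipe.load_lora_weights("./flux-emoji-lora")

image = pipe(
    "flying pig emoji with wings, flat sticker style",
    height=512,
    width=512,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("sample.png")
```

If the base model already produces noise before load_lora_weights is called, the training script is not the culprit.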

3. The model you are trying to create is a bit more specialized than, say, imitating someone’s face, so it might be better to find a similar use case (recreating a painting style, or a similar kind of stylization?) and use its parameters as a reference.
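
As a rough illustration of what those parameters tend to look like for a style LoRA, here is a sketch of the usual freezing setup plus some starting values (the numbers are assumptions to tune, not tested recommendations):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Freeze everything except the part being adapted. In a LoRA setup even the
# transformer's base weights stay frozen; only the injected adapter layers train.
pipe.vae.requires_grad_(False)
pipe.text_encoder.requires_grad_(False)    # CLIP text encoder
pipe.text_encoder_2.requires_grad_(False)  # T5 text encoder
pipe.transformer.requires_grad_(False)

# Illustrative starting hyperparameters for a 512x512 style LoRA (assumptions):
config = {
    "rank": 16,                        # LoRA rank
    "learning_rate": 1e-4,
    "train_batch_size": 1,             # raise via accumulation if VRAM allows
    "gradient_accumulation_steps": 4,
    "resolution": 512,
    "max_train_steps": 10_000,
    "mixed_precision": "bf16",
}
```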

4. I’ve never seen anyone export FLUX, but if it’s possible, I think this is how you would do it…
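
The generic approach would be exporting one component at a time with torch.onnx.export; a rough, untested sketch on the smallest piece (the VAE decoder) is below. Whether the 12B transformer itself can be exported this way, let alone run via TFLite on a phone, is doubtful, so serving the model behind an API that the Flutter app calls may be more realistic.

```python
import torch
from diffusers import AutoencoderKL

# Export one submodule at a time. The VAE decoder is shown here because it is
# small; the main transformer is far larger and may not survive this path.
vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="vae", torch_dtype=torch.float32
)
vae.eval()

class Decoder(torch.nn.Module):
    """Thin wrapper so the export traces only the latent -> image path."""
    def __init__(self, vae):
        super().__init__()
        self.vae = vae

    def forward(self, latents):
        return self.vae.decode(latents).sample

# FLUX's VAE uses 16 latent channels at 1/8 spatial resolution
# (assumed 512x512 output -> 64x64 latents).
dummy = torch.randn(1, 16, 64, 64)
torch.onnx.export(
    Decoder(vae), dummy, "flux_vae_decoder.onnx",
    opset_version=17, input_names=["latents"], output_names=["image"],
)
```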


It can also be optimized and trained through a proxy IP.