I’m new here and looking for guidance and a reality check before I go out and buy hardware.
I have about 17,000 pictures, mostly of my children’s childhood and the travels we’ve made, neatly tagged with a taxonomy in Digikam. I don’t want to share these beyond my firewall, at least not for now, but it would be interesting to use them to generate pictures of ourselves in new contexts drawn from a wider space of images. Or, as a first use case, to generate an xmas postcard.
From what I’ve seen, DreamBooth with LoRA comes closest, but my impression is that DreamBooth is meant for fine-tuning on a small number of pictures, not the 17,000 I have. My thinking was to train on those on top of an existing model and then keep the resulting model on disk for later. Does DreamBooth sound like what I should be using?
Since I have the tags, supervised learning sounds like the right approach. If I understand tokenization correctly, it breaks down natural-language strings, but that’s not what I have. Is there something out there that would do tokenization based on my taxonomy and other metadata, such as timestamps and subjective ratings? Or should I write some code to generate strings from them? With my domain knowledge, I could generate strings like “kid_name, 5 years old, on Gran Canaria which is in the Canary Islands which is in Spain, swimming, with a good quality rating”. What’s the best approach to do this?
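Writing a small script for this is likely the simplest route. Here is a minimal sketch of the caption-generation idea; the tag-path format (`Category/Sub/Leaf`), the `Places` category name, and the rating scale are my assumptions, not Digikam’s actual export schema:

```python
# Sketch: build training captions from a Digikam-style hierarchical tag
# taxonomy plus extra metadata. Tag-path layout and field names are
# assumptions about how the taxonomy might be exported.

QUALITY = {1: "poor", 2: "fair", 3: "average", 4: "good", 5: "excellent"}

def caption_from_metadata(name, tags, age=None, rating=None):
    """Flatten tag paths like 'Places/Spain/Canary Islands/Gran Canaria'
    into a comma-separated caption string."""
    parts = [name]
    if age is not None:
        parts.append(f"{age} years old")
    for tag in tags:
        category, *rest = tag.split("/")
        if category == "Places" and rest:
            # Walk the hierarchy from leaf outward: "X which is in Y ..."
            parts.append("on " + " which is in ".join(reversed(rest)))
        elif rest:
            parts.append(rest[-1])      # drop the category, keep the leaf
        else:
            parts.append(category)
    if rating is not None:
        parts.append(f"with a {QUALITY[rating]} quality rating")
    return ", ".join(parts)

print(caption_from_metadata(
    "kid_name",
    ["Places/Spain/Canary Islands/Gran Canaria", "Activities/swimming"],
    age=5,
    rating=4,
))
# → kid_name, 5 years old, on Gran Canaria which is in Canary Islands
#   which is in Spain, swimming, with a good quality rating
```

One caption file per image (e.g. `IMG_0123.txt` next to `IMG_0123.jpg`) is a common convention that several training scripts accept.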
Then, I suppose I shouldn’t train on full-resolution pictures? Should I first scale and crop them down to, say, 512x512 px?
The hardware I have is a bit on the thin side: a 6-core Intel Core i7-8700 CPU with UHD Graphics 630 and 32 GB of RAM, plus 1.7 TB of spinning disks and 435 GB of NVMe. What I can currently afford is a graphics card with 10 GB of VRAM (in addition to the integrated Intel GPU). It wouldn’t be a problem if a process ran for weeks, apart from the fact that I wouldn’t get my xmas card. So, if that’s simply not enough, the project ends right here, but it sounds like it’s doable with a LoRA approach?