Diffusion model with no data except what I train it on myself?

I am trying to learn the basics of these models and how they behave. For that I would like to create my own “base model” that would be useful just for learning. In short, I would like to create a model that works with Stable Diffusion and is trained on a very small number of images of the same subject that I feed to it.

I followed this tutorial, Train a diffusion model, and was able to generate a small model on my local machine (RTX 2060 with 6 GB VRAM, using a 5-image batch). After converting it to safetensors I was able to load it into Stable Diffusion, so that part went fine.
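For context, my understanding is that what that tutorial trains is a network that predicts the noise added by a forward diffusion process. Here is my own NumPy-only sketch of that forward process, just to show what I think I understand so far (this is not the tutorial's code, and the schedule values are my assumptions):

```python
import numpy as np

# Linear beta schedule, as in the original DDPM paper.
# These values are illustrative assumptions, not copied from the tutorial.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product, shrinks toward 0

def add_noise(x0, t, rng):
    """Forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps  # the network is trained to predict eps from (xt, t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 64, 64))  # stand-in for one training image
xt, eps = add_noise(x0, t=T - 1, rng=rng)
# At t near T, alpha_bars[t] is tiny, so x_t is almost pure noise.
```

So, as far as I can tell, the only image data the noise-prediction network ever sees during training is whatever `x0` images you feed it, which is why I expected my 5-image model to know nothing else.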

Still, when I prompted “monkey” or “duck”, I got pictures of a duck and a monkey, so this model was not “empty”: it clearly contained items learned from some other source.

So the question is: how can I train a model that has NO knowledge of any images except those I have explicitly selected as training data?

I know this has no realistic real-life use, but it is just for me, so I can understand what happens and how Stable Diffusion behaves with a model created from only 1 image, only 2 images, only 3 images, and so on. You know, a personal interest in a hands-on approach to how things work, starting from empty models with no pre-existing data in them.

So if I want to train on 3 pictures of a cat, use that model in Stable Diffusion, and be sure there are no other images it can build its pictures from, where should I start? The model-training tutorial still gave me lots of other images I did not want for these testing purposes.