If what you need is just a simple label, such as cat or dog, a basic image classification model can assign it.
Labeling by hand is fine too, if the dataset is small enough for that.
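
For the simple-label case, here is a rough sketch of how automatic labeling could look. It assumes the Hugging Face `transformers` library and the `openai/clip-vit-base-patch32` checkpoint for zero-shot classification; both are my choices for illustration, and any image classifier that returns label/score pairs would work. The helper names (`pick_label`, `label_images`, `build_clip_classifier`) and the 0.6 threshold are hypothetical. Images whose top score falls below the threshold are marked `unsure` so you can review them by hand.

```python
from typing import Callable, Dict, Iterable, List


def pick_label(predictions: List[dict], min_score: float = 0.6,
               fallback: str = "unsure") -> str:
    """Return the highest-scoring label, or `fallback` if it is not confident enough."""
    if not predictions:
        return fallback
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else fallback


def label_images(paths: Iterable[str],
                 classify: Callable[[str], List[dict]],
                 min_score: float = 0.6) -> Dict[str, str]:
    """Map each image path to a label using any classifier callable."""
    return {path: pick_label(classify(path), min_score) for path in paths}


def build_clip_classifier(candidate_labels=("cat", "dog")):
    """Assumed setup: zero-shot classification with a CLIP checkpoint.

    Imported lazily so the helpers above work without transformers installed.
    """
    from transformers import pipeline

    clf = pipeline("zero-shot-image-classification",
                   model="openai/clip-vit-base-patch32")
    # The pipeline returns a list of {"label": ..., "score": ...} dicts per image.
    return lambda path: clf(path, candidate_labels=list(candidate_labels))
```

To use it on real files, something like `label_images(["img1.jpg", "img2.jpg"], build_clip_classifier())` should work, where the file names are placeholders. Anything that comes back as `unsure` is a good candidate for manual labeling.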
For detailed captioning, such as captions for training image generation models, you could use models like those behind Spaces such as the one below.