What is an efficient method to manually create image descriptions?

I want to add descriptions to a few thousand images and I’m looking for an efficient way to do this. Ideally I’d like something on Android where I see the image, speak the description, and have it transcribed to text and stored with the image in some way. Then I click next/OK, see the next image, and repeat.

Has anyone done something similar or have an idea of how they would do it?


Adding descriptions to a large number of images is usually done semi-automatically with a tool or a VLM like the following; doing it purely by hand is a rare use case…
I think your flow is achievable with an ASR model such as Whisper, but I haven’t seen such a finished product in Spaces, so I think the only way is to build one. If you want to find or create something similar, I can provide more information.
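
For reference, the ASR piece by itself is only a few lines with the transformers pipeline. A minimal sketch, assuming the openai/whisper-small checkpoint and a hypothetical recording.wav:

```python
# Minimal sketch of the ASR step, assuming the openai/whisper-small
# checkpoint; any Whisper size works with the same pipeline call.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# "recording.wav" is a hypothetical file holding one spoken description.
result = asr("recording.wav")
print(result["text"])
```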

Thanks for the input, John. If I end up building something it seems like Whisper would be the best option for the ASR portion.


If you are going to use Whisper, the following implementation seems fast and good, although it requires a GPU.
The flow I have in mind:

1. Put the 1000 image files in a private dataset repo on HF.
2. Display one of them in the GUI, skipping any image that already has a matching .txt file, since those have already been processed.
3. Accept voice input, transcribe it with Whisper, and put the result in a text box.
4. Optionally improve the contents of the text box with a suitable grammar checker.
5. When the Submit button is pressed, save a .txt file to the dataset repo with the same name as the image file but a different extension, then display the next image.

I think you can build something like this from common existing components; see the sketch below. It would also be nice to put a suitable VLM or tagger in front of Whisper to pre-fill the input.
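
To make that concrete, here is a rough sketch of such an app in Gradio (4.x API assumed), with a hypothetical private dataset repo your-username/image-captions and the openai/whisper-small checkpoint; the grammar checker and the VLM/tagger pre-fill are left out for brevity.

```python
# Rough sketch of the caption-dictation loop. REPO_ID is a hypothetical
# private dataset repo; swap in your own before running.
import io

import gradio as gr
from huggingface_hub import HfApi, hf_hub_download
from transformers import pipeline

REPO_ID = "your-username/image-captions"  # hypothetical dataset repo
api = HfApi()
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")


def next_image():
    """Return (local path, repo filename) of the first image with no .txt yet."""
    files = set(api.list_repo_files(REPO_ID, repo_type="dataset"))
    for f in sorted(files):
        stem, _, ext = f.rpartition(".")
        if ext.lower() in ("jpg", "jpeg", "png") and f"{stem}.txt" not in files:
            return hf_hub_download(REPO_ID, f, repo_type="dataset"), f
    return None, None  # everything has been captioned


def transcribe(audio_path):
    """Turn the recorded description into text for the textbox."""
    return asr(audio_path)["text"] if audio_path else ""


def submit(description, remote_name):
    """Save the description as <image name>.txt, then load the next image."""
    if remote_name:  # skip the upload if no image is currently shown
        stem, _, _ = remote_name.rpartition(".")
        api.upload_file(
            path_or_fileobj=io.BytesIO(description.encode("utf-8")),
            path_in_repo=f"{stem}.txt",
            repo_id=REPO_ID,
            repo_type="dataset",
        )
    path, name = next_image()
    return path, name, ""


with gr.Blocks() as demo:
    image = gr.Image(type="filepath")
    remote_name = gr.State()
    audio = gr.Audio(sources=["microphone"], type="filepath")
    text = gr.Textbox(label="Description")
    button = gr.Button("Submit")

    # Show the first unprocessed image on page load.
    demo.load(next_image, outputs=[image, remote_name])
    # Transcribe each new recording into the textbox for editing.
    audio.change(transcribe, inputs=audio, outputs=text)
    # Save the caption and advance to the next image.
    button.click(submit, inputs=[text, remote_name], outputs=[image, remote_name, text])

demo.launch()
```

Because each .txt file shares its image’s name, the “skip already processed” check falls out of a single repo listing, so the app can be stopped and resumed at any point.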