Click data and object detection

Hello all

Are there any recommended models or solutions that can connect click data to object detection?

For example, I have a GUI and want to know how users interact with it, i.e. which elements they click on.

Would I need to build a custom dataset of GUI screenshots and then fine-tune something like DETR? Or is there a way to use the click data to generate bounding boxes automatically?
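Once boxes exist (from whichever route), one simple way to connect them to click data is hit-testing: assign each click to the detected element whose bounding box contains it, then count clicks per element. Below is a minimal Python sketch under these assumptions: the boxes come from some detector (for example a fine-tuned DETR) and have already been converted to the same screen-pixel coordinates as the clicks, in `(x_min, y_min, x_max, y_max)` form. `Box`, `hit_test` and `clicks_per_element` are hypothetical names, not part of any library, and the boxes and clicks in the usage section are made up.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Box(NamedTuple):
    """A detected GUI element, in the same pixel coordinates as the clicks (assumed)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def hit_test(x: float, y: float, boxes: Iterable[Box]) -> Optional[Box]:
    """Return the smallest box containing the click, or None if no box does.

    Preferring the smallest box is a heuristic for nested elements
    (e.g. a button inside a panel): the innermost element gets the click.
    """
    hits = [b for b in boxes if b.contains(x, y)]
    return min(hits, key=Box.area) if hits else None


def clicks_per_element(clicks: Iterable[Tuple[float, float]], boxes: List[Box]) -> Counter:
    """Count clicks per detected element; clicks outside every box go to "<background>"."""
    counts: Counter = Counter()
    for x, y in clicks:
        box = hit_test(x, y, boxes)
        counts[box.label if box is not None else "<background>"] += 1
    return counts


if __name__ == "__main__":
    # Made-up detections and clicks, for illustration only.
    boxes = [
        Box("panel", 0, 0, 400, 300),
        Box("ok_button", 300, 250, 380, 290),
        Box("search_field", 20, 20, 200, 50),
    ]
    clicks = [(310, 260), (350, 280), (50, 30), (100, 200), (500, 500)]
    # ok_button gets 2 clicks; search_field, panel and <background> get 1 each
    print(clicks_per_element(clicks, boxes))
```

Clicks that land outside every box end up under `<background>`; a large count there may hint at elements the detector missed.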

Thanks in advance