I think extracting the text from a PDF and translating it should be fairly easy using OpenCV and a translation LLM. What I am not sure about is how to put the translated text back in the right place. Here’s an example of what I want to achieve:
[original image] to [translated image]
I believe Gradio showed something very similar today: translating a research paper from English to Chinese without altering its style (here). I know I could probably just use their API, but I would like to learn how to do it myself, out of curiosity.
EDIT:
The images are stored in a PDF
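
For reference, here is a rough sketch of the part I am stuck on, the "put it back in the right place" step. It assumes the OCR step returns a bounding box (left, top, width, height) for each text block, which is my assumption about the pipeline and not how Gradio does it. The idea is to keep each box, erase the original text inside it, and draw the translation at the largest font size that still fits the same box. `Box`, `fit_text`, and `layout` are names I made up, and the glyph-width and line-height ratios are crude guesses; a real version would measure the text with the actual font.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one OCR'd text block, in pixels (hypothetical shape)."""
    left: int
    top: int
    width: int
    height: int


def fit_text(text, box, max_size=48, min_size=6, char_ratio=0.5, line_ratio=1.2):
    """Return (font_size, lines) for the largest font size at which text fits box.

    char_ratio and line_ratio are rough assumptions: the average glyph width
    and the line height, each as a fraction of the font size. For CJK output
    a char_ratio close to 1.0 is probably more realistic.
    """
    for size in range(max_size, min_size - 1, -1):
        chars_per_line = int(box.width / (size * char_ratio))
        if chars_per_line < 1:
            continue
        lines = textwrap.wrap(text, width=chars_per_line)
        if len(lines) * size * line_ratio <= box.height:
            return size, lines
    # Nothing fits: fall back to the smallest size and accept some overflow.
    chars_per_line = max(1, int(box.width / (min_size * char_ratio)))
    return min_size, textwrap.wrap(text, width=chars_per_line)


def layout(blocks, translate):
    """Turn (Box, source_text) pairs into draw instructions at the same position."""
    placed = []
    for box, source in blocks:
        size, lines = fit_text(translate(source), box)
        placed.append({"x": box.left, "y": box.top, "size": size, "lines": lines})
    return placed
```

With the instructions from `layout`, each box would be filled with the background colour (or inpainted) and the lines drawn top to bottom starting at `(x, y)`. The `translate` argument is just a placeholder for whatever translation model gets called.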