How can I fine-tune offline on local files?

Hey, complete newbie here to all of this.
I want to download models for completely offline use with local files (text, images, PDFs, videos), but I have no idea where to start.

I’m not mistaken and this is actually possible, right?

As far as I understand it, each model is trained for a specific task, so I’ll need to use separate models for text, images, PDFs and videos, right?

Also, I didn’t really understand which kind of model is used for getting text from videos. For example, if I have a lecture video or a YouTube video, are there models for extracting data/text from them so I could use it (chat with an AI about them)?

I have a GPU similar to a mobile 1030; how bad would it be to fine-tune a model on it (in terms of the hours/days it would take)?

It is indeed possible to download models for complete offline use. You can download them while you have Internet access, and after that you don’t need an Internet connection anymore.
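If it helps, here is a rough sketch of those two steps with the Hugging Face libraries. The model name below is just an example, not a recommendation; any model from the Hub works the same way.

```python
# Step 1 (while online): download the model files once into the local cache.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="Qwen/Qwen2.5-0.5B-Instruct")  # example model

# Step 2 (offline): run from the cache. Setting HF_HUB_OFFLINE=1
# (or TRANSFORMERS_OFFLINE=1) in your shell before running the script
# makes the libraries skip any network access and load only from disk.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
print(generator("Hello from an offline model!", max_new_tokens=20)[0]["generated_text"])
```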

There are multimodal models and single-modality models.

The time required to fine-tune a model can vary a lot depending on the size of the training set, the model itself, and the compute resources available for fine-tuning.

Thanks for helping out.

How exactly can I fine-tune on my local files? Is there a step-by-step process for this, please?
For example, if I have a “Documents” folder with my text files in it and I want to chat based on that folder, how do I do that?

Also, what about videos? What model can I use to scan them and extract data/text so I can chat about them further?

Check out retrieval-augmented generation (RAG) for using local files with a chatbot, as described at the following link: https://huggingface.co/docs/transformers/main/en/chat_templating#advanced-retrieval-augmented-generation
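As a rough sketch of the idea (assuming your “Documents” folder holds plain .txt files; the embedding and chat model names are just examples you can swap out):

```python
from pathlib import Path

from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# 1. Read and embed every .txt file in the folder.
docs = [p.read_text(encoding="utf-8") for p in Path("Documents").glob("*.txt")]
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(docs, convert_to_tensor=True)

# 2. Embed the question and retrieve the most similar document.
question = "What does my report say about the budget?"
question_embedding = embedder.encode(question, convert_to_tensor=True)
best = util.cos_sim(question_embedding, doc_embeddings).argmax().item()

# 3. Hand the retrieved text plus the question to a local chat model.
chat = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
messages = [
    {"role": "system", "content": f"Answer using this document:\n{docs[best]}"},
    {"role": "user", "content": question},
]
print(chat(messages, max_new_tokens=200)[0]["generated_text"][-1]["content"])
```

Note that this retrieves whole files and only the single best match; real setups usually split documents into chunks and retrieve several of them, but the overall flow is the same.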

I didn’t find a video-to-text category among the model tasks at Hugging Face: https://huggingface.co/models
You can try the full-text search at Hugging Face to find information about them:
https://huggingface.co/search/full-text?q=video+to+text