I want to fine-tune an LLM on several thousand patent PDFs from a specific domain. Can anyone tell me how I should convert the PDF data, which also contains tables, into a structured dataset?
Second question: is there any way to train an LLM other than on data extracted from PDFs?