How do I prepare a dataset from patent PDFs?

I want to fine-tune an LLM on several thousand patent PDFs from a specific domain. Can anyone tell me how I should convert the PDF data into a structured dataset, given that the PDFs also contain tables?

Second question: is there any other way to train an LLM besides using data extracted from PDFs?
