How to prepare a dataset from patent PDFs?

yash3056 · January 29, 2025, 7:03pm

I want to fine-tune an LLM on several thousand patent PDFs from a specific domain. Can anyone tell me how I should convert the PDF data, including the tables, into a structured dataset?

Second question: is there any way to train the LLM other than on data extracted from PDFs?
