How to pre-train or fine-tune an LLM on a structured dataset so that it can reason about the relationships between data objects

brook-l · December 1, 2023, 3:17am

I want to train an LLM on a structured dataset, such as a database with multiple tables. Here is a simple example:
t_user:

id	name
1	Jason
2	Eric
3	David

t_book:

id	title
1	Gone with The Wind
2	Brave New World
3	Native Son

t_bookstore:

id	name
1	The Book Nook
2	The Literary Loft
3	Wordsmith Books

t_order:

user_id	bookstore_id	book_id
1	2	1
1	2	2
2	3	3

After training, the LLM should be able to reason over this relational database. For example, when I ask:
"How many books did Jason buy at The Literary Loft bookstore?" it should answer: "2 books, Gone with The Wind and Brave New World."

How can I prepare the corpus from the database and train the LLM?

I have searched for solutions, such as asking the LLM to translate prompts into SQL queries, but that's not what I want.

panigrah · December 1, 2023, 9:40am

What about table question answering? See "What is Table Question Answering?" on Hugging Face.

brook-l · December 4, 2023, 1:41am

Table Question Answering seems to handle only one table and can't reason about relationships between multiple tables. Are there any other suggestions?

panigrah · December 5, 2023, 2:07am

There is at least one model for multi-table QA. Will this work for your data?
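
On the corpus-preparation part of the original question, one possible approach is to join the tables and serialize each joined row as a sentence, then derive question/answer pairs from the grouped rows. The following is only a sketch under assumptions: the sentence and question templates and the `build_corpus` helper are hypothetical, the data is the toy example from the first post, and nothing here guarantees that a model fine-tuned on such text will reason reliably over the relations.

```python
import sqlite3

# Toy schema and rows copied from the example in the first post.
SCHEMA = """
CREATE TABLE t_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_book (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE t_bookstore (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_order (user_id INTEGER, bookstore_id INTEGER, book_id INTEGER);
INSERT INTO t_user VALUES (1, 'Jason'), (2, 'Eric'), (3, 'David');
INSERT INTO t_book VALUES (1, 'Gone with The Wind'), (2, 'Brave New World'), (3, 'Native Son');
INSERT INTO t_bookstore VALUES (1, 'The Book Nook'), (2, 'The Literary Loft'), (3, 'Wordsmith Books');
INSERT INTO t_order VALUES (1, 2, 1), (1, 2, 2), (2, 3, 3);
"""

# Resolve every foreign key in t_order to its human-readable value.
JOIN = """
SELECT u.name, s.name, b.title
FROM t_order o
JOIN t_user u ON u.id = o.user_id
JOIN t_bookstore s ON s.id = o.bookstore_id
JOIN t_book b ON b.id = o.book_id
ORDER BY o.user_id, o.bookstore_id, o.book_id
"""


def build_corpus():
    """Return (statements, qa_pairs) derived from the joined order rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(JOIN).fetchall()
    conn.close()

    # One plain-text fact per order row (hypothetical template).
    statements = [f"{user} bought {title} at {store}." for user, store, title in rows]

    # Group titles by (user, store) to build counting questions.
    grouped = {}
    for user, store, title in rows:
        grouped.setdefault((user, store), []).append(title)

    qa_pairs = []
    for (user, store), titles in grouped.items():
        noun = "book" if len(titles) == 1 else "books"
        qa_pairs.append({
            "question": f"How many books did {user} buy at {store}?",
            "answer": f"{len(titles)} {noun}: {', '.join(titles)}.",
        })
    return statements, qa_pairs


if __name__ == "__main__":
    statements, qa_pairs = build_corpus()
    for line in statements:
        print(line)
    for pair in qa_pairs:
        print(pair)
```

The statements could serve as continued pre-training text and the question/answer pairs as supervised fine-tuning examples. Note that the join itself is still done in SQL at data-preparation time; only the resulting text is shown to the model.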
