I want to train an LLM on a structured dataset, such as a database with multiple tables. Here is a simple example:
t_user:
id | name |
---|---|
1 | Jason |
2 | Eric |
3 | David |
t_book:
id | title |
---|---|
1 | Gone with The Wind |
2 | Brave New World |
3 | Native Son |
t_bookstore:
id | name |
---|---|
1 | The Book Nook |
2 | The Literary Loft |
3 | Wordsmith Books |
t_order:
user_id | bookstore_id | book_id |
---|---|---|
1 | 2 | 1 |
1 | 2 | 2 |
2 | 3 | 3 |
After training, the LLM should be able to reason over this relational database. For example, when I ask:
"How many books did Jason buy at The Literary Loft bookstore?" it should answer: 2 books, Gone with The Wind and Brave New World.
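For reference, the answer I expect for the example question can be checked directly against the tables above. This is only a minimal sketch in plain Python. The `books_bought` helper and the dict layout are my own illustration of the ground truth, not part of any training pipeline:

```python
# Example tables from the question, held as plain Python data (illustrative layout).
t_user = {1: "Jason", 2: "Eric", 3: "David"}
t_book = {1: "Gone with The Wind", 2: "Brave New World", 3: "Native Son"}
t_bookstore = {1: "The Book Nook", 2: "The Literary Loft", 3: "Wordsmith Books"}
t_order = [(1, 2, 1), (1, 2, 2), (2, 3, 3)]  # (user_id, bookstore_id, book_id)


def books_bought(user_name, store_name):
    """Hypothetical helper: titles the given user bought at the given store."""
    return [
        t_book[book_id]
        for user_id, store_id, book_id in t_order
        if t_user[user_id] == user_name and t_bookstore[store_id] == store_name
    ]


titles = books_bought("Jason", "The Literary Loft")
answer = f"{len(titles)} books, {' and '.join(titles)}."
print(answer)  # 2 books, Gone with The Wind and Brave New World.
```

The model would need to combine `t_order` with the three lookup tables in the same way to reach this answer.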
How can I prepare the corpus from the database and train the LLM?
I have searched for solutions, such as asking the LLM to translate prompts into SQL queries, but that's not what I want.