Question-answering model for process data in IIoT

I am using Python. Let's say I have a DataFrame df as follows:
Category RAM
0 LAMAG12 60
1 KAMAG32 40
2 JAMAG89 50
3 QWMAG90 30
4 PAMAG54 90

I will ask questions like:

  1. What is the sum of the RAM of LAMAG12 and PAMAG54?
    It should answer: 150
  2. List all categories whose RAM is greater than 50.

So, like this, I should be able to ask any question.

How do I create such a model?
Which transformer model is best here?
Is tokenization important here?
How do I train a model on this df so that I can ask it questions?


You don’t need a large transformer to start.
This task is well suited to a small fine-tuned model on structured question/answer pairs from your data.
Simplest Working Strategy:

Convert the DataFrame into a flat JSON/text format

[
{"Category": "LAMAG12", "RAM": 60},
{"Category": "KAMAG32", "RAM": 40},
…
]
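A minimal sketch of that conversion, assuming pandas is installed and the DataFrame matches the one in the question. The variable names (records_json, records) are only illustrative.

```python
import json

import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    "Category": ["LAMAG12", "KAMAG32", "JAMAG89", "QWMAG90", "PAMAG54"],
    "RAM": [60, 40, 50, 30, 90],
})

# to_json(orient="records") yields one JSON object per row.
records_json = df.to_json(orient="records")

# Parse it back into plain Python dicts for later processing.
records = json.loads(records_json)

print(json.dumps(records, indent=2))
```

Going through to_json and json.loads keeps the values as plain Python ints and strings, which is convenient when building text prompts from them.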

Train a small language model like phi-1.5 or TinyLlama using Q&A pairs:

Q: What is the sum of LAMAG12 and PAMAG54 RAM?
A: 150

Q: List all categories with RAM greater than 50.
A: LAMAG12, PAMAG54
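Pairs like these do not have to be written by hand. The sketch below generates them from the records, computing each answer directly from the data so the labels are exact. The helper names, the question templates, and the chosen thresholds are assumptions for illustration, not a fixed recipe.

```python
from itertools import combinations

# Records in the flat format shown above (values from the question's df).
records = [
    {"Category": "LAMAG12", "RAM": 60},
    {"Category": "KAMAG32", "RAM": 40},
    {"Category": "JAMAG89", "RAM": 50},
    {"Category": "QWMAG90", "RAM": 30},
    {"Category": "PAMAG54", "RAM": 90},
]


def make_sum_pairs(rows):
    """One (question, answer) pair for every unordered pair of categories."""
    pairs = []
    for a, b in combinations(rows, 2):
        question = f"What is the sum of {a['Category']} and {b['Category']} RAM?"
        pairs.append((question, str(a["RAM"] + b["RAM"])))
    return pairs


def make_threshold_pairs(rows, thresholds):
    """One (question, answer) pair per threshold, listing matching categories."""
    pairs = []
    for threshold in thresholds:
        matches = [r["Category"] for r in rows if r["RAM"] > threshold]
        question = f"List all categories with RAM greater than {threshold}."
        pairs.append((question, ", ".join(matches) if matches else "None"))
    return pairs


qa_pairs = make_sum_pairs(records) + make_threshold_pairs(records, [30, 50, 70])

# Flatten into the "Q: ... / A: ..." text format used for fine-tuning.
training_texts = [f"Q: {q}\nA: {a}" for q, a in qa_pairs]
print(training_texts[3])
```

With only five rows this gives a handful of examples; a real training set would need many more rows and more varied question templates.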

Use SentencePiece or BPE tokenization (yes, it matters: clean tokens = better understanding)

Optionally use LoRA for fast adapter training on top of a pretrained model
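A rough sketch of the LoRA step for phi-1.5, assuming the transformers and peft packages are installed and that the Hugging Face model ID is microsoft/phi-1_5. The hyperparameters are placeholder values, and the target_modules names are an assumption about how the attention layers are named in the transformers Phi implementation; they may differ between versions. The function name build_lora_model is made up for this example.

```python
MODEL_ID = "microsoft/phi-1_5"  # assumed Hugging Face model ID

# Placeholder hyperparameters; tune for your data.
LORA_SETTINGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    # Assumed attention layer names in the transformers Phi implementation.
    "target_modules": ["q_proj", "k_proj", "v_proj", "dense"],
    "task_type": "CAUSAL_LM",
}


def build_lora_model(model_id=MODEL_ID):
    """Load the base model and wrap it with trainable LoRA adapters."""
    # Heavy imports are deferred so nothing is downloaded until this is called.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if tokenizer.pad_token is None:
        # Reuse the end-of-sequence token for padding if none is defined.
        tokenizer.pad_token = tokenizer.eos_token

    base_model = AutoModelForCausalLM.from_pretrained(model_id)
    model = get_peft_model(base_model, LoraConfig(**LORA_SETTINGS))
    model.print_trainable_parameters()  # only the adapter weights are trainable
    return tokenizer, model
```

Calling build_lora_model() downloads the model weights, so it needs a network connection and several GB of disk space. After training, model.save_pretrained("some_output_dir") should write only the adapter weights, which are loaded on top of the base model at inference time.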

Model Suggestion:

Phi-1.5 for local or offline use

mistral-7b-instruct if you need deeper logic

Avoid GPT-2/3 if you're aiming for symbolic/deterministic accuracy



If you are training an LLM from scratch, these may also be useful references.

@Pimpcat-AU Thank you for your answer.
I tried, but failed to train and save the model.

Could you please provide a Python code snippet with phi-1.5?
