I am working on a critical production risk analysis problem. Based on the contents of each record (a work order), I want to assign it a risk rank from 0 to 5. The training set is fairly imbalanced:
> "0.0 964
> 1.0 393
> 2.0 396
> 3.0 286
> 4.0 109
> 5.0 44"
This is what the current training set looks like:
2 Risk Rank float64
3 a_weights int64
4 b_weights float64
5 c_weights float64
6 d_weights float64
7 e_weights float64
8 f_weights float64
9 g_weights float64
10 FinalDesc object
The FinalDesc column contains a string (the description of the work order).
For example:
“HVAC REPALCEMENT TOOLS EULDUE TO HARSH ENVIROMENT. Please fix with caution”
The *_weights columns hold weights for keywords found in FinalDesc, which should help with the ranking.
The problem right now is that my supervisor gave me plant-specific context that might help with the predictions, and I am not sure how to use it. For example:
"
Records for firewatch are considered lower risk,
Valve 4/5 on Autoclave are generally lower risk due to higher stocking levels.
REL records to review PM details do not present immediate risks.
"
There is more context like this. What is the best way to produce these rankings? Should I leverage the power of LLMs? Please let me know the best way to incorporate this context.
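
To make the question concrete: one way rules like these could be exposed to a model is as binary flag columns derived from FinalDesc. The following is only a minimal sketch of that idea. The rule names and regular expressions are hypothetical guesses at how each rule might appear in a description, and would need to be checked against real work orders.

```python
import re

# Hypothetical rules: each name maps to patterns that must ALL appear
# in the description. These patterns are assumptions, not plant-verified.
CONTEXT_RULES = {
    "is_firewatch": [r"\bfire\s*watch\b"],
    "is_autoclave_valve": [r"\bvalve\b", r"\bautoclave\b"],
    "is_rel_pm_review": [r"\brel\b", r"\bpm\b"],
}


def context_flags(description: str) -> dict[str, int]:
    """Return one 0/1 flag per rule for a single work order description."""
    text = description.lower()
    return {
        name: int(all(re.search(pattern, text) for pattern in patterns))
        for name, patterns in CONTEXT_RULES.items()
    }


if __name__ == "__main__":
    print(context_flags("FIREWATCH for hot work"))
    print(context_flags("Valve 4 on Autoclave leaking"))
```

Each flag could then be added as an extra numeric column next to the existing weights, so the classifier can learn how much each rule actually lowers the rank instead of the rule being hard-coded.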
My current approach was:
- vectorize the description and add to dataframe
- Use a random forest classifier to rank the work orders (train, predict), using both the numerical columns and the vectorized description
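
The two steps above can be sketched roughly as follows, assuming scikit-learn and pandas. This is not the actual code: the toy rows are made up, only two of the weight columns are shown, TF-IDF is an assumed choice of vectorizer, and `class_weight="balanced"` is an addition that is not part of the approach as described.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Toy data, made up for illustration; column names follow the listing above.
df = pd.DataFrame({
    "FinalDesc": [
        "firewatch for hot work", "replace hvac compressor",
        "firewatch coverage night shift", "hvac replacement harsh environment",
        "review pm details", "autoclave valve leak repair",
    ],
    "a_weights": [0, 3, 0, 4, 1, 2],
    "b_weights": [0.1, 0.8, 0.0, 0.9, 0.2, 0.5],
    "Risk Rank": [0.0, 3.0, 0.0, 4.0, 0.0, 2.0],
})

numeric_cols = ["a_weights", "b_weights"]

# A plain string column name hands the vectorizer a 1-D column of text;
# the numeric columns are passed through unchanged.
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "FinalDesc"),
    ("num", "passthrough", numeric_cols),
])

model = Pipeline([
    ("features", features),
    # class_weight="balanced" is an assumption added because of the imbalance.
    ("clf", RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0)),
])

X = df[["FinalDesc"] + numeric_cols]
y = df["Risk Rank"].astype(int)

model.fit(X, y)
predictions = model.predict(X)  # in practice, predict on a held-out split
```
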
This approach reaches an accuracy of 66%. I would like to add more advanced AI/ML techniques to improve on it.
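
For context on that accuracy figure, the class counts listed at the top can be turned into a majority-class baseline and per-class weights. This is a small sketch in plain Python; the weight formula, n_samples / (n_classes * class_count), is the same heuristic scikit-learn uses for `class_weight="balanced"`.

```python
# Class counts copied from the distribution listed at the top of the post.
counts = {0: 964, 1: 393, 2: 396, 3: 286, 4: 109, 5: 44}

n_samples = sum(counts.values())
n_classes = len(counts)

# Accuracy obtained by always predicting the most frequent class.
majority_baseline = max(counts.values()) / n_samples

# "Balanced" weights: rare classes receive proportionally larger weights.
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in counts.items()
}

if __name__ == "__main__":
    print(f"samples: {n_samples}, majority baseline: {majority_baseline:.3f}")
    for label, weight in class_weights.items():
        print(f"class {label}: weight {weight:.2f}")
```

Always predicting rank 0 would already be right about 44% of the time on this distribution, so plain accuracy says little about the rare high-risk ranks 4 and 5. Per-class recall or a confusion matrix would show whether those ranks are being found.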