Here is what I’ve learned. If you are going to be involved in the AI space, it is best to learn the applied math. I have a good understanding of the math, and I’m glad I approached it that way, because there is an expectation that you understand it. Consider that the people who create these models are highly educated specialists in the field. If you read the research papers, you’ll find that most are written by people with PhDs or by PhD candidates. Given that, unless you are using paid models, you should expect to solve the problems that come from using open-source models yourself. The good news is that solving those problems is an opportunity, and maybe there is a payday in it for you.
Yet again, there is this binary framing of PhD or incompetent. A car was at least partly designed by PhDs in their fields, yet most people can use one. The problem is generally not how to use a model, since Hugging Face and LangChain make that understandable, but how to make it do what you want it to do.
Let’s say you use a model for text generation and it deviates from your ground truth. For instance, you supply the context “Paris is in France”, you ask “Where is Paris?”, and the model replies with anything but “Paris is in France”. Do you use another model? Do you fine-tune? On what data, and toward what target? How do you find suitable parameters?
Or, let’s say you create a fictional world where Paris is not in France, and you supply that in your context, but the model, from its original training, is adamant that Paris is in France. How much fine-tuning do you need to make it relearn this one fact?
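Before deciding between swapping models and fine-tuning, it can help to measure how often the model actually ignores the supplied context. The following is only a minimal sketch of that idea, not an established method: `Case`, `build_prompt`, `find_deviations`, and the prompt template are hypothetical names made up for illustration, and `stubborn_model` is a stand-in for whatever real generation call you use (it takes a prompt string and returns the generated text). The substring check is deliberately crude.

```python
from typing import Callable, List, NamedTuple


class Case(NamedTuple):
    context: str
    question: str
    expected: str  # substring the answer must contain to count as grounded


def build_prompt(context: str, question: str) -> str:
    # Hypothetical prompt template; real models often need their own format.
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


def find_deviations(generate: Callable[[str], str], cases: List[Case]) -> List[Case]:
    """Return the cases whose answer does not contain the expected text."""
    failures = []
    for case in cases:
        answer = generate(build_prompt(case.context, case.question))
        if case.expected.lower() not in answer.lower():
            failures.append(case)
    return failures


# Stand-in for a real model: it ignores the context entirely and
# always answers from what it "learned" in training.
def stubborn_model(prompt: str) -> str:
    return "Paris is in France."


cases = [
    Case("Paris is in France.", "Where is Paris?", "France"),
    Case("In this story, Paris is a city on Mars.", "Where is Paris?", "Mars"),
]

failures = find_deviations(stubborn_model, cases)
print(f"{len(failures)} of {len(cases)} cases deviate from the context")
for case in failures:
    print("  failed:", case.context)
```

Running the same cases against a second model, or against the same model before and after fine-tuning, gives you at least a rough number to compare instead of a gut feeling.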
Just like you don’t need a PhD to drive a car, you don’t need a PhD to debug at that level. It’s just not laid out anywhere as a troubleshooting guide. And regarding the tutorials: they often use deprecated functions or libraries, so you have to debug the tutorials before you can use them. That’s like learning a new language from a book but having to correct the book’s spelling errors before you can learn from it. There must be a better way.
These can be quite deep questions that are hard to answer even for experts. For some things you can make an educated guess if you know the math and the properties of the layers, but “how much fine-tuning is needed” is something you usually learn from experience, and even then the most you can do is make an educated guess rather than be certain.
I think the car analogy works like this: running inference with a ready-made inference script and a provided environment is like driving the car. If you want to customize your model, you need to be a bit of an expert on that system, just as you would need to be a mechanic to fix or customize a car.
Anyway, to understand the basic properties of these LLMs you don’t really need advanced math skills; usually the basic concepts of conditional probability, linear projections, and matrix multiplication are “all you need”. I strongly advise everyone to understand a bit of the math behind them; it is always useful.
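To make those three concepts concrete, here is a toy sketch of the last step of a language model: a hidden vector is multiplied by a weight matrix (a linear projection) to get one score per token, and a softmax turns those scores into a conditional probability of the next token given the prompt. All the numbers, the three-word vocabulary, and the helper names `matvec` and `softmax` are made up for illustration; a real model does the same thing with thousands of dimensions and learned weights.

```python
import math

vocab = ["France", "Texas", "Mars"]

# Hypothetical hidden state for the prompt "Paris is in" (made-up numbers).
hidden = [1.0, 0.5]

# Hypothetical output projection: one row of weights per vocabulary token.
W = [
    [2.0, 1.0],   # France
    [0.5, 0.0],   # Texas
    [-1.0, 0.5],  # Mars
]


def matvec(matrix, vector):
    """Matrix-vector product: one dot product (logit) per row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Turn logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = matvec(W, hidden)  # linear projection
probs = softmax(logits)     # P(next token | "Paris is in")

for token, p in zip(vocab, probs):
    print(f"P({token!r} | prompt) = {p:.3f}")
```

Seen this way, fine-tuning amounts to nudging the entries of matrices like `W` so that the probability mass moves toward the answers you want.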
The last thing I would say is that right now the AI field is a bit of a jungle, even for an AI/data scientist like me. So many new models are appearing in both NLP and CV that it is very challenging to keep up, so don’t be discouraged. I think (and hope) that when, or if, things slow down a bit, there will be time to properly understand and better document all of these models’ strengths, weaknesses, and properties.
Debugging these models can be frustrating at times, for all the reasons you’ve mentioned. I find dealing with hallucinations in NLP models very challenging. As the other respondents have mentioned, the machine learning / AI field is moving quickly and it can be hard to keep up.
That being said, more and more resources are coming online to help work through these issues. I would recommend you check out ChatGPT Prompt Engineering for Developers if you want to better understand debugging NLP models.