Hello guys,
I’m new to the world of AI and I’m encountering some challenges while fine-tuning the OpenAI Ada model.
Firstly, I run a large online store with over 600K products spread across more than 2K categories. Each product is associated with two distinct categories: a_category and b_category. My goal is to train the model to generate these categories for products that haven’t been categorized yet, using their title and description.
I’ve formatted the data in JSONL, with each line structured as follows:
{"prompt": {"title": "titleValue", "description": "descriptionValue"}, "completion": {"aCategory": "aCategoryValue", "b_category": "bCategoryValue"}}
I’ve prepared around 100K product lines in this manner. After initiating the fine-tuning process and subsequently testing with a separate dataset (not used for training), I observed that for roughly 77% of the products both categories were generated correctly. However, about 12% of the products had only one of the two categories correctly generated.
Interestingly, when I attempt to fine-tune the already fine-tuned model using the failed test cases and then test again with new data, the results deteriorate: only 44% are a full match and 32% are a partial match.
What could be causing this decline in performance? And how can I train the model to achieve 100% accuracy in category generation?
Thanks in advance!
If I understand correctly, you need to predict two classes at the end. I don’t think fine-tuning a language model is necessary for this task. You can extract word vectors from Ada (I would prefer larger embedding models here; the prices should be similar, I assume) and design a small model for this specific task.
The reason for your performance drop is probably overfitting. It occurs when the model is too large (too capable), i.e. has many parameters while the data is not sufficiently large, so the model memorizes patterns in the training set rather than learning them and cannot generalize to the test set.
Can you give me more info about word vector extraction, how to do it, and what the next steps are?
Basically, word vectors (or word embeddings) are numerical representations of the words in a vocabulary, and they also carry contextual information. Typically, an AI model consists of a feature-extraction part and a classification part. In your case, you can use word embeddings as feature inputs to your classification model (instead of fine-tuning the whole LM).
For OpenAI word embeddings, you can get more information here: OpenAI Platform
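Here is a rough, untested sketch of the embedding step, assuming the openai Python library (v1.x) and the text-embedding-ada-002 model; the helper name embed_product is just a placeholder:

```python
# Sketch only: turn each product's title + description into one embedding vector.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def embed_product(title: str, description: str) -> list[float]:
    """Placeholder helper: return an embedding vector for a single product."""
    text = f"{title}\n{description}"
    resp = client.embeddings.create(
        model="text-embedding-ada-002",  # assumption: any embedding model would do
        input=text,
    )
    return resp.data[0].embedding
```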
But of course, you also have other options, like fastText: English word vectors · fastText
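If you go the fastText route instead, the idea is the same; a rough sketch (untested, using the standard pretrained English vectors):

```python
# Sketch only: sentence vectors from pretrained fastText English vectors.
import fasttext
import fasttext.util

fasttext.util.download_model('en', if_exists='ignore')  # fetches cc.en.300.bin
ft = fasttext.load_model('cc.en.300.bin')

def embed_product(title: str, description: str):
    """Placeholder helper: one 300-dimensional vector per product."""
    return ft.get_sentence_vector(f"{title} {description}")
```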
After feature extraction, you can use custom models for classification. I would search for models already used for similar problems and adapt them to my case; GitHub is a good source for this. Unfortunately, there is no single bullet-proof way of solving this. One has to try different possibilities to find the best solution.
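For the classification step, here is a minimal sketch with scikit-learn, assuming you already have an array of embeddings and the two label lists (a_category_labels / b_category_labels are hypothetical names); a linear model is just one possible starting point:

```python
# Sketch only: two small classifiers on top of precomputed embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = np.array(embeddings)            # shape: (n_products, embedding_dim)
y_a = np.array(a_category_labels)   # hypothetical label arrays, one per target
y_b = np.array(b_category_labels)

X_train, X_test, ya_train, ya_test, yb_train, yb_test = train_test_split(
    X, y_a, y_b, test_size=0.2, random_state=42
)

# One classifier per target category; with ~2K classes a linear baseline
# is a reasonable first try before anything heavier.
clf_a = LogisticRegression(max_iter=1000).fit(X_train, ya_train)
clf_b = LogisticRegression(max_iter=1000).fit(X_train, yb_train)

print("a_category accuracy:", clf_a.score(X_test, ya_test))
print("b_category accuracy:", clf_b.score(X_test, yb_test))
```

If the linear baseline is not good enough, you can swap in something heavier (gradient boosting, a small MLP) without changing the embedding step.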