Hi all,
I’m trying to use BERT (or any language embedding model) to solve a semantic text similarity problem: given a product A, find a product B that is basically the same underlying product, with a few key differences. For example, “ABC green T-shirt” matches “ABC green T-shirt (2-count)”; however, “ABC green T-shirt” does NOT match “ABC red T-shirt (2-count)”. So my goal is to fine-tune BERT to pay more attention to, in this particular case, color, while not losing sight of the more important information: that both are T-shirts.
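To make the desired behavior concrete, here is a toy sketch of the match / non-match decision via cosine similarity. The 3-d vectors and the threshold are made up for illustration; a real sentence embedding model would produce ~768-d vectors:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Hypothetical embeddings for the three product titles.
green_tee        = [0.90, 0.10, 0.30]  # "ABC green T-shirt"
green_tee_2count = [0.88, 0.12, 0.32]  # "ABC green T-shirt (2-count)"
red_tee_2count   = [0.20, 0.90, 0.30]  # "ABC red T-shirt (2-count)"

THRESHOLD = 0.9  # made-up decision threshold

is_match_pack  = cosine(green_tee, green_tee_2count) > THRESHOLD  # True: same product
is_match_color = cosine(green_tee, red_tee_2count) > THRESHOLD    # False: color differs
```

The fine-tuning goal, then, is an embedding space where pack-size variants land above the threshold and color variants land below it.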
What I’m doing now follows Natural Language Inference — Sentence-Transformers documentation:
- Train BERT with correct pairs, such as “ABC green T-shirt” matched with “ABC green T-shirt (2-count)”. There are about 150k training instances.
- Train the model with triplets, such as (“ABC green T-shirt”, “ABC green T-shirt (2-count)”, “ABC red T-shirt (2-count)”). There are about 5k training instances, so overfitting can be a problem here.
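For reference, the triplet objective in the second step pulls the anchor toward the positive and pushes it away from the negative. A minimal plain-Python sketch of the loss (toy 2-d embeddings and a default margin, not the actual Sentence-Transformers implementation):

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin): zero once the positive
    is at least `margin` closer to the anchor than the negative."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical embeddings for the three titles in the triplet above.
anchor   = [1.0, 0.0]  # "ABC green T-shirt"
positive = [0.9, 0.1]  # "ABC green T-shirt (2-count)"
negative = [0.0, 1.0]  # "ABC red T-shirt (2-count)"

good_loss = triplet_loss(anchor, positive, negative)  # positive already closer: loss is 0
bad_loss  = triplet_loss(anchor, negative, positive)  # roles swapped: loss is positive
```

Training minimizes this over all triplets, which is exactly what should teach the model that color matters more than pack size for your matching task.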
After these steps, I saw a slight improvement in matching accuracy. So my questions are:
- Am I headed in the right direction?
- What are the more up-to-date fine-tuning methods compared to the ones in the Sentence-Transformers documentation?
Thank you very much!