Triplet (contrastive) loss for sequence embedding

I would like to use transformers to train a MIDI2Vec sequence encoder, so that I can map new MIDI sequences into a high-dimensional embedding space.

To do so, I was thinking of using a transformer encoder (with a BERT-like structure) and then applying a contrastive triplet loss on top of it.

That is, for each datum in a batch I have two inputs containing the same underlying data but with different augmentations.
I then need to encode all the inputs together and apply a contrastive loss between each datum (the anchor), its augmented counterpart (the positive), and a different example from the batch (the negative).
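To make the loss part concrete, here is a minimal sketch of what I have in mind, assuming the encoder already produces one embedding per sequence; all shapes and names are placeholders, and the negatives are simply the positives shifted by one within the batch:

```python
import torch
import torch.nn.functional as F

# Placeholder batch of embeddings; in practice these would come from the encoder.
B, D = 32, 256
anchor = F.normalize(torch.randn(B, D), dim=-1)    # embeddings of the original sequences
positive = F.normalize(torch.randn(B, D), dim=-1)  # embeddings of their augmented versions
negative = positive.roll(shifts=1, dims=0)         # "different example": pair each row with another item's positive

loss = F.triplet_margin_loss(anchor, positive, negative, margin=0.2)
print(loss.item())
```

The in-batch shift is just the simplest way to get negatives; I could presumably swap in harder negative mining later without changing the rest of the setup.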

How should I go about doing this? What already-written model components should I use?
I was thinking:

  1. I need a custom tokenizer that is deterministic over MIDI and covers pitch, velocity, time, etc.
  2. An embedding class that can embed all these features and concatenate them
  3. Use some stacked transformer encoder architecture
  4. Somehow, apply the loss (a rough sketch of how these pieces might fit together is below).
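For points 2 and 3, this is roughly what I imagine, as a sketch only: per-event features are embedded separately and concatenated, then fed through stacked transformer encoder layers and mean-pooled into a single sequence embedding. The vocabulary sizes, the bucketing of velocity/time, and all dimensions are assumptions of mine:

```python
import torch
import torch.nn as nn

class MidiEventEmbedding(nn.Module):
    """Embeds per-event features (pitch, velocity bin, time bin) and concatenates them."""
    def __init__(self, d_pitch=64, d_vel=32, d_time=32):
        super().__init__()
        self.pitch = nn.Embedding(128, d_pitch)    # MIDI pitches 0-127
        self.velocity = nn.Embedding(128, d_vel)   # assumes velocity kept as 0-127
        self.time = nn.Embedding(256, d_time)      # assumes time shifts quantized to 256 bins
        self.d_model = d_pitch + d_vel + d_time

    def forward(self, pitch, velocity, time):      # each: (batch, seq_len) integer tensors
        return torch.cat([self.pitch(pitch), self.velocity(velocity), self.time(time)], dim=-1)

class MidiEncoder(nn.Module):
    """BERT-like encoder: embeddings -> stacked transformer layers -> mean-pooled sequence vector."""
    def __init__(self, n_layers=6, n_heads=8, d_out=256):
        super().__init__()
        self.embed = MidiEventEmbedding()
        layer = nn.TransformerEncoderLayer(d_model=self.embed.d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.proj = nn.Linear(self.embed.d_model, d_out)  # projection head into the embedding space

    def forward(self, pitch, velocity, time):
        h = self.encoder(self.embed(pitch, velocity, time))
        return self.proj(h.mean(dim=1))            # (batch, d_out) sequence embedding
```

A training step would then encode the original and the augmented view with this same encoder and feed the two resulting batches of embeddings into the triplet loss sketched above.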

A possible alternative would be to train a masked language model over the notes and velocities, but that would require multiple heads: one for classifying the pitch, one for regressing the velocity, and so on, roughly as sketched below.
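For completeness, this is the kind of multi-head setup I mean for the MLM alternative, again just a sketch with assumed dimensions, where only masked positions contribute to the loss:

```python
import torch
import torch.nn as nn

class MidiMLMHeads(nn.Module):
    """Per-position heads on top of the shared encoder output: pitch classification + velocity regression."""
    def __init__(self, d_model=128):
        super().__init__()
        self.pitch_head = nn.Linear(d_model, 128)   # classify the masked pitch (128 classes)
        self.velocity_head = nn.Linear(d_model, 1)  # regress the masked velocity

    def forward(self, hidden):                      # hidden: (batch, seq_len, d_model)
        return self.pitch_head(hidden), self.velocity_head(hidden).squeeze(-1)

def mlm_loss(pitch_logits, vel_pred, pitch_targets, vel_targets, mask):
    # mask: boolean (batch, seq_len) tensor marking the masked positions.
    ce = nn.functional.cross_entropy(pitch_logits[mask], pitch_targets[mask])
    mse = nn.functional.mse_loss(vel_pred[mask], vel_targets[mask])
    return ce + mse
```

My worry is that this adds one head and one loss term per feature, whereas the contrastive setup only needs the single sequence-level embedding.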