Hi, I’m working on a deep learning project for music recommendation. The idea is to either build a CNN or load a pretrained model and train/fine-tune it on the GTZAN dataset, so that it learns to identify musical patterns from raw MP3 audio. Once the model is successfully trained/tuned, I would obtain a latent vector for each song from the last layers, and then compute the Euclidean or cosine distance between those vectors to measure similarity for recommendation.
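To make the recommendation step concrete, here is a minimal sketch of the similarity search I have in mind. It assumes the latent vectors are already stacked into a NumPy array with one row per song; the function names `cosine_similarity_matrix` and `recommend` are just illustrative, not from any library.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Clip the norms to avoid dividing by zero for an all-zero vector.
    normed = embeddings / np.clip(norms, 1e-12, None)
    return normed @ normed.T

def recommend(embeddings, query_index, k=5):
    """Indices of the k songs most similar to the song at `query_index`."""
    sims = cosine_similarity_matrix(embeddings)[query_index]
    sims[query_index] = -np.inf  # never recommend the query song itself
    return np.argsort(-sims)[:k]

# Toy example: four "songs" with 2-dimensional latent vectors.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(recommend(toy, query_index=0, k=2))
```

For Euclidean distance instead, the ranking would be by ascending `np.linalg.norm(embeddings - embeddings[query_index], axis=1)`. On L2-normalized vectors the two orderings coincide.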
So far, I’ve fine-tuned the ntu-spml/distilhubert model using the transformers library and obtained 80% accuracy. However, I don’t know how to extract latent vectors from the fine-tuned model. If anyone is able to help me, I will provide code cells.
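One approach that might work, sketched below, is to request the hidden states in the forward pass and mean-pool the last one over time. This assumes the fine-tuned model is a `HubertForSequenceClassification` (or a similar transformers audio classifier), and that the input clips have equal length, so no padding frames leak into the average. The helper name `extract_embeddings` and the checkpoint path in the comments are hypothetical.

```python
import torch

def extract_embeddings(model, input_values):
    """Return one latent vector per clip: the last encoder layer, averaged over time.

    model        -- a transformers audio classification model (assumed HuBERT-style)
    input_values -- float tensor of shape (batch, num_samples), e.g. from the feature extractor
    """
    model.eval()
    with torch.no_grad():
        outputs = model(input_values=input_values, output_hidden_states=True)
    # hidden_states is a tuple with one tensor per layer; the last entry has
    # shape (batch, frames, hidden_size).
    last_hidden = outputs.hidden_states[-1]
    return last_hidden.mean(dim=1)  # (batch, hidden_size)

# Hypothetical usage with a fine-tuned checkpoint saved locally:
#   from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
#   extractor = AutoFeatureExtractor.from_pretrained("path/to/my-finetuned-model")
#   model = AutoModelForAudioClassification.from_pretrained("path/to/my-finetuned-model")
#   inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
#   vectors = extract_embeddings(model, inputs["input_values"])
```

The resulting tensor can be converted with `.numpy()` and stored per song for the distance computation. Whether the last layer or an earlier one gives better recommendations is something I would have to check empirically.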