Creating word embeddings using BERT for machine-generated sequential data

bunny1278 · April 7, 2023, 8:36pm

I have a dataset of machine-generated sequences that are not natural language, but the order of the elements in the sequence is important. I want to create word embeddings using BERT to capture the sequential relationships between these elements. Can anyone provide guidance on how to preprocess and format the data for input into BERT, and how to fine-tune the model to generate useful embeddings for this type of data?

Note: The vocabulary in my data is not present in the pretrained BERT model. Can anyone guide me on how to achieve my goal?

Example of my vocabulary (list of sentences) = ['ixeg6164 ox78dsf12 lx3cd875', 'duish7 oiu587 kj854j 987hdk', …]
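
To make the preprocessing question concrete, here is a minimal sketch of one possible first step, not a confirmed solution. It assumes each whitespace-separated element should be treated as one token, and that BERT's usual special tokens are reserved at the start of a vocabulary built from the data itself. The names `build_vocab` and `encode` and the `max_len` value are hypothetical, chosen only for illustration.

```python
# Sketch only: a whitespace-level vocabulary for sequences whose elements
# do not appear in the pretrained BERT vocabulary.
# build_vocab, encode and max_len are illustrative names, not a library API.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(sentences):
    """Map every whitespace-separated element to an integer ID."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for sentence in sentences:
        for element in sentence.split():
            if element not in vocab:
                vocab[element] = len(vocab)
    return vocab


def encode(sentence, vocab, max_len=8):
    """Wrap a sequence in [CLS] ... [SEP], map it to IDs, and pad to max_len."""
    # Leave room for the two special tokens when truncating.
    elements = sentence.split()[: max_len - 2]
    tokens = ["[CLS]"] + elements + ["[SEP]"]
    # Elements never seen while building the vocabulary fall back to [UNK].
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    padding = max_len - len(ids)
    attention_mask = [1] * len(ids) + [0] * padding
    input_ids = ids + [vocab["[PAD]"]] * padding
    return input_ids, attention_mask


# The two example sentences from the post.
sentences = ["ixeg6164 ox78dsf12 lx3cd875", "duish7 oiu587 kj854j 987hdk"]
vocab = build_vocab(sentences)
input_ids, attention_mask = encode(sentences[0], vocab)
```

With IDs like these, one option would be to train a BERT-style model from scratch with masked language modeling, with its vocabulary size set to `len(vocab)`, rather than fine-tuning a pretrained checkpoint whose embeddings were learned for a different vocabulary. Whether that suits this data is exactly what I am unsure about.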
