I am completely new to HuggingFace, but I have truly loved the fastai course/library. One of the ideas I liked most was the ULMFiT approach, in which a language model pre-trained on Wikipedia data is fine-tuned on IMDb text and then used as a classifier.
In my case, I have a long text file (text.txt) about a specific disease. My end goal is to get embeddings for a list of disease-specific keywords, stored in a .csv file, and visualize them, e.g. after PCA.
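Once a suitable model is available (pre-trained or fine-tuned on text.txt), the keyword-embedding and PCA step could look roughly like the sketch below. It is an illustration under stated assumptions, not a definitive recipe: the CSV is assumed to have a header row with a column named "keyword" (hypothetical), `model_dir` is a placeholder for a Hub ID or local folder, and each keyword is represented by the mean of the encoder's last hidden states over its subword tokens, which is one common pooling choice among several.

```python
# Sketch: keyword embeddings from a masked-language model, projected to 2-D with PCA.
# Assumptions (hypothetical): the CSV has a header row with a "keyword" column,
# and model_dir is a Hugging Face fill-mask checkpoint (Hub ID or local folder).
import csv

import numpy as np


def read_keywords(csv_path, column="keyword"):
    """Read non-empty keywords from one column of a CSV file with a header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[column].strip() for row in csv.DictReader(f) if row[column].strip()]


def embed_keywords(keywords, model_dir):
    """Embed each keyword as the mean of the last hidden states over its subword tokens."""
    # Imported here so pca_2d and read_keywords work without torch/transformers installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)  # loads the encoder without the MLM head
    model.eval()
    vectors = []
    with torch.no_grad():
        for word in keywords:
            enc = tokenizer(word, return_tensors="pt")
            hidden = model(**enc).last_hidden_state[0]  # shape: (num_tokens, hidden_size)
            # Exclude special tokens such as [CLS]/[SEP] before averaging.
            special = tokenizer.get_special_tokens_mask(
                enc["input_ids"][0].tolist(), already_has_special_tokens=True
            )
            keep = [i for i, is_special in enumerate(special) if not is_special]
            vectors.append(hidden[keep].mean(dim=0).numpy())
    return np.stack(vectors)


def pca_2d(vectors):
    """Project row vectors onto their top-2 principal components via SVD of the centered data."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)  # PCA requires mean-centered data
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T  # coordinates along the first two principal axes
```

Usage might look like `coords = pca_2d(embed_keywords(read_keywords("keywords.csv"), "path/to/model"))`, where both paths are placeholders; `coords[:, 0]` and `coords[:, 1]` can then be scattered and labelled with the keywords. scikit-learn's `PCA(n_components=2)` should give the same projection up to the sign of each axis.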
I found a promising model on the Model Hub intended for “fill-mask”. I would like to take this model, access its language model, and fine-tune it on my text. Could somebody point me to a good tutorial? Here it is done from scratch; I would like to see an example that takes a pre-trained model like mine and a plain text file such as text.txt.
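A minimal sketch of this kind of domain-adaptive fine-tuning, assuming the `transformers` and `datasets` packages are installed and that `model_name` is a placeholder for the fill-mask model picked from the Model Hub. It continues masked-language-model training on the lines of text.txt, in the spirit of ULMFiT's language-model fine-tuning stage; the hyperparameters (block size, epochs, batch size, output folder name) are illustrative assumptions, not tuned values.

```python
# Sketch: continue masked-LM training of a pre-trained checkpoint on a plain text file.
# Assumptions: `transformers` and `datasets` are installed; model_name is a placeholder.
from itertools import chain


def group_texts(examples, block_size=128):
    """Concatenate tokenized lines and cut them into equal blocks (the remainder is dropped)."""
    concatenated = {k: list(chain.from_iterable(examples[k])) for k in examples.keys()}
    total_length = (len(concatenated["input_ids"]) // block_size) * block_size
    return {
        k: [seq[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, seq in concatenated.items()
    }


def fine_tune_mlm(model_name, text_path="text.txt", output_dir="mlm-finetuned",
                  block_size=128, epochs=3):
    """Fine-tune a fill-mask checkpoint on text_path with the masked-LM objective."""
    # Heavy imports stay inside the function so group_texts can be used on its own.
    from datasets import load_dataset
    from transformers import (
        AutoModelForMaskedLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # The "text" loader yields one example per line, in a column named "text".
    raw = load_dataset("text", data_files={"train": text_path})
    raw = raw.filter(lambda ex: ex["text"].strip() != "")  # skip blank lines
    tokenized = raw.map(
        lambda batch: tokenizer(batch["text"], return_special_tokens_mask=True),
        batched=True,
        remove_columns=["text"],
    )
    lm_dataset = tokenized.map(lambda batch: group_texts(batch, block_size), batched=True)

    # Selects ~15% of non-special tokens per batch as prediction targets
    # (most of them are replaced by the mask token).
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=16,
        save_strategy="no",
    )
    trainer = Trainer(model=model, args=args, train_dataset=lm_dataset["train"],
                      data_collator=collator)
    trainer.train()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

Calling `fine_tune_mlm("<fill-mask model id>")` would write the adapted checkpoint to `mlm-finetuned/`, which can be reloaded with `from_pretrained("mlm-finetuned")`. Concatenating lines and cutting them into fixed blocks, as the Hugging Face language-modeling example scripts do, avoids silently truncating long lines; `block_size` must not exceed the model's maximum sequence length (`tokenizer.model_max_length`).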
At first, I confused this goal with fine-tuning a pre-trained model, but it is now clear to me that my objective is a different one.