Hello, I am pretty new to fine-tuning and even NLP as a whole. I haven't been working with the Transformers library for very long, so there's much that I don't know or fully understand.
I've been trying to make sense of how the Transformers Trainer makes use of "labels", "input_ids", and "attention_mask" during the fine-tuning process.
Here is how I understood the overall process:
Supervised Fine-tuning:
- to perform SFT with the Trainer class, you must explicitly provide "labels" as part of your train/eval dataset
- when "labels" are not provided, the Trainer class still functions fine, but the process would then be classified as unsupervised fine-tuning
- the Trainer has default metrics that it computes (perplexity?) for evaluation, and you can explicitly provide other metrics to compute via the compute_metrics argument of the Trainer class
- fine-tuning is performed by somehow using the labels and input_ids with a certain loss function (cross-entropy loss?); a rough sketch of the pattern I have in mind is below
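To make my current understanding concrete, here is a rough sketch of the pattern I have in mind (the checkpoint, the toy dataset, and the token_accuracy metric are placeholders I made up for illustration, not something taken from a real tutorial):

```python
import numpy as np
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# tiny in-memory dataset just to keep the example self-contained
raw = Dataset.from_dict({"text": ["hello world", "fine-tuning is fun"]})

def tokenize(batch):
    # "labels" is a copy of "input_ids", which (as I understand it) is what
    # tells the Trainer to compute a language-modeling loss on these tokens
    enc = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=32)
    enc["labels"] = [ids.copy() for ids in enc["input_ids"]]
    return enc

ds = raw.map(tokenize, batched=True, remove_columns=["text"])

def compute_metrics(eval_pred):
    # placeholder metric (not perplexity): fraction of positions where the
    # argmax of the logits matches the label
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"token_accuracy": float((preds == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=ds,
    eval_dataset=ds,
    compute_metrics=compute_metrics,
)
trainer.train()
trainer.evaluate()
```

Is that roughly how the Trainer expects the labels and compute_metrics to be wired up?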
Here are some questions that I'm struggling with:
- In many of the fine-tuning tutorials I've seen, the authors use the Trainer class to train a model initialized with AutoModelForCausalLM. Does the "CausalLM" part indicate that the model has been initialized specifically for next-token prediction? If so, does the way the model is initialized (…ForCausalLM, …ForSeq2Seq, etc.) affect how the Trainer performs fine-tuning? Similarly, do the different task types call for different ways of formatting the data itself?
- Some of the demo code I've seen implements supervised fine-tuning by setting the label to -100 for everything except what the model should ideally output (nox project). I wanted to know how exactly this -100 value is processed by the language model during fine-tuning, but I can't seem to find any decent sites/explanations. A minimal example of the pattern I mean is below.
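To make the second question concrete, here is the masking pattern I mean (the checkpoint and the toy prompt/answer are placeholders I picked for illustration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Question: What is 2 + 2?\nAnswer:"
answer = " 4"

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
answer_ids = tokenizer(answer, return_tensors="pt").input_ids

# the full input is the prompt followed by the desired answer
input_ids = torch.cat([prompt_ids, answer_ids], dim=1)

# labels start as a copy of input_ids, but every prompt position is set to
# -100 so that (as I understand it) only the answer tokens contribute to the loss
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)
```

My guess is that -100 is simply the default ignore_index of the cross-entropy loss, so those positions contribute nothing to the loss or the gradients, but I haven't found this spelled out anywhere.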
I've been reading the HuggingFace documentation pages but can't seem to find enough information when it comes to the specifics. If there are any sources/guides about any of these topics, I would really appreciate them.