Hi, I'm a beginner with Hugging Face Transformers.
I'd like to know how to change the text embedding model in LayoutLMv2 from the original one to KoBERT.
I know there are processors for LayoutLMv2 and LayoutXLM.
So I think I need to change the text tokenizer used for data loading, and replace the text embedding weights in the original LayoutLMv2 model with KoBERT's, as in the code below.
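from transformers import (
    BertModel,
    LayoutLMv2FeatureExtractor,
    LayoutLMv2ForTokenClassification,
    LayoutLMv2Processor,
)
# Assumption: KoBertTokenizer comes from tokenization_kobert.py in the monologg/KoBERT-Transformers repo
from tokenization_kobert import KoBertTokenizer

kobert_name = "monologg/kobert"
bert_model = BertModel.from_pretrained(kobert_name)
kobert_tokenizer = KoBertTokenizer.from_pretrained(kobert_name)
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
processor = LayoutLMv2Processor(feature_extractor, kobert_tokenizer)  # this line raises the error below
# ValueError: Received a KoBertTokenizer for argument tokenizer, but a ('LayoutLMv2Tokenizer', 'LayoutLMv2TokenizerFast') was expected.
model = LayoutLMv2ForTokenClassification.from_pretrained("microsoft/layoutxlm-base", num_labels=len(labels))  # labels: my list of label names
# TODO: replace the layoutlmv2.embeddings parameters (weights) with KoBERT's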
But I get the error at the line where the processor is defined. I think the original LayoutLMv2Processor only accepts its own tokenizers (LayoutLMv2Tokenizer or LayoutLMv2TokenizerFast).
I’m using this code
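To make the weight replacement more concrete, here is a minimal sketch of the check I imagine doing before copying anything. It is plain Python and loads no models. `plan_embedding_copy` is a hypothetical helper I made up, not part of Transformers, and the shapes are numbers I typed by hand for illustration, not values read from the checkpoints. The idea is only that a KoBERT embedding parameter can be copied as-is when the matching `layoutlmv2.`-prefixed parameter has the same shape, and needs resizing or re-initialising otherwise.

```python
def plan_embedding_copy(src_shapes, dst_shapes, dst_prefix="layoutlmv2."):
    """Split source embedding parameters into those that fit the target and those that don't.

    Hypothetical helper: both arguments map parameter names to shape tuples.
    """
    copyable, mismatched = [], []
    for name, shape in src_shapes.items():
        if not name.startswith("embeddings."):
            continue  # only the text embedding block is of interest here
        target = dst_prefix + name
        if dst_shapes.get(target) == shape:
            copyable.append((name, target))
        else:
            # Either the target parameter is missing or its shape differs.
            mismatched.append((name, target, shape, dst_shapes.get(target)))
    return copyable, mismatched


# Illustrative shapes typed by hand (assumptions, not read from the checkpoints).
kobert_shapes = {
    "embeddings.word_embeddings.weight": (8002, 768),
    "embeddings.LayerNorm.weight": (768,),
    "encoder.layer.0.attention.self.query.weight": (768, 768),
}
layoutxlm_shapes = {
    "layoutlmv2.embeddings.word_embeddings.weight": (250002, 768),
    "layoutlmv2.embeddings.LayerNorm.weight": (768,),
}

copyable, mismatched = plan_embedding_copy(kobert_shapes, layoutxlm_shapes)
print("copy as-is:", copyable)
print("needs resizing or re-initialising:", mismatched)
```

With the real models, I assume the shape dictionaries could be built with something like `{k: tuple(v.shape) for k, v in bert_model.state_dict().items()}`, and the same for `model`. If the vocabulary sizes differ, as in the hand-typed numbers above, I guess the word embedding matrix cannot simply be copied over.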
What should I modify to change the text embedding?
Please share any tips for a beginner.