Hi, I'm a beginner with Hugging Face Transformers.
I'm wondering how to change the text embedding model in LayoutLMv2 (from the original to KoBERT).
I know there is a processor for LayoutLMv2 and LayoutXLM.
So I think I need to change the text tokenizer used for data loading, and replace the text embedding weights in the original LayoutLMv2 model with KoBERT's, as in the code below.

    from transformers import BertModel, LayoutLMv2FeatureExtractor, LayoutLMv2ForTokenClassification, LayoutLMv2Processor
    from tokenization_kobert import KoBertTokenizer  # assumed import: tokenization_kobert.py copied from monologg's KoBERT repo

    kobert_name = "monologg/kobert"
    bert_model = BertModel.from_pretrained(kobert_name)
    kobert_tokenizer = KoBertTokenizer.from_pretrained(kobert_name)

    feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
    processor = LayoutLMv2Processor(feature_extractor, kobert_tokenizer)
    # => The line above raises:
    # ValueError: Received a KoBertTokenizer for argument tokenizer, but a ('LayoutLMv2Tokenizer', 'LayoutLMv2TokenizerFast') was expected.

    model = LayoutLMv2ForTokenClassification.from_pretrained("microsoft/layoutxlm-base", num_labels=len(labels))
    # Need to replace the layoutlmv2.embeddings. parameters (weights) with KoBERT's

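To make the weight-swap idea concrete, here is a minimal sketch of what I mean by replacing the word embedding table. It uses tiny randomly initialized `nn.Embedding` stand-ins instead of the real checkpoints, and `build_replacement_embedding` is a hypothetical helper name, not a Transformers function. It assumes both models share the same hidden size, which should be checked on the real configs.

```python
import torch
from torch import nn


def build_replacement_embedding(source: nn.Embedding, target: nn.Embedding) -> nn.Embedding:
    """Return a new embedding table holding a copy of `source`'s weights.

    `target` is only inspected to confirm the hidden sizes agree; the
    vocabulary size of the result follows `source`, not `target`.
    """
    if source.embedding_dim != target.embedding_dim:
        raise ValueError(
            f"hidden size mismatch: {source.embedding_dim} vs {target.embedding_dim}"
        )
    replacement = nn.Embedding(
        source.num_embeddings,
        source.embedding_dim,
        padding_idx=source.padding_idx,
    )
    with torch.no_grad():
        # Copy the values so later fine-tuning does not modify the source table.
        replacement.weight.copy_(source.weight)
    return replacement


# Tiny stand-ins (hypothetical sizes) so the sketch runs without any download.
kobert_like = nn.Embedding(50, 16, padding_idx=1)      # plays the role of KoBERT's word embeddings
layoutlm_like = nn.Embedding(200, 16, padding_idx=0)   # plays the role of LayoutLMv2's word embeddings

new_table = build_replacement_embedding(kobert_like, layoutlm_like)
```

With the real models, my untested assumption is that the source would be `bert_model.get_input_embeddings()` and the result would be installed with `model.set_input_embeddings(new_table)`, with `model.config.vocab_size` updated to match. Only the word embeddings would come from KoBERT; the other layers would still hold the LayoutXLM weights, so the model would presumably need fine-tuning afterwards.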
But I got an error at the point where the processor is defined. I think the original LayoutLMv2Processor only accepts the original tokenizers (LayoutLMv2Tokenizer or LayoutLMv2TokenizerFast).
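One possible workaround would be to skip the processor and align the bounding boxes to sub-word tokens by hand. The sketch below shows only that alignment step in plain Python. `align_boxes_to_subwords` and `toy_tokenize` are hypothetical names; with KoBERT I would pass `kobert_tokenizer.tokenize` instead of the toy function. The special-token boxes `[0, 0, 0, 0]` and `[1000, 1000, 1000, 1000]` are what I understand the LayoutLMv2 tokenizer uses by default, so treat them as an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Box = List[int]

# Assumed defaults for the special tokens (boxes are normalized to 0-1000).
CLS_BOX: Box = [0, 0, 0, 0]
SEP_BOX: Box = [1000, 1000, 1000, 1000]


def align_boxes_to_subwords(
    words: Sequence[str],
    boxes: Sequence[Box],
    tokenize: Callable[[str], List[str]],
    cls_token: str = "[CLS]",
    sep_token: str = "[SEP]",
) -> Tuple[List[str], List[Box]]:
    """Tokenize word by word and repeat each word's box for all of its sub-tokens."""
    if len(words) != len(boxes):
        raise ValueError("words and boxes must have the same length")

    tokens: List[str] = [cls_token]
    token_boxes: List[Box] = [list(CLS_BOX)]
    for word, box in zip(words, boxes):
        pieces = tokenize(word)
        tokens.extend(pieces)
        # Every sub-token inherits the bounding box of the word it came from.
        token_boxes.extend(list(box) for _ in pieces)
    tokens.append(sep_token)
    token_boxes.append(list(SEP_BOX))
    return tokens, token_boxes


def toy_tokenize(word: str) -> List[str]:
    """Stand-in tokenizer: splits words longer than two characters into two pieces."""
    return [word[:2], "##" + word[2:]] if len(word) > 2 else [word]


tokens, token_boxes = align_boxes_to_subwords(
    ["Invoice", "No"],
    [[10, 10, 80, 30], [90, 10, 120, 30]],
    toy_tokenize,
)
```

The tokens would then be converted to ids with the tokenizer's `convert_tokens_to_ids`, and the boxes passed to the model as `bbox` together with the image from the feature extractor. Padding, truncation and label alignment are left out of this sketch.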
I’m using this code
Which part should I modify to change the text embedding?
Please share any tips for a beginner.