Tokenizer for German lang

whatisslove11 · June 22, 2023, 7:34pm

Hi!
I am trying to implement a neural machine translation model from scratch, and I am now choosing a tokenizer for my languages.
I read that tokenizers have special Unicode normalizers (NFC, NFD, etc.), and I have a few questions about them:

Do I need a Unicode normalizer for German, and if so, which one, or would any of them be suitable?
Where can I find more information about Unicode normalizers?
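
To make the question concrete, here is a minimal sketch of how I understand the forms to differ on a German word. It uses only Python's standard `unicodedata` module, not any particular tokenizer library, and the word `Müller` is just an example I picked:

```python
import unicodedata

# Example word with a precomposed "ü" (U+00FC); chosen only for illustration.
word = "M\u00fcller"

# NFC keeps (or composes) characters into single code points where possible.
nfc = unicodedata.normalize("NFC", word)
# NFD decomposes "ü" into "u" followed by a combining diaeresis (U+0308).
nfd = unicodedata.normalize("NFD", word)

print(len(nfc), [hex(ord(c)) for c in nfc])
print(len(nfd), [hex(ord(c)) for c in nfd])

# The two strings look identical when rendered but compare as different,
# so a tokenizer would see different inputs unless one form is enforced.
print(nfc == nfd)

# Re-composing the NFD string gives back the NFC string.
print(unicodedata.normalize("NFC", nfd) == nfc)
```

If this is right, the same German text could map to different token sequences depending on how the umlauts were encoded, which is why I am asking which form to pick.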

Thanks a lot for your help!
