I am looking for a tokenizer-free language model. However, I can only find results for ByT5, which is pretrained with a variant of masked language modeling (MLM). Is there any model trained as a causal language model directly on UTF-8 bytes?
Reference work: ByT5: Towards a token-free future with pre-trained byte-to-byte models
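To clarify what "directly on UTF-8 bytes" means here: instead of a learned subword vocabulary, the model's token IDs would simply be the raw byte values (0–255) of the UTF-8 encoding, predicted left to right. A minimal sketch (the variable names are illustrative, not from any particular library):

```python
# Sketch: raw UTF-8 bytes as the token stream for a causal LM.
# The vocabulary is just the 256 possible byte values; no tokenizer needed.
text = "héllo"
token_ids = list(text.encode("utf-8"))  # each byte becomes one token id
# Note: "é" encodes to two bytes, so the sequence is longer than
# the character count — a known cost of byte-level modeling.
print(token_ids)  # [104, 195, 169, 108, 108, 111]
assert all(0 <= t < 256 for t in token_ids)
# Decoding is lossless: the byte sequence maps back to the original text.
decoded = bytes(token_ids).decode("utf-8")
assert decoded == text
```

A causal LM over this stream would model p(byte_t | byte_1..byte_{t-1}) with a 256-entry output softmax, whereas ByT5's MLM-style objective reconstructs masked spans.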