🤗Tokenizers

Topic	Replies	Views	Activity
Qwen tokenizer, omit Ġ from offset_mapping	3	24	July 18, 2025
Word level tokenizer pulls special tokens out of pretokenized strings	3	21	July 4, 2025
Adding atomic / indivisible tokens to BPE tokenizer	8	34	July 3, 2025
Tokenizer is splitting special token	3	20	June 30, 2025
How to determine if a token is special	2	47	April 29, 2025
Return_offsets_mapping when decoding	3	45	April 25, 2025
Saving tokens by system prompt	1	54	April 22, 2025
Introducing FlashTokenizer: The World's Fastest Tokenizer Library for LLM Inference	2	37	March 21, 2025
Why does tokenization take so long?	1	433	March 19, 2025
Call rust function in python	1	24	March 19, 2025
Tokenizer taking extremely long time to train	1	976	March 19, 2025
Rs-bpe tokenizer [PyPI | Python] - Outperforms tiktoken & tokenizers	2	48	March 19, 2025
Build failure with NoGIL Python on main	0	23	March 17, 2025
SentencePiece to Tokenizers conversion	0	100	March 14, 2025
Decode token IDs into a list (not a single string)	4	4251	March 11, 2025
Adapter-aware chat_template	3	155	February 21, 2025
Fine-tuning Whisper on custom special tokens	0	101	February 16, 2025
Inconsistent, mojibaked tokenization in some but not all Huggingface tokenizers	1	19	February 12, 2025
Errors with Tokenizers on Llama	2	261	February 8, 2025
Character level tokenizer with specific order	5	68	February 7, 2025
Custom Tokenizer Error - Please Help!	0	28	February 7, 2025
ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided ['pixel_values']	0	103	February 4, 2025
BPEtokenizer reports error "not valid UTF-8" when processing txt file	7	86	January 29, 2025
NLLB tokenizer multiple target/source languages within a training batch	5	1499	January 10, 2025
Creating a Custom Token Vocabulary for GPT-2	1	368	January 7, 2025
'NoneType' Object Error and Token Authorization Issue in OpenAI API Integration	4	145	January 5, 2025
Train_from_iterator throwing TypeError: expected string or buffer error	2	24	January 3, 2025
What should be processing_class param value of Seq2SeqTrainer for VisionEncoderDecoderModel Finetuning?	0	24	January 3, 2025
Error creating custom pre_tokenizer	3	44	January 2, 2025
Adding special tokens to LEDTokenizer	0	42	December 25, 2024