| Return_offsets_mapping when decoding | 4 | 97 | November 20, 2025 |
| Training a Vision Language Model on Text Only Data | 2 | 36 | October 11, 2025 |
| Advice for Singing Synthesis and Tokenization | 2 | 17 | October 8, 2025 |
| Intelligent Tokenizer: Attention Needs No Vocabulary (Demo + Paper) | 0 | 44 | September 14, 2025 |
| Qwen tokenizer, omit Ä from offset_mapping | 3 | 67 | July 18, 2025 |
| Word level tokenizer pulls special tokens out of pretokenized strings | 3 | 24 | July 4, 2025 |
| Adding atomic / indivisible tokens to BPE tokenizer | 8 | 83 | July 3, 2025 |
| Tokenizer is splitting special token | 3 | 33 | June 30, 2025 |
| How to determine if a token is special | 2 | 98 | April 29, 2025 |
| Saving tokens by system prompt | 1 | 90 | April 22, 2025 |
| Introducing FlashTokenizer: The World's Fastest Tokenizer Library for LLM Inference | 2 | 40 | March 21, 2025 |
| Why does tokenization take so long? | 1 | 525 | March 19, 2025 |
| Call rust function in python | 1 | 33 | March 19, 2025 |
| Tokenizer taking extremely long time to train | 1 | 994 | March 19, 2025 |
| Rs-bpe tokenizer [PyPI \| Python] - Outperforms tiktoken & tokenizers | 2 | 100 | March 19, 2025 |
| Build failure with NoGIL Python on main | 0 | 27 | March 17, 2025 |
| SentencePiece to Tokenizers conversion | 0 | 172 | March 14, 2025 |
| Decode token IDs into a list (not a single string) | 4 | 4675 | March 11, 2025 |
| Adapter-aware chat_template | 3 | 160 | February 21, 2025 |
| Fine-tuning Whisper on custom special tokens | 0 | 152 | February 16, 2025 |
| Inconsistent, mojibaked tokenization in some but not all Huggingface tokenizers | 1 | 28 | February 12, 2025 |
| Errors with Tokenizers on Llama | 2 | 316 | February 8, 2025 |
| Character level tokenizer with specific order | 5 | 96 | February 7, 2025 |
| Custom Tokenizer Error - Please Help! | 0 | 28 | February 7, 2025 |
| ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided ['pixel_values'] | 0 | 130 | February 4, 2025 |
| BPEtokenizer reports error "not valid UTF-8" when processing txt file | 7 | 124 | January 29, 2025 |
| NLLB tokenizer multiple target/source languages within a training batch | 5 | 1591 | January 10, 2025 |
| Creating a Custom Token Vocabulary for GPT-2 | 1 | 509 | January 7, 2025 |
| 'NoneType' Object Error and Token Authorization Issue in OpenAI API Integration | 4 | 185 | January 5, 2025 |
| Train_from_iterator throwing TypeError: expected string or buffer error | 2 | 30 | January 3, 2025 |