Topic | Replies | Views | Activity
SentencePiece to Tokenizers conversion | 0 | 6 | March 14, 2025
Decode token IDs into a list (not a single string) | 4 | 3316 | March 11, 2025
Adapter-aware chat_template | 3 | 91 | February 21, 2025
Fine-tuning Whisper on custom special tokens | 0 | 21 | February 16, 2025
Inconsistent, mojibaked tokenization in some but not all Huggingface tokenizers | 1 | 11 | February 12, 2025
Errors with Tokenizers on Llama | 2 | 148 | February 8, 2025
Character level tokenizer with specific order | 5 | 30 | February 7, 2025
Custom Tokenizer Error - Please Help! | 0 | 16 | February 7, 2025
ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided ['pixel_values'] | 0 | 43 | February 4, 2025
BPEtokenizer reports error "not valid UTF-8" when processing txt file | 7 | 27 | January 29, 2025
NLLB tokenizer multiple target/source languages within a training batch | 5 | 1294 | January 10, 2025
Creating a Custom Token Vocabulary for GPT-2 | 1 | 124 | January 7, 2025
'NoneType' Object Error and Token Authorization Issue in OpenAI API Integration | 4 | 80 | January 5, 2025
Train_from_iterator throwing TypeError: expected string or buffer error | 2 | 20 | January 3, 2025
What should be processing_class param value of Seq2SeqTrainer for VisionEncoderDecoderModel Finetuning? | 0 | 19 | January 3, 2025
Error creating custom pre_tokenizer | 3 | 29 | January 2, 2025
Adding special tokens to LEDTokenizer | 0 | 33 | December 25, 2024
Byte Level Tokenizer While Training | 0 | 33 | December 14, 2024
Tokenizer: what function removes spaces between '<' and '>'? | 0 | 43 | December 9, 2024
Convert huggingface tokenizer into sentencepiece format | 1 | 504 | November 27, 2024
Issue with Loading Custom Tokenizer: Tokenizer class BaseTokenizer does not exist or is not currently imported Error | 6 | 89 | November 6, 2024
Generate tokenizer.json for Marian(Opus) MT | 2 | 632 | November 4, 2024
Tokenizer method inference | 3 | 31 | November 2, 2024
How to skip tokens from translation? | 2 | 869 | October 15, 2024
Error loading tokenizer: data did not match any variant of untagged enum ModelWrapper at line 1251003 column 3 | 3 | 2519 | October 10, 2024
Authorization header is correct, but the token seems invalid | 3 | 135 | October 10, 2024
AutoTokenizer.encode with multiThread and multiProcess | 2 | 128 | October 9, 2024
Trying to use AutoTokenizer with TensorFlow gives: `ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).` | 11 | 18813 | October 5, 2024
Help to choose decoder for Devanagari OCR | 0 | 14 | September 20, 2024
Speed up tokenizer training | 5 | 808 | September 17, 2024