|
Train_from_iterator throwing TypeError: expected string or buffer error
|
|
2
|
30
|
January 3, 2025
|
|
What should be processing_class param value of Seq2SeqTrainer for VisionEncoderDecoderModel Finetuning?
|
|
0
|
33
|
January 3, 2025
|
|
Error creating custom pre_tokenizer
|
|
3
|
76
|
January 2, 2025
|
|
Adding special tokens to LEDTokenizer
|
|
0
|
46
|
December 25, 2024
|
|
Byte Level Tokenizer While Training
|
|
0
|
77
|
December 14, 2024
|
|
Tokenizer: what function removes spaces between '<' and '>'?
|
|
0
|
62
|
December 9, 2024
|
|
Convert huggingface tokenizer into sentencepiece format
|
|
1
|
672
|
November 27, 2024
|
|
Issue with Loading Custom Tokenizer: Tokenizer class BaseTokenizer does not exist or is not currently imported Error
|
|
6
|
344
|
November 6, 2024
|
|
Generate tokenizer.json for Marian(Opus) MT
|
|
2
|
662
|
November 4, 2024
|
|
Tokenizer method inference
|
|
3
|
55
|
November 2, 2024
|
|
How to skip tokens from translation?
|
|
2
|
905
|
October 15, 2024
|
|
Error loading tokenizer: data did not match any variant of untagged enum ModelWrapper at line 1251003 column 3
|
|
3
|
4085
|
October 10, 2024
|
|
Authorization header is correct, but the token seems invalid
|
|
3
|
202
|
October 10, 2024
|
|
AutoTokenizer.encode with multiThread and multiProcess
|
|
2
|
412
|
October 9, 2024
|
|
Trying to use AutoTokenizer with TensorFlow gives: `ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).`
|
|
11
|
20710
|
October 5, 2024
|
|
Help to choose decoder for Devanagari OCR
|
|
0
|
21
|
September 20, 2024
|
|
Speed up tokenizer training
|
|
5
|
1543
|
September 17, 2024
|
|
Cannot load tokenizer for llama2
|
|
6
|
7259
|
September 13, 2024
|
|
What is the base model of XLM-RoBERTa Tokenizer? SentencePiece? XLNetTokenizer?
|
|
0
|
43
|
September 12, 2024
|
|
Tokenization compared to sentencepiece
|
|
0
|
114
|
September 11, 2024
|
|
Tokenizer Error [AGAIN!]
|
|
0
|
59
|
September 10, 2024
|
|
Decoding sequence of tokens produces question marks instead of actual tokens
|
|
1
|
35
|
September 3, 2024
|
|
Chat_template is not set & throwing error
|
|
3
|
14888
|
August 31, 2024
|
|
Memory leaks when training Gemma or Phi 3 and 3.5 tokenizer
|
|
0
|
92
|
August 29, 2024
|
|
What does "trim_offsets" do in tokenizer post-processor?
|
|
0
|
63
|
August 25, 2024
|
|
How to train a LlamaTokenizer?
|
|
22
|
4146
|
August 20, 2024
|
|
Issue with XLM-RoBERTa tokenizer
|
|
1
|
310
|
August 15, 2024
|
|
Adding tokens, but tokenizer doesn't use them
|
|
1
|
434
|
August 14, 2024
|
|
Can I retrain GPT-2 tokeniser on Chinese data and use it with GPT-2 XL or other models to create a Chinese-speaking model?
|
|
0
|
28
|
August 14, 2024
|
|
Encoding and then decoding text is not equal
|
|
2
|
240
|
August 12, 2024
|