Where is the introduction to tokenizers.implementations? | 0 | 163 | May 7, 2023
How to return custom `token_type_ids` or other values from a tokenizer? | 0 | 491 | May 3, 2023
Easy way to compare tokenizers | 0 | 235 | May 1, 2023
Issue with XLM-RoBERTa tokenizer | 0 | 233 | May 1, 2023
Unable to load image using llama-index | 0 | 1154 | May 1, 2023
Help defining tokenizer | 0 | 215 | April 28, 2023
Token Offsets in Rust vs. Python | 1 | 267 | April 27, 2023
“OSError: Model name './XX' was not found in tokenizers model name list” - cannot load custom tokenizer in Transformers | 14 | 6404 | April 25, 2023
Converting JSON/dict to flattened string with indicator tokens | 1 | 294 | April 21, 2023
Train Retry Tokenizer | 0 | 207 | April 18, 2023
Pretokenise on punctuation except hyphens | 0 | 237 | April 15, 2023
Tokenizer Trainer Crashing | 0 | 536 | April 15, 2023
Tokenizer extremely slow when deployed to a container | 0 | 1019 | April 14, 2023
Dealing with Decimal and Fractions | 1 | 1305 | October 27, 2022
ONNX T5 - Decoding seq2seq tokens | 0 | 299 | April 12, 2023
`add_tokens` with argument `special_tokens=True` vs `add_special_tokens` | 0 | 269 | April 5, 2023
Unable to upload custom PyTorch model in Hugging Face | 0 | 246 | April 4, 2023
How long to expect training to take, and guidance on subset size? | 0 | 1021 | April 3, 2023
RuntimeError: Cannot re-initialize CUDA in forked subprocess | 2 | 2427 | April 3, 2023
Overflowing Tokens in MarkupLM | 0 | 332 | March 31, 2023
I get the predicted token as ` े`. What am I doing wrong? | 1 | 567 | March 27, 2023
<unk> token in the output instead of curly braces | 0 | 403 | March 25, 2023
How to add a new token without expanding the vocabulary | 0 | 559 | March 24, 2023
Does the ByteLevelBPETokenizer need to be wrapped in a normal Tokenizer? | 0 | 1346 | March 18, 2023
What is required to create a fast tokenizer? For example for a Marian model | 0 | 260 | March 16, 2023
GPT2Tokenizer.decode maps unicode sequences to the same string '�' | 3 | 851 | March 15, 2023
Issue with Tokenizer | 0 | 509 | March 14, 2023
Tokenizing my novel for GPT model | 0 | 745 | March 10, 2023
How to add additional custom pre-tokenization processing? | 6 | 4020 | March 7, 2023
Customize FlauBERT tokenizer to split line breaks | 0 | 236 | March 4, 2023