| Topic | Replies | Views | Activity |
|---|---|---|---|
| Import distilbert-base-uncased tokenizer to an Android app along with the tflite model | 0 | 8 | January 16, 2021 |
| ERROR? Why would encoding [MASK] before '.' give an idx of 13? | 4 | 48 | December 28, 2020 |
| Why does Bert-chinese use do_lower_case=False? | 0 | 27 | December 24, 2020 |
| Bug with tokenizer's offset mapping for NER problems? | 3 | 39 | December 23, 2020 |
| BERT WordPiece Tokenizer: some matras missing after tokenization for Hindi Language #572 | 0 | 19 | December 23, 2020 |
| “OSError: Model name './XX' was not found in tokenizers model name list” - cannot load custom tokenizer in Transformers | 10 | 69 | December 22, 2020 |
| Error with new tokenizers (URGENT!) | 5 | 130 | December 16, 2020 |
| Error with `<\|endoftext\|>` in Tokenizer GPT2 | 2 | 42 | December 16, 2020 |
| Build a RoBERTa tokenizer from scratch | 5 | 81 | December 12, 2020 |
| Issue with post-processing | 0 | 33 | December 8, 2020 |
| Couldn't instantiate the backend tokenizer | 0 | 168 | December 7, 2020 |
| OSError: Model name 'gpt2' was not found in tokenizers model name list (gpt2,...) | 5 | 113 | November 24, 2020 |
| Bypassing tokenizers | 2 | 47 | November 23, 2020 |
| Tokenizing Domain Specific Text | 5 | 120 | November 20, 2020 |
| Tokenizer splits up pre-split tokens | 4 | 88 | November 18, 2020 |
| Issue with tokenizer.tokenize | 3 | 70 | November 16, 2020 |
| How do you use SentencePiece for BPE of sequences with no whitespace | 0 | 61 | November 4, 2020 |
| Tokenizer taking extremely long time to train | 0 | 50 | November 3, 2020 |
| Where to find the "wiki-big.train.raw" data as mentioned in the snippet for tokenizers 0.9? | 2 | 53 | October 29, 2020 |
| Change bpe-dropout value on the fly? | 0 | 48 | October 24, 2020 |
| Loading pretrained SentencePiece tokenizer from Fairseq | 5 | 236 | October 21, 2020 |
| How to add additional custom pre-tokenization processing? | 2 | 122 | October 21, 2020 |
| Speed up Longformer Tokenizer | 2 | 86 | October 18, 2020 |
| What does `tokenizers.normalizer.normalize` do? | 5 | 141 | October 12, 2020 |
| Automatic sentence segmentation and encoding | 0 | 71 | October 12, 2020 |
| How to truncate from the head in AutoTokenizer? | 2 | 213 | September 26, 2020 |
| How much memory is needed for training ByteLevelBPETokenizer? | 3 | 221 | September 18, 2020 |
| How to make tokenizer convert subword token to an independent token? | 4 | 153 | September 9, 2020 |
| How to know if a subtoken is a word or part of a word? | 8 | 329 | September 1, 2020 |
| Using a pretrained tokenizer vs training one from scratch | 1 | 152 | August 21, 2020 |