Why Bert-chinese use do_lower_case=False? | 0 | 485 | December 24, 2020
Bug with tokernizer's offset mapping for NER problems? | 3 | 7225 | December 23, 2020
BERT WordPiece Tokenizer: some matras missing after tokenization for Hindi Language #572 | 0 | 489 | December 23, 2020
Error with <|endoftext|> in Tokenizer GPT2 | 2 | 7506 | December 16, 2020
Build a RoBERTa tokenizer from scratch | 5 | 3359 | December 12, 2020
Couldn't instantiate the backend tokenizer | 0 | 2302 | December 7, 2020
Bypassing tokenizers | 2 | 412 | November 23, 2020
Tokenizing Domain Specific Text | 5 | 1471 | November 20, 2020
Issue with tokenizer.tokenize | 3 | 504 | November 16, 2020
Where to find the "wiki-big.train.raw" data as mentioned in the snippet for tokenizers 0.9? | 2 | 1046 | October 29, 2020
Change bpe-dropout value on the fly? | 0 | 435 | October 24, 2020
Loading pretrained SentencePiece tokenizer from Fairseq | 5 | 6455 | October 21, 2020
What does `tokenizers.normalizer.normalize` do? | 5 | 3549 | October 12, 2020
Automatic sentence segmentation and encoding | 0 | 842 | October 12, 2020
How to truncate from the head in AutoTokenizer? | 2 | 4671 | September 26, 2020
How much memory is needed for training ByteLevelBPETokenizer? | 3 | 1513 | September 18, 2020
How to make tokenizer convert subword token to an independent token? | 4 | 624 | September 9, 2020
Using a pretrained tokenizer vs training a one from scratch | 1 | 876 | August 21, 2020
Masking Probability | 4 | 802 | August 20, 2020
Tokenizer not found | 0 | 320 | August 18, 2020
Add new tokens for subwords | 9 | 6852 | August 11, 2020
Token alignment for word-level tasks | 1 | 2543 | August 5, 2020
ByteLevelBPETokenizer inconsistent behavior | 0 | 410 | July 23, 2020
Use a pretrained ByteLevelBPETokenizer on text | 1 | 3856 | July 17, 2020
Continuation token in pertained tokenizer bert-base-chinese | 0 | 525 | July 11, 2020
Tokenizers v0.8.0 is out! | 0 | 1514 | July 7, 2020
About the Tokenizers category | 1 | 313 | July 7, 2020