Cannot create an identical PretrainedTokenizerFast object from a Tokenizer created by tokenizers library | 1 | 1096 | August 30, 2021
Index of wordpieces (subwords) after tokenization by transformers | 0 | 701 | August 28, 2021
A problem about FutureWarning? | 0 | 1248 | August 18, 2021
Extracting embedding values of NLP pretrained models from tokenized strings | 3 | 2230 | August 18, 2021
Tokenization in a NER context | 5 | 5785 | August 11, 2021
Unable to convert output to interpretable format | 0 | 364 | July 31, 2021
BpeTrainer implementation in Python | 0 | 379 | July 23, 2021
MBart50Tokenizer vs XLMRobertaTokenizer | 0 | 487 | July 19, 2021
Why multilingual BERT tokenizer doesn't remove accent markers? | 0 | 918 | July 18, 2021
TypeError when loading tokenizer with from_pretrained method for bart-large-mnli model | 1 | 1121 | July 8, 2021
Is it okay to split ids sequence when it is encoded using Byte-level BPE | 0 | 342 | July 7, 2021
Using truncated fragments as input samples in training | 3 | 688 | July 1, 2021
Using whitespace tokenizer for training models | 1 | 3268 | June 6, 2021
Save custom components | 0 | 334 | May 29, 2021
How to see contents of a normalizer | 0 | 302 | May 7, 2021
Newbie: Main difference between tokenizers? | 0 | 849 | May 6, 2021
Can't load tokenizer for 'sshleifer/student_blarge_12_3' | 0 | 333 | May 6, 2021
How to create a Huggingface tokenizer from a non-Huggingface tokenizer? | 0 | 530 | May 4, 2021
Add new tokens and learn the embeddings of the new tokens while keeping all the other parameters frozen | 0 | 471 | April 30, 2021
How do you use SentencePiece for BPE of sequences with no whitespace | 1 | 2108 | April 29, 2021
BOS tokens for mBERT tokenizer | 1 | 634 | April 14, 2021
BertTokenizerFast for stsb-xlm-r-multilingual model | 3 | 662 | April 8, 2021
Skip-gram tokens | 0 | 370 | April 4, 2021
Using a BertWordPieceTokenizer trained from scratch from transformers | 2 | 5065 | March 26, 2021
Questions on model's tokens | 0 | 602 | March 24, 2021
Space token ' ' cannot be added when is_split_into_words = True | 1 | 461 | March 11, 2021
Are special_tokens the only tokens guaranteed to be atomic? | 0 | 378 | March 3, 2021
Does AutoTokenizer.from_pretrained add [cls] tokens? | 7 | 5322 | March 2, 2021
BertTokenizer's encode_plus returns 2d tensor when printing 'input_ids'/ 'attention_mask' | 0 | 395 | February 7, 2021
Tuning tokenizer on my own dataset | 0 | 723 | January 25, 2021