🤗Tokenizers

Topic	Replies	Views	Activity
Cannot create an identical PreTrainedTokenizerFast object from a Tokenizer created by tokenizers library	1	1096	August 30, 2021
Index of wordpieces (subwords) after tokenization by transformers	0	701	August 28, 2021
A problem about FutureWarning?	0	1248	August 18, 2021
Extracting embedding values of NLP pretrained models from tokenized strings	3	2230	August 18, 2021
Tokenization in a NER context	5	5785	August 11, 2021
Unable to convert output to interpretable format	0	364	July 31, 2021
BpeTrainer implementation in Python	0	379	July 23, 2021
MBart50Tokenizer vs XLMRobertaTokenizer	0	487	July 19, 2021
Why multilingual BERT tokenizer doesn't remove accent markers?	0	918	July 18, 2021
TypeError when loading tokenizer with from_pretrained method for bart-large-mnli model	1	1121	July 8, 2021
Is it okay to split ids sequence when it is encoded using Byte-level BPE	0	342	July 7, 2021
Using truncated fragments as input samples in training	3	688	July 1, 2021
Using whitespace tokenizer for training models	1	3268	June 6, 2021
Save custom components	0	334	May 29, 2021
How to see contents of a normalizer	0	302	May 7, 2021
Newbie: Main difference between tokenizers?	0	849	May 6, 2021
Can't load tokenizer for 'sshleifer/student_blarge_12_3'	0	333	May 6, 2021
How to create a Huggingface tokenizer from a non-Huggingface tokenizer?	0	530	May 4, 2021
Add new tokens and learn the embeddings of the new tokens while keeping all the other parameters frozen	0	471	April 30, 2021
How do you use SentencePiece for BPE of sequences with no whitespace	1	2108	April 29, 2021
BOS tokens for mBERT tokenizer	1	634	April 14, 2021
BertTokenizerFast for stsb-xlm-r-multilingual model	3	662	April 8, 2021
Skip-gram tokens	0	370	April 4, 2021
Using a BertWordPieceTokenizer trained from scratch from transformers	2	5065	March 26, 2021
Questions on model's tokens	0	602	March 24, 2021
Space token ' ' cannot be added when is_split_into_words = True	1	461	March 11, 2021
Are special_tokens the only tokens guaranteed to be atomic?	0	378	March 3, 2021
Does AutoTokenizer.from_pretrained add [cls] tokens?	7	5322	March 2, 2021
BertTokenizer's encode_plus returns 2d tensor when printing 'input_ids'/ 'attention_mask'	0	395	February 7, 2021
Tuning tokenizer on my own dataset	0	723	January 25, 2021