🤗Tokenizers

Topic	Replies	Views	Last activity
pyo3_runtime.PanicException: likelihood is NAN. Input sentence may be too long	1	1195	May 27, 2022
Pytorch_model.bin not working because of lfs	2	821	May 25, 2022
Tokenizer ignores repeated whitespaces	3	3371	May 19, 2022
How truncation works when applying BERT tokenizer on the batch of sentence pairs in HuggingFace?	0	940	May 15, 2022
How to save a tokenizer only consisting of added tokens	0	840	May 11, 2022
Passing list of inputs to tokenize	1	1345	May 9, 2022
Issues with Data Collator and Tokenizing with NER Datasets	1	2534	May 9, 2022
How to perform tokenization on an ONNX model in JS?	0	842	May 6, 2022
Using the Tokenizers library in a Unity project	0	682	May 4, 2022
Further pre-training the tokenizer?	0	827	April 30, 2022
Error when doing tokenization	0	921	April 29, 2022
How to decode with spaces?	0	1880	April 28, 2022
Show Submodels of PegasusTokenizer	1	632	April 28, 2022
Load SentencePieceBPETokenizer in TF	0	1009	April 27, 2022
Best way to mask a multi-token word when using `.*ForMaskedLM` models	2	2309	April 4, 2022
Does a tokenizer keep the mapping between my labels to their encoding?	3	2204	April 4, 2022
What is Wav2Vec2FeatureExtractor doing?	0	700	April 3, 2022
What does this warning mean? -overflowing tokens are not returned for the setting you have chosen	1	5415	March 30, 2022
How can I make sure Tokenizer pads to a fixed length?	2	2139	March 29, 2022
Issue with Decoding in HuggingFace	2	3897	March 24, 2022
ValueError: Unable to create tensor for 1 dataset but not the other of same type	1	999	March 23, 2022
Disabling addition of CLS from BERT tokenizer	5	1809	March 11, 2022
Tokenized sequence lengths	6	2077	March 10, 2022
Finetuning GPT-J6B for custom dataset	1	1088	March 6, 2022
Training tokenizer takes too much RAM	1	1340	February 21, 2022
How to "further pretrain" a tokenizer (do I need to do so?)	5	4435	February 20, 2022
Run_seq2seq_qa.py: Column 3 named labels expected length 1007 but got length 1000	1	2528	February 17, 2022
Issues with offset_mapping values	4	4546	February 15, 2022
How would you train a sentencepiece BPE tokenizer on this language with 400 "characters"?	0	2977	February 13, 2022
All my sequences get tokenized the same	2	610	February 12, 2022