🤗Tokenizers

Topic	Replies	Views	Activity
Add BOS and EOS when encoding a sentence	2	14565	August 22, 2022
Customization of Wav2Vec2CTCTokenizer with rules	0	397	August 22, 2022
Customized tokenization files in run_clm script	3	697	August 18, 2022
Using customized algorithm	0	321	August 17, 2022
Issue with Flaubert Tokenizer as word_ids() method is not available for NER Task	1	1400	August 15, 2022
Word_ids not working with deberta_v2	1	1306	August 12, 2022
How to tokenize large contexts without running out of memory	2	1606	August 8, 2022
Does Deberta tokenizer use wordpiece?	0	558	August 6, 2022
Get vocabulary tokens in order to exclude them from generate function	2	2644	August 1, 2022
Avoid creating certain tokens when training a tokenizer	0	602	July 26, 2022
Error finetuning XLM-RoBERTa-Large when training	2	377	July 15, 2022
HuggingFace BPE Trainer Error - Training Tokenizer	1	2994	July 14, 2022
Word_to_tokens() and word_ids() ---- microsoft/deberta-v2/v3	2	488	July 14, 2022
No PreTrainedTokenizerFast for Deberta-V3, no doc_stride	0	914	July 13, 2022
Tokenizer from own vocab	0	456	July 11, 2022
No labels column for tokenized data	2	2225	June 27, 2022
Programmatic way to Tokenization on Custom Text Columns	0	568	June 27, 2022
Bug in Offset generation for Rupee symbol	0	413	June 27, 2022
How to handle parentheses, quotation marks, \n, etc. when creating a tokenizer from scratch	0	696	June 26, 2022
EM training on unigram tokenizer taking way longer than predicted	0	480	June 23, 2022
Training unigram on long sequences	4	1275	June 23, 2022
Issue with post-processing	1	1102	June 15, 2022
FutureWarning about BertTokenizer.from_pretrained() at latest version	0	1242	June 6, 2022
Enhanced word_ids() API for Chinese or CJK languages?	0	458	June 2, 2022
Importing tokenizers version >0.10.3 fails due to openssl	3	6560	June 2, 2022
Lower case with input ids	0	705	May 29, 2022
Dialogue classification	0	666	May 28, 2022
Multilang bert vs translating to english	0	608	May 28, 2022
pyo3_runtime.PanicException: likelihood is NAN. Input sentence may be too long	1	1191	May 27, 2022
Pytorch_model.bin not working because of lfs	2	818	May 25, 2022