Get vocabulary tokens in order to exclude them from generate function (2 replies, 1398 views, August 1, 2022)
Avoid creating certain tokens when training a tokenizer (0 replies, 235 views, July 26, 2022)
Error finetuning XLM-RoBERTa-Large when training (2 replies, 252 views, July 15, 2022)
HuggingFace BPE Trainer Error - Training Tokenizer (1 reply, 1118 views, July 14, 2022)
word_to_tokens() and word_ids(): microsoft/deberta-v2/v3 (2 replies, 306 views, July 14, 2022)
No PreTrainedTokenizerFast for Deberta-V3, no doc_stride (0 replies, 251 views, July 13, 2022)
Tokenizer from own vocab (0 replies, 189 views, July 11, 2022)
Tokenizer dataset is very slow (1 reply, 565 views, June 28, 2022)
No labels column for tokenized data (2 replies, 432 views, June 27, 2022)
Programmatic way to Tokenization on Custom Text Columns (0 replies, 314 views, June 27, 2022)
Bug in Offset generation for Rupee symbol (0 replies, 285 views, June 27, 2022)
How to handle parentheses, quotation marks, \n, etc. when creating tokenizer from scratch (0 replies, 348 views, June 26, 2022)
EM training on unigram tokenizer taking way longer than predicted (0 replies, 284 views, June 23, 2022)
Training unigram on long sequences (4 replies, 517 views, June 23, 2022)
Sliding window for Long Documents (0 replies, 433 views, June 20, 2022)
Issue with post-processing (1 reply, 751 views, June 15, 2022)
FutureWarning about BertTokenizer.from_pretrained() at latest version (0 replies, 499 views, June 6, 2022)
Enhanced word_ids() API for Chinese or CJK languages? (0 replies, 362 views, June 2, 2022)
Importing tokenizers version >0.10.3 fails due to openssl (3 replies, 2841 views, June 2, 2022)
Lower case with input ids (0 replies, 384 views, May 29, 2022)
Dialogue classification (0 replies, 415 views, May 28, 2022)
Multilang BERT vs translating to English (0 replies, 413 views, May 28, 2022)
pyo3_runtime.PanicException: likelihood is NAN. Input sentence may be too long (1 reply, 628 views, May 27, 2022)
pytorch_model.bin not working because of LFS (2 replies, 532 views, May 25, 2022)
Tokenizer ignores repeated whitespaces (3 replies, 655 views, May 19, 2022)
How truncation works when applying BERT tokenizer on the batch of sentence pairs in HuggingFace? (0 replies, 569 views, May 15, 2022)
How to save a tokenizer only consisting of added tokens (0 replies, 526 views, May 11, 2022)
Passing list of inputs to tokenize (1 reply, 560 views, May 9, 2022)
Issues with Data Collator and Tokenizing with NER Datasets (1 reply, 708 views, May 9, 2022)
How to perform tokenization on an ONNX model in JS? (0 replies, 508 views, May 6, 2022)