🤗Tokenizers

Topic	Replies	Views	Activity
DNA long sequence tokenization	2	2788	August 6, 2023
SentencePiece tokenizer encodes to unknown token	0	898	August 2, 2023
Tokenizer behaviour with pipeline	0	933	August 1, 2023
Load tokenizer from file: Exception: data did not match any variant of untagged enum ModelWrapper	3	9528	August 1, 2023
ArrowInvalid: Column 3 named attention_mask expected length 1000 but got length 1076	3	2529	July 26, 2023
Discussing the Pros and Cons of Using add_tokens vs. Byte Pair Encoding (BPE) for Adding New Tokens to an Existing RoBERTa Model	0	777	July 14, 2023
Initialize Vocabulary for Unigram Tokenizer	0	299	July 11, 2023
Make correct padding for text generation with GPT-NEO	0	824	July 5, 2023
How does a tokenizer (e.g., AutoTokenizer) generate word_ids integers?	0	562	June 26, 2023
Seeking an end-to-end example of grouping, tokenization and padding to construct preprocessed data in HF	0	394	June 26, 2023
Writing custom tokenizer and wrapping it in tokenizer object	2	808	June 26, 2023
Tokenizer for German lang	0	600	June 22, 2023
Chunk tokens into desired chunk length without simply getting rid of rest of tokens	0	644	June 15, 2023
Padding not transferring when loading a tokenizer trained via the tokenizers library into transformers	0	498	June 12, 2023
LlamaTokenizerFast returns token_type_ids but the forward pass of the LlamaModel does not receive token_type_ids	1	771	June 9, 2023
GPT2Tokenizer not working in Kaggle Notebook	0	384	May 30, 2023
Exploring the Majestic Temples in Karnataka	0	294	May 25, 2023
Tokenizer producing token index greater than size of the dictionary	0	296	May 15, 2023
How to instantiate a XLMRobertaTokenizer object using a locally trained SentencePiece tokenizer	0	294	May 14, 2023
How to create a HF tokenizer's vocab file from a BPE model's merges.txt file?	0	477	May 13, 2023
Scala/JVM Bindings for Tokenizers	0	510	May 10, 2023
Tokenizers Wheel Takes Forever to Build	1	3039	May 8, 2023
Where is the introduction to tokenizers.implementations?	0	184	May 7, 2023
How to return custom `token_type_ids` or other values from a tokenizer?	0	688	May 3, 2023
Easy way to compare tokenizers	0	302	May 1, 2023
Unable to load image using llama-index	0	1395	May 1, 2023
Help defining tokenizer	0	283	April 28, 2023
Token Offsets in Rust vs. Python	1	370	April 27, 2023
“OSError: Model name './XX' was not found in tokenizers model name list” - cannot load custom tokenizer in Transformers	14	6913	April 25, 2023
Converting JSON/dict to a flattened string with indicator tokens	1	327	April 21, 2023