`add_prefix_space=True` option for the BPE tokenizer

h56cho · October 19, 2020, 7:21pm

Hello,
I understand that when I pass `add_prefix_space=True` when constructing the BPE tokenizer, the tokenizer adds a space at the beginning of every sequence.
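To make my understanding concrete, here is a toy sketch in plain Python. It is not the library's implementation: the helper `toy_pre_tokenize` and its simplified split regex are my own assumptions for illustration, and I use `Ġ` to mark a leading space the way GPT-2-style byte-level vocabularies do.

```python
import re

# Simplified stand-in for a byte-level pre-tokenizer's word splitting
# (an assumption for illustration; the real pattern is more involved).
_SPLIT = re.compile(r" ?\w+| ?[^\w\s]+|\s+")


def toy_pre_tokenize(text, add_prefix_space=False):
    """Split text into word pieces, marking each space as 'Ġ'."""
    # Only prepend a space when the text does not already start with one.
    if add_prefix_space and not text.startswith(" "):
        text = " " + text
    return [piece.replace(" ", "Ġ") for piece in _SPLIT.findall(text)]


without_prefix = toy_pre_tokenize("Hello world", add_prefix_space=False)
with_prefix = toy_pre_tokenize("Hello world", add_prefix_space=True)
mid_sentence = toy_pre_tokenize("Say Hello", add_prefix_space=False)

print(without_prefix)  # ['Hello', 'Ġworld']
print(with_prefix)     # ['ĠHello', 'Ġworld']
print(mid_sentence)    # ['Say', 'ĠHello']
```

In this sketch, "Hello" at the start of a sequence becomes a different piece ("Hello") than the same word in the middle of a sentence ("ĠHello"), unless the prefix space is added. I am not sure whether that consistency is the intended benefit of the option.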

Are there any specific advantages to using `add_prefix_space=True` with a BPE tokenizer, compared to leaving the option off? None of my sequences start with a space.

Thanks,
