Tokenizing two sentences with the tokenizer

nielsr · October 18, 2021, 7:33am

Hi,

As explained in the docs, the truncation parameter accepts several strategies, including 'only_first', which truncates only the first sequence of a pair. Also, the encode_plus method is outdated. The recommended approach is to call the tokenizer directly, on either a single sentence or a pair of sentences. TL;DR:

inputs = tokenizer(text_a, text_b, truncation='only_first', max_length=max_length)
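To make the effect of 'only_first' concrete, here is a rough pure-Python sketch of what happens to a sentence pair. This is not the library's implementation: `truncate_only_first` is a hypothetical helper, the inputs are pre-split word lists rather than real subword tokens, and the count of 3 special tokens assumes a BERT-style layout ([CLS] a [SEP] b [SEP]).

```python
from typing import List, Tuple


def truncate_only_first(
    tokens_a: List[str],
    tokens_b: List[str],
    max_length: int,
    num_special_tokens: int = 3,  # assumption: BERT-style [CLS] a [SEP] b [SEP]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: drop tokens from the end of tokens_a only."""
    # How many tokens the full pair exceeds the budget by.
    overflow = len(tokens_a) + len(tokens_b) + num_special_tokens - max_length
    if overflow <= 0:
        return tokens_a, tokens_b  # already fits, nothing to remove
    if overflow >= len(tokens_a):
        # The first sequence alone cannot absorb the overflow.
        raise ValueError("first sequence is too short to truncate to max_length")
    return tokens_a[: len(tokens_a) - overflow], tokens_b


# Illustrative inputs (whitespace-split words standing in for tokens).
tokens_a = "the quick brown fox jumps over the lazy dog".split()
tokens_b = "a fox jumped".split()
max_length = 10

kept_a, kept_b = truncate_only_first(tokens_a, tokens_b, max_length)
encoded = ["[CLS]"] + kept_a + ["[SEP]"] + kept_b + ["[SEP]"]
print(encoded)
```

The second sentence is left untouched and only the tail of the first one is dropped, which is usually what you want when text_b is short and must be kept intact (for example a question or a hypothesis).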
