Hey all,
I am trying to use one of the Llama2 models to generate text (summarising documents). This means I am dealing with long texts that need to be truncated. While using the Pipeline API I have discovered some odd behaviour.
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaConfig

tokenizer = AutoTokenizer.from_pretrained(
    local_model_dir,
    padding_side="right",
    model_max_length=MAX_LENGTH,
    truncation=True,
)
# the value in the source config is wrong; the context window should be 4k
updated_config = LlamaConfig(max_position_embeddings=MAX_POS_EMBEDDINGS)
# 8-bit precision to fit the GPU -> maybe try 4 bits?
model = AutoModelForCausalLM.from_pretrained(
    local_model_dir,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
    load_in_8bit=True,
    config=updated_config,
)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
)
It seems to me that when I call pipeline(long_input), the input text is not correctly truncated. I checked this by calling pipeline.preprocess(long_input), which results in the following warning:
Token indices sequence length is longer than the specified maximum sequence length for this model (12351 > 3896).
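Roughly, the check looks like this (a sketch; long_input stands for one of my documents, and I am assuming the dict returned by preprocess exposes the tokenised input_ids):

# Check whether pre-processing truncates the input (sketch)
model_inputs = pipeline.preprocess(long_input)
print(model_inputs["input_ids"].shape)
# -> still ~12k tokens, so nothing was truncated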
I have come to understand that truncation is only applied when calling the tokenizer directly, i.e. tokenizer(long_input, truncation=True, max_length=MAX_LENGTH).
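For example (a minimal sketch using the same names as above; return_tensors="pt" is only there so I can inspect the shape):

# Calling the tokenizer directly does truncate as expected
encoded = tokenizer(long_input, truncation=True, max_length=MAX_LENGTH, return_tensors="pt")
print(encoded["input_ids"].shape)  # -> capped at MAX_LENGTH tokens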
However, when I try to pass truncation to the pipeline, like pipeline(long_input, truncation=True), it throws an error:
The following model_kwargs are not used by the model: ['truncation'] (note: typos in the generate arguments will also show up in this list)
So this all raises the question: how do I make truncation happen while using a pipeline? To me it seems intuitive that when truncation=True is set while initialising the tokenizer and that tokenizer is added to the pipeline, all downstream texts should be tokenised with truncation applied.
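For clarity, this is the kind of end-to-end call I was hoping would just work (a sketch; max_new_tokens=256 is a hypothetical generation setting, not something from my current code):

# Desired behaviour (sketch): the pipeline truncates long_input to MAX_LENGTH
# during pre-processing and then generates the summary
summary = pipeline(long_input, max_new_tokens=256)[0]["generated_text"]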