How do I increase max_new_tokens

Hi fellas, I am getting this error message:

Input length of input_ids is 28, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_new_tokens.

The code is:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
import transformers

model_dir = './mpt-7b-instruct'

config = AutoConfig.from_pretrained(
  model_dir,
  trust_remote_code=True,
  max_new_tokens=1024
)

with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
    
model.tie_weights()

model = load_checkpoint_and_dispatch(
    model, model_dir, device_map="auto", no_split_module_classes=["MPTBlock"]
)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)

answer=pipeline(["Answer the following question:\nQ.My mother has recently been diagnosed with dementia, what support is available for her? \nA."])


print(answer)

Can anyone let me know where I should add the configuration for max_new_tokens?
I have already added it to the model and the tokenizer, but nothing works.
Thanks in advance

Actually, I figured it out. The key change is to pass max_new_tokens (and the other generation settings) directly to transformers.pipeline(...) rather than only setting it on the config:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
import transformers
import time
import pdb

start0=time.time()
model_dir = './mpt-7b-instruct'

config = AutoConfig.from_pretrained(
  model_dir,
  trust_remote_code=True,
  max_new_tokens=1024
)

with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
    
model.tie_weights()

model = load_checkpoint_and_dispatch(
    model, model_dir, device_map="auto", no_split_module_classes=["MPTBlock"]
)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
#pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

# mpt-7b is trained to add "<|endoftext|>" at the end of generations
stop_token_ids = tokenizer.convert_tokens_to_ids(["<|endoftext|>"])

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])

generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this model will ramble
    temperature=0.1,  # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    top_p=0.15,  # sample from the top tokens whose cumulative probability adds up to 15%
    top_k=0,  # 0 disables top-k filtering, so sampling relies on top_p instead
    max_new_tokens=500,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)
diff0=time.time()-start0
print(diff0, "\n")

start1=time.time()
res=generate_text("Answer the following question:\nQ.My mother has recently been diagnosed with dementia, what support is available for her? \nA.")
print(res[0]["generated_text"])
diff1=time.time()-start1
print(diff1,"\n")

start2=time.time()
res=generate_text("Answer the following question:\nQ.Where can I share my story (about looking after someone with cognitive problems) and hear from others?  \nA.")
print(res[0]["generated_text"])
diff2=time.time()-start2
print(diff2)

pdb.set_trace()
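As a side note, generation parameters can also be overridden per call instead of (or in addition to) being set when the pipeline is built. A minimal sketch reusing the generate_text pipeline above (the values here are just examples):

# override generation settings for this one call only
res = generate_text(
    "Answer the following question:\nQ.My mother has recently been diagnosed with dementia, what support is available for her? \nA.",
    max_new_tokens=256,  # example value; overrides the 500 configured above for this call
)
print(res[0]["generated_text"])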

Results

This was tested on
The duration to load the model was 370.73 s

Answer the following question:
Q.My mother has recently been diagnosed with dementia, what support is available for her?
A.There are a number of organisations that can provide advice and guidance to you as a carer or family member looking after someone who has dementia. The Alzheimer’s Society provides information on all aspects of living with dementia including practical tips about how best to communicate with your loved one when they have dementia. They also offer a range of support groups where people affected by dementia can meet others in similar situations to share experiences and gain emotional support from each other.

Duration: 354.90 s

Answer the following question:

Q.Where can I share my story (about looking after someone with cognitive problems) and hear from others?
A.You could join a Facebook group called ‘Carers of People With Dementia’

Duration: 62.50 s


Francisco,

Can you tell me where you found this code on HuggingFace? If you could let me know, I would appreciate it. I am trying to do question answering using LangChain via HuggingFace. I am absolutely frustrated with the process. I am learning a lot, but I have no idea what's going on.

Ganesh


@gkrishnan I’m late to the post but you can always manually pass in the model/pipeline:

from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from langchain.llms import HuggingFacePipeline

# model_path should point at your model directory, e.g. the './mpt-7b-instruct' folder used above
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_new_tokens=200)

llama_llm = HuggingFacePipeline(pipeline=gen)
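
From there you can plug it into a chain for question answering. A minimal sketch assuming the classic PromptTemplate/LLMChain interface (imports and names vary between LangChain versions):

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# simple QA-style prompt; the template text is just an example
prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer the following question:\nQ.{question}\nA.",
)

chain = LLMChain(llm=llama_llm, prompt=prompt)
print(chain.run(question="My mother has recently been diagnosed with dementia, what support is available for her?"))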