How do I increase max_new_tokens

Hi fellas, I am getting this error message:

Input length of input_ids is 28, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_new_tokens.

The code is:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
import transformers

model_dir = './mpt-7b-instruct'

config = AutoConfig.from_pretrained(
  model_dir,
  trust_remote_code=True,
  max_new_tokens=1024
)

with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
    
model.tie_weights()

model = load_checkpoint_and_dispatch(
    model, model_dir, device_map="auto", no_split_module_classes=["MPTBlock"]
)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)

answer=pipeline(["Answer the following question:\nQ.My mother has recently been diagnosed with dementia, what support is available for her? \nA."])


print(answer)

Can anyone let me know where I should add the configuration for max_new_tokens?
I have already added it to the model and the tokenizer, but nothing works.
Thanks in advance

Actually, I figured it out. The key change is to pass max_new_tokens (and the other generation settings) directly to transformers.pipeline(...) rather than only setting it on the config:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
import transformers
import time
import pdb

start0=time.time()
model_dir = './mpt-7b-instruct'

config = AutoConfig.from_pretrained(
  model_dir,
  trust_remote_code=True,
  max_new_tokens=1024
)

with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
    
model.tie_weights()

model = load_checkpoint_and_dispatch(
    model, model_dir, device_map="auto", no_split_module_classes=["MPTBlock"]
)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
#pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

# mpt-7b is trained to add "<|endoftext|>" at the end of generations
stop_token_ids = tokenizer.convert_tokens_to_ids(["<|endoftext|>"])

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])

generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this model will ramble
    temperature=0.1,  # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    top_p=0.15,  # sample from the top tokens whose cumulative probability adds up to 15%
    top_k=0,  # 0 disables top-k filtering, so sampling relies on top_p instead
    max_new_tokens=500,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)
diff0=time.time()-start0
print(diff0, "\n")

start1=time.time()
res=generate_text("Answer the following question:\nQ.My mother has recently been diagnosed with dementia, what support is available for her? \nA.")
print(res[0]["generated_text"])
diff1=time.time()-start1
print(diff1,"\n")

start2=time.time()
res=generate_text("Answer the following question:\nQ.Where can I share my story (about looking after someone with cognitive problems) and hear from others?  \nA.")
print(res[0]["generated_text"])
diff2=time.time()-start2
print(diff2)

pdb.set_trace()
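As a side note, generation parameters can also be overridden per call instead of (or in addition to) being set when the pipeline is built. A minimal sketch reusing the generate_text pipeline above (the values here are just examples):

# override generation settings for this one call only
res = generate_text(
    "Answer the following question:\nQ.My mother has recently been diagnosed with dementia, what support is available for her? \nA.",
    max_new_tokens=256,  # example value; overrides the 500 configured above for this call
)
print(res[0]["generated_text"])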

Results

This was tested on
The duration to load the model was 370.73 s

Answer the following question:
Q.My mother has recently been diagnosed with dementia, what support is available for her?
A.There are a number of organisations that can provide advice and guidance to you as a carer or family member looking after someone who has dementia. The Alzheimer’s Society provides information on all aspects of living with dementia including practical tips about how best to communicate with your loved one when they have dementia. They also offer a range of support groups where people affected by dementia can meet others in similar situations to share experiences and gain emotional support from each other.

Duration: 354.90 s

Answer the following question:

Q.Where can I share my story (about looking after someone with cognitive problems) and hear from others?
A.You could join a Facebook group called ‘Carers of People With Dementia’

Duration: 62.50 s


Francisco,

Can you tell me where you found this code on HuggingFace? If you could let me know, I would appreciate it. I am trying to do question answering using LangChain via HuggingFace. I am absolutely frustrated with the process. I am learning a lot, but I have no idea what's going on.

Ganesh


@gkrishnan I’m late to the post but you can always manually pass in the model/pipeline:

from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from langchain.llms import HuggingFacePipeline

# model_path should point at your model directory, e.g. the './mpt-7b-instruct' folder used above
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_new_tokens=200)

llama_llm = HuggingFacePipeline(pipeline=gen)
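
From there you can plug it into a chain for question answering. A minimal sketch assuming the classic PromptTemplate/LLMChain interface (imports and names vary between LangChain versions):

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# simple QA-style prompt; the template text is just an example
prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer the following question:\nQ.{question}\nA.",
)

chain = LLMChain(llm=llama_llm, prompt=prompt)
print(chain.run(question="My mother has recently been diagnosed with dementia, what support is available for her?"))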