Hey,
I deployed falcon-7b-instruct on a Hugging Face Inference Endpoint.
The issue I have is that the output it gives me is very short. Even when prompted with ‘Write a 300 word poem’, or given a question plus a document as the prompt, it returns very short answers (60 to 100 characters, i.e. roughly 10-15 words).
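For context, I call the endpoint roughly like this (an illustrative sketch of the request shape; the URL and token below are placeholders, not my real values):

```python
import requests

# Placeholder endpoint URL and token, shown only to illustrate the request shape.
API_URL = "https://<my-endpoint>.endpoints.huggingface.cloud"
HEADERS = {"Authorization": "Bearer <hf_token>", "Content-Type": "application/json"}

payload = {"inputs": "Write a 300 word poem about the sea."}
response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.json())  # returns [{"generated_text": "..."}], but the text is only a sentence or so
```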
Is there something fundamentally wrong?
Many thanks,
A.
P.S.
I changed nothing in the tiiuae/falcon-7b-instruct repo apart from the below:
- handler.py

```python
import torch
from typing import Any, Dict, List
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path=""):
        # Load the tokenizer and model from the repo path the endpoint passes in.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path, device_map="auto", torch_dtype=torch.float16, trust_remote_code=True
        )
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        # Generation settings applied to every request.
        self.max_length = 4096
        self.max_new_tokens = 4096
        self.top_k = 100
        self.top_p = 0.95
        self.temperature = 0.9

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        inputs = data.pop("inputs", data)
        parameters = data.pop("parameters", {})
        # Tokenize the prompt and move it to the model's device.
        inputs = self.tokenizer(inputs, return_tensors="pt").to(self.device)
        # Override any request parameters with the fixed settings above.
        parameters["max_length"] = self.max_length
        parameters["max_new_tokens"] = self.max_new_tokens
        parameters["top_k"] = self.top_k
        parameters["top_p"] = self.top_p
        parameters["temperature"] = self.temperature
        outputs = self.model.generate(**inputs, **parameters)
        # Decode the full sequence (prompt + completion).
        prediction = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return [{"generated_text": prediction}]
```
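In case it helps, this is roughly how I test the handler locally (a quick sketch, assuming it is run from the root of the cloned repo on a GPU with enough memory):

```python
# Quick local smoke test of the custom handler.
from handler import EndpointHandler

handler = EndpointHandler(path=".")  # "." = the cloned tiiuae/falcon-7b-instruct repo
result = handler({"inputs": "Write a 300 word poem about the sea."})
print(result[0]["generated_text"])
```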