Is this CUDA memory error on the Inference API coming from Hugging Face or Google Colab?

Hello,

I am calling the HF Inference API using the code from this article:

When I use the widget HF created at the top of that page to enter a long prompt (about 1,500 tokens), it works fine.

However, when I run the code from the bottom of the article in Google Colab with the same inputs (I had to add the definition of the options variable; it was erroring without it):

import json
import requests

API_TOKEN = ""

def query(payload='', parameters=None, options={'use_cache': False}):
    API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neo-2.7B"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    body = {"inputs": payload, "parameters": parameters, "options": options}
    response = requests.request("POST", API_URL, headers=headers, data=json.dumps(body))
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        # the API reports errors in the 'error' field of the JSON body
        return "Error: " + str(response.json()['error'])
    else:
        return response.json()[0]['generated_text']

parameters = {
    'max_new_tokens': 150,  # number of generated tokens
    'temperature': 0.3,     # controls the randomness of generations
    'end_sequence': "###"   # stopping sequence for generation
}

options = {'use_cache': False}

prompt = "MY BIG LONG PROMPT"  # few-shot prompt

data = query(prompt, parameters, options)

I get this error: “CUDA out of memory, try a smaller payload”

I’m not exactly sure which layer of the stack this error is coming from. Does it have to be coming from the HF Inference API, or could it be coming from the Colab side? I don’t want to subscribe to Colab Pro only to find out that wasn’t the problem.
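In case it helps, here is a minimal check I could run (just a sketch, reusing the same API_URL, API_TOKEN, and request body as above; nothing new beyond what's already in my snippet) to print the raw HTTP status code and response body, which I think would show whether the error text is coming back from the HF endpoint itself:

import json
import requests

API_TOKEN = ""
API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neo-2.7B"

headers = {"Authorization": f"Bearer {API_TOKEN}"}
body = {
    "inputs": "MY BIG LONG PROMPT",
    "parameters": {"max_new_tokens": 150},
    "options": {"use_cache": False},
}

# send the same POST request and inspect the raw response
response = requests.post(API_URL, headers=headers, data=json.dumps(body))
print(response.status_code)  # HTTP status returned by the HF endpoint
print(response.json())       # raw body, e.g. {'error': 'CUDA out of memory, ...'}

My (unconfirmed) reading is that if the error string appears in the JSON body returned by that endpoint, it was produced on the HF side rather than by Colab, but I'd appreciate confirmation.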

Thank you!