Error running model on ZeroGPU

I am trying to use ISTA-DASLab/Meta-Llama-3.1-70B-AQLM-PV-2Bit-1x16 in the ZeroGPU Space KwabsHug/TestCompressedModelzero. When I press Generate it starts, but it always returns

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Does anyone know how to resolve this?


It is possible that this is due to a bug or an awkward design choice in the ZeroGPU Space environment.

It seems that aqlm expects CUDA_HOME to be set, but no CUDA Toolkit is installed in the ZeroGPU environment to begin with.
I’ll try to fix it, though I’m not sure whether it will work. Loading the model onto CUDA and packing the tensors works fine, so I have a feeling it will work if the libraries can handle it.
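
A quick check from inside the Space confirms this (just the standard library, not part of the original code):

import os
import shutil

# On the stock ZeroGPU image, CUDA_HOME is unset and nvcc is not on PATH,
# which is what aqlm trips over
print("CUDA_HOME:", os.environ.get("CUDA_HOME"))
print("nvcc:", shutil.which("nvcc"))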

Okay, it works!
But no matter how you look at it, I just forced it to work. :sweat_smile:

import spaces
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import subprocess
import os

def install_cuda_toolkit():
    # Download the CUDA 12.2 runfile installer and install only the toolkit (no driver)
    # CUDA_TOOLKIT_URL = "https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run"
    CUDA_TOOLKIT_URL = "https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run"
    CUDA_TOOLKIT_FILE = "/tmp/%s" % os.path.basename(CUDA_TOOLKIT_URL)
    subprocess.call(["wget", "-q", CUDA_TOOLKIT_URL, "-O", CUDA_TOOLKIT_FILE])
    subprocess.call(["chmod", "+x", CUDA_TOOLKIT_FILE])
    subprocess.call([CUDA_TOOLKIT_FILE, "--silent", "--toolkit"])

    os.environ["CUDA_HOME"] = "/usr/local/cuda"
    os.environ["PATH"] = "%s/bin:%s" % (os.environ["CUDA_HOME"], os.environ["PATH"])
    os.environ["LD_LIBRARY_PATH"] = "%s/lib:%s" % (
        os.environ["CUDA_HOME"],
        "" if "LD_LIBRARY_PATH" not in os.environ else os.environ["LD_LIBRARY_PATH"],
    )
    # Fix: arch_list[-1] += '+PTX'; IndexError: list index out of range
    os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0;8.6"

install_cuda_toolkit()

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("ISTA-DASLab/Meta-Llama-3.1-70B-AQLM-PV-2Bit-1x16")
model = AutoModelForCausalLM.from_pretrained("ISTA-DASLab/Meta-Llama-3.1-70B-AQLM-PV-2Bit-1x16", torch_dtype='auto', device_map='auto').to(device)

@spaces.GPU
def generate_text(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs.input_ids, max_length=100)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

interface = gr.Interface(
    fn=generate_text,
    inputs="text",
    outputs="text",
    title="Meta-Llama-3.1-70B Text Generation",
    description="Enter a prompt and generate text using Meta-Llama-3.1-70B.",
)

interface.launch()
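
If you want to sanity-check that the toolkit really installed before the model loads, something like this should work (verify_cuda_toolkit is my own helper, assuming the default /usr/local/cuda install path):

import subprocess

def verify_cuda_toolkit(cuda_home="/usr/local/cuda"):
    # Prints the nvcc version string; raises CalledProcessError if the toolkit is missing
    subprocess.run([cuda_home + "/bin/nvcc", "--version"], check=True)

verify_cuda_toolkit()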

Thanks for the reply. Problem solved, much appreciated.


Well, I don’t think the fundamental problem has been solved, but I guess it’s better to have it working than not!
This really is a bug, though…

Oh yeah, I forgot about that part. Thanks for showing me that the quantised model loads on ZeroGPU, and my condolences about the bug.

