Does Llama 2 need a Pro subscription?

I get the following error when trying to use the meta-llama/Llama-2-7b-hf model. I didn't find any pointers through web search, so I'm asking here. Can someone please help?

llm = HuggingFaceHub(repo_id="meta-llama/Llama-2-7b-hf", huggingfacehub_api_token=my_token, model_kwargs={"temperature": 0.5, "max_length": 512})

This generates the following error. If I swap the repo_id out for "google/flan-t5-base", the code runs fine.

raise ValueError(f"Error raised by inference API: {response['error']}")
ValueError: Error raised by inference API: Model requires a Pro subscription

Thank you for your help!

Yes, to use it with the Inference API you need a Pro subscription, since the model is too large (roughly 13 GB, above the 10 GB limit of the free API). Of course, you can run it locally without any error.
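For reference, here is a minimal local-inference sketch with transformers (not from this thread); it assumes you have been granted access to meta-llama/Llama-2-7b-hf on the Hub, are logged in with your token, have accelerate installed for device_map="auto", and have enough memory (~13 GB for the fp16 weights).

# Minimal sketch: run Llama 2 locally instead of through the Inference API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to roughly halve memory use
    device_map="auto",          # place weights on GPU/CPU automatically (needs accelerate)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))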

Thank you, @YaTharThShaRma999! Are there quantized versions I can use through the Inference API? I see models by TheBloke that are smaller than 10 GB, but it appears the Inference API is turned off for them.

Thank you so much for your help!

You can't use those, since they are built for different libraries: the GPTQ versions are for ExLlama and AutoGPTQ, while the GGML versions are for llama.cpp.

If you really want to use a Llama model and don't have a GPU, try out llama-cpp-python.
It uses at most around 4 GB of RAM for a 7B model (the average phone has about 6 GB of RAM).
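Here is a minimal sketch of CPU inference with llama-cpp-python; the model_path filename is only an example, so point it at whichever quantized GGML/GGUF file you have downloaded (e.g. from one of TheBloke's GGML repos).

# Minimal CPU inference sketch with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b.ggmlv3.q4_K_M.bin",  # local quantized model file (example name)
    n_ctx=512,                                    # context window size
)

output = llm(
    "Q: What are llamas? A:",
    max_tokens=64,       # cap the length of the completion
    temperature=0.5,
    stop=["Q:"],         # stop before the model starts a new question
)
print(output["choices"][0]["text"])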

If you do have a GPU, use AutoGPTQ or something similar, which works with transformers as well.
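For example, a rough sketch with AutoGPTQ (the repo id TheBloke/Llama-2-7B-GPTQ is just an example, and the exact from_quantized arguments can differ between auto-gptq versions):

# GPU inference sketch with auto-gptq + transformers (assumes a CUDA GPU and that
# auto-gptq and transformers are installed).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/Llama-2-7B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",       # load the already-quantized weights onto the GPU
    use_safetensors=True,
)

inputs = tokenizer("Tell me about llamas.", return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))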

@YaTharThShaRma999 Hi, can you please provide some resources for what you mentioned?
I want to use llama-cpp-python, but the ctransformers library does not provide some functions, such as the ones in this code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "TheBloke/Llama-2-7B-GGML"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model.config.use_cache = False

I am not able to use code like this with GGML models, so I do not understand which model we should pick for fine-tuning.
Is GGML not for fine-tuning?
If not, what are the uses of GGML?

Can you please answer or share some links? I am not able to figure this out.

GGML is for inference, but it is technically possible to train a new model from scratch with it.

Also, in your code you try, for some reason, to load a GGML model (which is already 4-bit quantized) in 4-bit again?

That is not possible. If you need a bit more information, check out the ctransformers docs; there is a minimal loading sketch at the end of this reply. (ctransformers is also just for inference and doesn't have everything transformers has.)

GGML models are used because they need very little RAM and give very fast inference on CPU.
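For completeness, a minimal ctransformers inference sketch; the repo id and model_file below are examples of TheBloke's GGML uploads, not anything specific from this thread.

# Minimal GGML inference sketch with ctransformers (pip install ctransformers).
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_file="llama-2-7b.ggmlv3.q4_K_M.bin",  # which quantized file in the repo to use
    model_type="llama",
)

print(llm("AI is going to", max_new_tokens=64, temperature=0.5))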

Hi, did you get any solution? Did you find any other model, or any way to use the API for free?