Hi!
I am trying to quantize Llama-2 with GPTQ, using this:

```python
import os

from transformers import AutoTokenizer, GPTQConfig

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

model_id = "meta-llama/Llama-2-70b-hf"
dataset = "wikitext2"
bits = 4

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, cache_dir="./models")
gptq_config = GPTQConfig(bits=bits, dataset=dataset, tokenizer=tokenizer)
```
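In case it helps, the full flow I'm attempting looks roughly like this (a sketch, not a verified script: the `device_map="auto"` and `cache_dir` choices are mine, and passing `quantization_config` to `from_pretrained` is what triggers the GPTQ calibration run on the chosen dataset):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-70b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, cache_dir="./models")

# "c4" is the dataset I actually want; swapping in "wikitext2" works fine.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Passing quantization_config runs GPTQ calibration while loading the model.
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,
    device_map="auto",  # requires accelerate
    cache_dir="./models",
)
```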
But I want to use c4 instead of wikitext2. With both `c4` and `c4-new` it prints something like:

```
Repo card metadata block was not found. Setting CardData to empty.
```

and then throws an error during quantization. Everything works perfectly if I use `wikitext2`. Is there a reason why I can't use c4?
Thanks and regards,
Mahi
Solved: it turns out that installing optimum and transformers from their git repos and upgrading accelerate fixes it!
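For anyone else who hits this, the commands I mean are along these lines (standard GitHub repos for both libraries; exact versions will vary):

```shell
pip install --upgrade accelerate
pip install git+https://github.com/huggingface/optimum.git
pip install git+https://github.com/huggingface/transformers.git
```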