I fixed the same problem with the following steps; I'm not sure which one did it.
1. I removed the model.safetensors.index.json, model-00001-of-00002.safetensors, and model-00002-of-00002.safetensors files.
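Removing the shards can be sketched like this (the `demo-model` directory is a placeholder for wherever your model was downloaded, and the `touch` lines just create dummy files so the sketch is self-contained; only delete the shards if you still have the pytorch_model-*.bin weights to fall back on):

```shell
# Placeholder directory standing in for the local model folder.
mkdir -p demo-model
touch demo-model/model.safetensors.index.json \
      demo-model/model-00001-of-00002.safetensors \
      demo-model/model-00002-of-00002.safetensors

# Delete the safetensors index and shards so transformers falls back to
# the .bin weights instead.
rm demo-model/model.safetensors.index.json \
   demo-model/model-00001-of-00002.safetensors \
   demo-model/model-00002-of-00002.safetensors
```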
2. Your model path name must be the same as Meta's:

import torch
import transformers
from transformers import AutoTokenizer

model = "*****/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)
In config.json, I rewrote it as:
{
  "_name_or_path": "Llama-2-7b-chat-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.31.0",
  "use_cache": true,
  "vocab_size": 32000
}
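Before reloading the model, it can help to sanity-check the rewritten config. A minimal sketch (the config is inlined here, trimmed to the key fields, so the example is self-contained; on disk you would read it with `json.load(open(".../config.json"))`):

```python
import json

# Key fields from the config above, inlined for a self-contained example.
config = json.loads("""{
  "_name_or_path": "Llama-2-7b-chat-hf",
  "architectures": ["LlamaForCausalLM"],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_size": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "vocab_size": 32000
}""")

# Basic consistency checks for a Llama-2-7B config: the file must parse as
# valid JSON, and hidden_size must divide evenly into attention heads.
assert config["model_type"] == "llama"
head_dim = config["hidden_size"] // config["num_attention_heads"]
print(head_dim)  # 4096 // 32 = 128
```

Malformed quotes (e.g. curly quotes pasted from a web page) are a common reason this file fails to parse, so a `json.loads` round-trip catches that early.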
Also, if you have many GPUs, run it from the command line with "CUDA_VISIBLE_DEVICES=1 python inference.py" to pin the process to a single GPU.
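Since CUDA_VISIBLE_DEVICES works by hiding devices from the process, the same selection can also be done from inside the script, as long as it happens before torch initialises CUDA. A minimal sketch (the "1" assumes a machine with at least two GPUs):

```python
import os

# Must run before `import torch` triggers CUDA initialisation: only physical
# GPU 1 stays visible, and the process then sees it as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints "1"
```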