When I try
model = AutoModel.from_pretrained("TheBloke/Llama-2-7B-Chat-GGML")
I get
raise EnvironmentError(
OSError: TheBloke/Llama-2-7B-Chat-GGML does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
The code I used was the snippet provided in the Transformers tab on this model's page. Other models seemed to lack safetensors files (which only produces a warning, and that makes sense). Others seemed to lack tokenizers.
So I clearly don’t understand:
- If the goal is to use the transformers Python library to run a HF model locally, how do I tell which models will work? Is there a filter, or is there a set of files I should look for? If a file isn’t there, can I build it?
- If a model is optimized via GPTQ or llama.cpp or… I don’t know, ExLlama, does that mean it isn’t a transformers model? I.e., do transformers models do their own 4-bit quantizing, etc.?
- If a model is a GPTQ model (or a similarly quantized model), can it be downloaded and used through the HF or LangChain APIs?
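To make the "set of files" question concrete, here is a rough sketch of the check I have in mind. The first four file names come straight from the OSError message above. The safetensors and sharded-index names are my assumption about what else transformers accepts, and the helper names (find_weight_files, repo_weight_files) are made up for illustration. I believe huggingface_hub provides list_repo_files for listing a repo's contents, but I may be using it naively.

```python
# File names from the OSError message, plus a few I am assuming also count.
WEIGHT_FILES = {
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt",
    "flax_model.msgpack",
    # Assumptions (not in the error message): safetensors and sharded checkpoints.
    "model.safetensors",
    "pytorch_model.bin.index.json",
    "model.safetensors.index.json",
}


def find_weight_files(filenames):
    """Return the repo files whose names match a known weight file name."""
    return [name for name in filenames if name in WEIGHT_FILES]


def repo_weight_files(repo_id):
    """List a Hub repo's files and keep only the recognized weight files."""
    # Imported here so find_weight_files works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return find_weight_files(list_repo_files(repo_id))


if __name__ == "__main__":
    # Needs network access; an empty list would explain the OSError above.
    print(repo_weight_files("TheBloke/Llama-2-7B-Chat-GGML"))
```

If this is roughly the right idea, an empty result would mean the repo cannot be loaded with from_pretrained as-is, but I don't know whether this list of names is complete.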
Thank you very much.