What files are needed to use the HF Transformers pipeline()?

When I try

    from transformers import AutoModel

    model = AutoModel.from_pretrained("TheBloke/Llama-2-7B-Chat-GGML")

I get

    raise EnvironmentError(
    OSError: TheBloke/Llama-2-7B-Chat-GGML does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

The transformers code I used was the snippet provided on the Transformers tab of this model's page. Other models I tried lacked safetensors files (which only triggers a warning, so that makes sense), and others lacked tokenizer files.
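For context, here is how I inspected what the repo actually contains (a minimal sketch using huggingface_hub's list_repo_files; I am assuming this is a reasonable way to check a repo's files before downloading anything):

    from huggingface_hub import list_repo_files

    # List every file in the model repo without downloading the weights.
    files = list_repo_files("TheBloke/Llama-2-7B-Chat-GGML")
    print("\n".join(files))

    # None of the filenames transformers looks for show up, which matches the error:
    expected = {"pytorch_model.bin", "tf_model.h5", "model.ckpt",
                "flax_model.msgpack", "model.safetensors"}
    print(expected & set(files))  # prints set() for this repo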

So I clearly don’t understand:

  • If the goal is to use the transformers Python library to run an HF model locally, how do I tell which models will work? Is there a filter, or a set of files I should look for? If the file isn't there, can I build it?
  • If a model is quantized via GPTQ or llama.cpp or… I don't know, ExLlama, does that mean it isn't a transformers model? I.e., do transformers models do their own 4-bit quantization, etc.?
  • If a model is a GPTQ model (or a similarly quantized model), can it be downloaded and used through the HF or LangChain APIs? (See the sketch after this list.)
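
To make the third question concrete, this is the kind of call I have in mind. It is only a sketch of what I hope works, not something I know to be the supported path; I am assuming the GPTQ counterpart repo TheBloke/Llama-2-7B-Chat-GPTQ, and that optimum, auto-gptq, and accelerate are installed alongside transformers:

    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Assumption: the GPTQ sibling of the GGML repo above.
    model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Does from_pretrained just work for GPTQ weights, given auto-gptq/optimum?
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    print(pipe("Hello!", max_new_tokens=32)[0]["generated_text"])

    # And, similarly, whether the resulting pipeline can be wrapped for LangChain:
    from langchain.llms import HuggingFacePipeline  # assuming langchain is installed
    llm = HuggingFacePipeline(pipeline=pipe)
    print(llm("Hello!"))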

Thank you very much.