I have a 64 GB M1 Max. If I try to use mpt-7b from Python:
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-chat',
    trust_remote_code=True,
)
I get an import error for flash_attn - and there doesn't seem to be any way to install flash_attn on a Mac - it looks like it's CUDA-only.
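From the mosaicml/mpt-7b model card, the attention implementation is supposed to be selectable through the config, so this is roughly what I would expect to force the plain-PyTorch attention path (the attn_config['attn_impl'] key is my reading of that card; I haven't confirmed that it actually avoids the flash_attn import on macOS):

import transformers

# Load the remote-code config first so the attention backend can be overridden
config = transformers.AutoConfig.from_pretrained(
    'mosaicml/mpt-7b-chat',
    trust_remote_code=True,
)
# 'torch' should select the plain PyTorch attention instead of the flash/triton kernels
config.attn_config['attn_impl'] = 'torch'

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-chat',
    config=config,
    trust_remote_code=True,
)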
HOWEVER
If I run GPT4All, I can use that model - so it is definitely possible to run it on my machine.
What am I doing wrong?