Running mpt-7b on an M1 Mac

I have a 64 GB M1 Max. If I try to use mpt-7b from Python:

model = transformers.AutoModelForCausalLM.from_pretrained(

I get an error about flash_attn, and there doesn’t seem to be any way to install flash_attn on a Mac; it appears to be CUDA-only.
If I run gpt4all, I can use that model, so it is definitely possible to run it on my machine.
What am I doing wrong?

Hey @darrenoakey,

You may want to take a look at this answer of mine to see how to load the model (fully or partially) on CPU: "How to use trust_remote_code=True with load_checkpoint_and_dispatch?" (#2 by abhinavkulkarni)
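
For anyone landing here without following the link, below is a rough sketch of what a CPU-only load might look like. It is not necessarily what the linked answer does. It assumes the checkpoint is the Hub repo `mosaicml/mpt-7b`, and that the model's remote config exposes `attn_config["attn_impl"]` and `init_device` as described on the model card. The function name `load_mpt_on_cpu` is made up for illustration. Whether this gets past the flash_attn error may depend on the version of the model's remote code you end up downloading.

```python
def load_mpt_on_cpu(model_name: str = "mosaicml/mpt-7b"):
    """Hypothetical helper: load MPT on CPU with plain PyTorch attention.

    Assumes `torch` and `transformers` are installed, and that the repo id
    above is the checkpoint you want (an assumption, not from the thread).
    """
    # Imported lazily so that defining the helper does not require the libraries.
    import torch
    import transformers

    # MPT ships its own modeling code, so trust_remote_code=True is needed.
    config = transformers.AutoConfig.from_pretrained(
        model_name, trust_remote_code=True
    )
    # Ask for the pure-PyTorch attention path instead of flash/triton kernels.
    config.attn_config["attn_impl"] = "torch"
    # Initialize the weights on CPU rather than on a CUDA device.
    config.init_device = "cpu"

    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_name,
        config=config,
        torch_dtype=torch.float32,  # float32 is the safe choice on CPU
        trust_remote_code=True,
    )
    model.eval()
    return model
```

Calling `model = load_mpt_on_cpu()` downloads the weights on first use. In float32 a 7B-parameter model needs roughly 28 GB of RAM for the weights alone, which should fit on a 64 GB machine.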