New to HF, cannot get any model to work in jupyter lab

I use an M1 Mac (macOS 13.7.8), a normal virtual environment (as is customary in Python), and JupyterLab.

I generated multiple access tokens during my first trials; I have now saved the latest one I created.

Procedure I used (square brackets are used to designate the beginning and end of code or output):

  • logged in with hf auth login in the terminal (using my access token)

  • opened JupyterLab from the terminal.

  • nothing worked, so I logged in a second time from the JupyterLab notebook, using [from huggingface_hub import notebook_login
    notebook_login()] and the same access token

  • still nothing works, e.g. when executing the code from the quickstart tutorial: [from transformers import AutoModelForCausalLM, AutoTokenizer
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")]. Unfortunately, I cannot share the error message, as apparently new users can only put two links in a post.

    Maybe a free account is not sufficient to have access, but in that case I would find it weird that this code is used for the quickstart tutorial. Very weird.

I also tried to implement a model using code provided by Hugging Face. Executed code: [test.py

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True, torch_dtype=torch.bfloat16)

model = model.to(device='mps', dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True)
model.eval()

image = Image.open(image_file_path).convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': question}]

res, context, _ = model.chat(
image=image,
msgs=msgs,
context=None,
tokenizer=tokenizer,
sampling=True,
temperature=0.7
)
print(res) ]

In short, I tried several models (with code provided by Hugging Face) and nothing works (several different kinds of error messages). On the other hand, a workaround by qnguyen3 (solving the flash_attn problem for M1 Macs) immediately worked. So I am confused: if qnguyen3 can provide a working solution, why can Hugging Face not provide examples that are easy to get working? It seems to me that would make sense, especially at the beginning of a tutorial. Or is this some kind of filter meant to make clear that only people with a background in computer science (as opposed to data scientists) should be using this platform? Or is it some kind of compatibility problem? Can someone help or give a hint?


I think you’re encountering multiple errors simultaneously. Personally, I recommend using MLX or Ollama. They make it easier to effectively utilize MPS.

Have you tried using Ollama or Llama.cpp?
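
For example, a minimal mlx-lm sketch (assuming mlx-lm is installed via pip install mlx-lm; the repo id below is just one example 4-bit conversion from the mlx-community org):

    from mlx_lm import load, generate

    # Load a pre-converted, quantized model published by the MLX community
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

    messages = [{"role": "user", "content": "Explain MPS on Apple Silicon in one sentence."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # Runs natively on Apple Silicon via MLX (no CUDA involved)
    text = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)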

To run any model you need to run pip install transformers torch accelerate in your venv. Then call the script.

I had a lot of struggles at the beginning too.


Yeah. Transformers is standard and relatively versatile for experiments and modifications. While inference is straightforward if you’re only using the pipeline, the setup makes it more suitable for intermediate users and above…

I think it’s best to start by trying Ollama for CLI or LM Studio for GUI. Once you can use one, there’s no fundamental difference beyond speed or usage. Getting started is the first hurdle.
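
If you want to stay inside a Jupyter notebook, Ollama can also be driven from Python once the app is running. A minimal sketch (assumes pip install ollama and that a model such as llama3.2 has already been pulled with ollama pull llama3.2):

    from ollama import chat

    # Talks to the local Ollama server (the desktop app / daemon must be running)
    response = chat(
        model="llama3.2",
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )
    print(response.message.content)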

ChatGPT said:

You’re running into a combo of two things:

  1. Access / account permissions on Hugging Face

    • Some models (like Llama-2) require you to:

      • Sign the license on Hugging Face’s model page (Meta requires approval).

      • Be logged in with a valid token tied to that license acceptance.

    • If you skip that, the code fails even if your token is correct.

    • A free account is fine, but you must accept the license before download.

  2. Compatibility issues on Apple Silicon (M1, macOS)

    • A lot of Hugging Face examples assume Linux + CUDA (NVIDIA).

    • On M1/M2 Macs, you only have CPU or MPS (Metal Performance Shaders).

    • Many models (like Llama-2, MiniCPM-V) try to use CUDA by default → errors.

    • That’s why qnguyen3’s workaround (patching FlashAttention + MPS) worked — Hugging Face hasn’t fully baked Apple-friendly defaults yet.


✅ How to Fix It (Step-by-Step)

  1. Make sure your token is active

    huggingface-cli whoami
    
    

    If it shows your username, you’re good. If not, run:

    huggingface-cli login
    
    

    and paste your token.
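
    The same check also works from inside the notebook (a minimal sketch using the huggingface_hub client; it raises an error if no valid token is stored):

    from huggingface_hub import whoami

    # Prints the account name tied to the currently stored token
    print(whoami()["name"])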

  2. Accept the license on model page

    • Go to meta-llama/Llama-2-7b-hf.

    • Click “Agree and access model”.

    • Try again with your token.
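
    To confirm from the notebook that your token has actually been granted access, here is a minimal sketch (assuming huggingface_hub raises GatedRepoError when the license has not been accepted yet):

    from huggingface_hub import model_info
    from huggingface_hub.utils import GatedRepoError

    try:
        model_info("meta-llama/Llama-2-7b-hf")
        print("Access granted")
    except GatedRepoError:
        print("Gated repo: accept the license on the model page first")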

  3. Force MPS backend on M1
    In your notebook:

    import torch
    from transformers import AutoModelForCausalLM
    
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",
        torch_dtype=torch.float16,
        device_map={"": device},
    )
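
    If loading succeeds, a short generation run (a minimal sketch; assumes the tokenizer from the same repo) confirms the MPS device is actually usable:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))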
    
    
  4. Install MPS-friendly PyTorch
    Make sure you’re on a version that supports Metal (the standard macOS arm64 wheels from PyPI include MPS support since PyTorch 1.12):

    pip install --upgrade torch torchvision torchaudio
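
    A quick check in the notebook that the install actually exposes MPS:

    import torch

    print(torch.__version__)
    print(torch.backends.mps.is_available())  # should print True on Apple Silicon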
    
    
  5. Stick to smaller models first
    Large ones (7B, 13B) often OOM on M1. Start with:

    • openlm-research/open_llama_3b

    • tiiuae/falcon-rw-1b
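
    A quick sanity check with one of these smaller, non-gated models (a minimal sketch using the pipeline API; swap in any comparable repo id):

    from transformers import pipeline
    import torch

    pipe = pipeline(
        "text-generation",
        model="tiiuae/falcon-rw-1b",
        torch_dtype=torch.float16,
        device="mps",
    )
    print(pipe("Hello, my name is", max_new_tokens=20)[0]["generated_text"])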


⚡ Why Hugging Face tutorials feel “broken”

  • They’re written for Linux + NVIDIA GPUs.

  • Apple Silicon support is still patchy, especially for models needing FlashAttention, bfloat16, or CUDA kernels.

  • That’s why community fixes (like qnguyen3’s) often work better.