Hello, I’m new to Hugging Face and yesterday I tried to run an LLM locally for the first time. I chose the model mistralai/Mistral-Small-24B-Instruct-2501 and copied the code from the snippet to use it:
# Use a pipeline as a high-level helper
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
BETTER_BNP_TOKEN ='MY_ACCESS_TOKEN' # read/write permissions
hugging_face_model = 'mistralai/Mistral-Small-24B-Instruct-2501'
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline(
    "text-generation",
    model=hugging_face_model,
    token=BETTER_BNP_TOKEN,
    max_length=200,
)
pipe(messages)
When I try to run it, I get this error:
File "/usr/lib/python3.10/genericpath.py", line 30, in isfile
st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
I understand what the error means, but I’m not sure how or where to proceed next.
Thanks in advance for any help 
It seems that this error occurs when the tokenizer.model file is missing, or when you have an old version of Transformers. In that case, you may need the GitHub version, since this is a newer model.
pip install git+https://github.com/huggingface/transformers
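You can also check which version you actually have installed with a quick snippet (nothing model-specific here, just a sanity check before trying the 24B model):

import transformers

# prints the installed Transformers version; it needs to be recent enough to know this model
print(transformers.__version__)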
Thank you for your reply. What is the tokenizer.model file? Am I supposed to download it before starting my script?
Also, I am using the latest transformers version.
I think it’s probably a file related to the tokenizer, but I don’t know much about it either.
For now, try the code below. You can isolate the problem by seeing if it works or not.
from transformers import pipeline
BETTER_BNP_TOKEN ='MY_ACCESS_TOKEN' # read permissions
#hugging_face_model = 'mistralai/Mistral-Small-24B-Instruct-2501'
hugging_face_model = 'HuggingFaceTB/SmolLM2-135M-Instruct'
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model=hugging_face_model, token=BETTER_BNP_TOKEN, max_length=200)
print(pipe(messages))
Your code didn’t work; it was giving me a 401. I went to the model page (HuggingFaceTB/SmolLM2-135M-Instruct · Hugging Face), copied the code from there, and it works.
Code I executed:
from huggingface_hub import InferenceClient
client = InferenceClient(
    provider="hf-inference",
    api_key="api_key"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

stream = client.chat.completions.create(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",
    messages=messages,
    max_tokens=500,
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
OK. How about this?
# your code
#model="HuggingFaceTB/SmolLM2-135M-Instruct",
model="mistralai/Mistral-Small-24B-Instruct-2501",
OK, I have no idea why, but after adding import torch it’s now working. Unfortunately, it then crashed because my GPU does not have enough memory, but at least the error above was fixed.
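For anyone else hitting this, the version that got past the error was essentially the original snippet with the extra import (a sketch; it still needs enough GPU/CPU memory for the 24B model):

import torch  # adding this explicit import is what made the NoneType path error go away here
from transformers import pipeline

BETTER_BNP_TOKEN = 'MY_ACCESS_TOKEN'  # read permissions
hugging_face_model = 'mistralai/Mistral-Small-24B-Instruct-2501'

messages = [
    {"role": "user", "content": "Who are you?"},
]

pipe = pipeline(
    "text-generation",
    model=hugging_face_model,
    token=BETTER_BNP_TOKEN,
    max_length=200,
)
print(pipe(messages))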
Thank you for your help 
it crashed because my GPU does not have enough memory
Try the accelerate library or Ollama.
pipe = pipeline(
    "text-generation",
    model=hugging_face_model,
    token=BETTER_BNP_TOKEN,
    max_length=200,
    device_map="auto",  # offload model from VRAM to normal RAM if necessary
)
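If offloading alone still doesn’t fit, 4-bit quantization can shrink the memory footprint further. A minimal sketch, assuming a CUDA GPU and the bitsandbytes package are installed:

from transformers import pipeline, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True)  # load weights in 4-bit to cut VRAM use

pipe = pipeline(
    "text-generation",
    model=hugging_face_model,
    token=BETTER_BNP_TOKEN,
    max_length=200,
    device_map="auto",
    model_kwargs={"quantization_config": quant_config},  # passed through to from_pretrained
)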
OK, I will take a look at that, thank you.