Phi-3-mini-128k-instruct not working with the Pro Inference API

Using this endpoint:

https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-128k-instruct/v1/

I get this error:

Error code: 500 - {'error': 'The repository for microsoft/Phi-3-mini-128k-instruct contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/microsoft/Phi-3-mini-128k-instruct.\nPlease pass the argument trust_remote_code=True to allow custom code to be run.'}

client.list_deployed_models() shows that the model is deployed. The 4k version works fine.
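
For reference, this is roughly how I'm calling it, via the OpenAI-compatible route (a minimal sketch of my setup; MYTOKEN stands in for my HF token and the prompt is just an example):

# Minimal repro sketch using the OpenAI-compatible chat route of the
# serverless Inference API (MYTOKEN is a placeholder for a real HF token).
from openai import OpenAI

client = OpenAI(
    base_url="https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-128k-instruct/v1/",
    api_key=MYTOKEN,
)
response = client.chat.completions.create(
    model="microsoft/Phi-3-mini-128k-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)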

Thanks!

Hi @sam-paech,
Looking at the models available on TGI, currently only microsoft/Phi-3-mini-4k-instruct is available. I can check whether we have plans to support the other models.

Running this code:

from huggingface_hub import InferenceClient
client = InferenceClient(token=MYTOKEN)
client.list_deployed_models()

Returns this list for deployed text-generation models, which includes the 128k model:

'text-generation': ['b3ck1/gpt-neo-125M-finetuned-beer-recipes',
'bigcode/octocoder',
'bigcode/santacoder',
'bigcode/starcoder',
'bigcode/starcoder2-15b',
'bigcode/starcoder2-3b',
'bigcode/starcoderplus',
'bigscience/bloom',
'bigscience/bloom-560m',
'blockblockblock/smol_llama-220M-GQA-bpw2.5',
'codellama/CodeLlama-13b-hf',
'codellama/CodeLlama-34b-Instruct-hf',
'codellama/CodeLlama-7b-hf',
'CohereForAI/c4ai-command-r-plus',
'dh-unibe/gpt2-larger-walser',
'dh-unibe/luther-xl',
'EleutherAI/pythia-14m',
'flax-community/gpt-neo-125M-apps',
'google/flan-t5-xxl',
'google/gemma-1.1-2b-it',
'google/gemma-1.1-7b-it',
'google/gemma-2b',
'google/gemma-2b-it',
'google/gemma-7b',
'google/gemma-7b-it',
'gpt2-large',
'Gustavosta/MagicPrompt-Stable-Diffusion',
'h2oai/h2o-danube2-1.8b-chat',
'hsramall/hsramall-70b-chat-placeholder',
'HuggingFaceFW/ablation-model-fineweb-v1',
'HuggingFaceH4/starchat-beta',
'HuggingFaceH4/starchat2-15b-v0.1',
'HuggingFaceH4/tiny-random-LlamaForCausalLM',
'HuggingFaceH4/zephyr-7b-alpha',
'HuggingFaceH4/zephyr-7b-beta',
'HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1',
'HuggingFaceM4/idefics-80b-instruct',
'HuggingFaceM4/idefics-9b-instruct',
'huggingtweets/angelinacho-stillconor-touchofray',
'huggingtweets/itspublu',
'huggingtweets/j_j_j_j_j_jones',
'huggingtweets/jackieracc_',
'huggingtweets/jaguarunlocked',
'huggingtweets/lulaoficial-ptbrasil',
'IlyaGusev/saiga_7b_lora',
'ismaelfaro/gpt2-poems.en',
'ismaelfaro/gpt2-poems.es',
'jppaolim/homerGPT2',
'kashif/stack-llama-2',
'MBZUAI/LaMini-Neo-1.3B',
'MBZUAI/LLaVA-Phi-3-mini-4k-instruct',
'meta-llama/Llama-2-13b-chat-hf',
'meta-llama/Llama-2-13b-hf',
'meta-llama/Llama-2-70b-chat-hf',
'meta-llama/Llama-2-7b-chat-hf',
'meta-llama/Llama-2-7b-hf',
'meta-llama/Meta-Llama-3-70B-Instruct',
'meta-llama/Meta-Llama-3-8B-Instruct',
'microsoft/biogpt',
'microsoft/BioGPT-Large',
'microsoft/BioGPT-Large-PubMedQA',
'microsoft/DialoGPT-large',
'microsoft/DialoGPT-medium',
'microsoft/phi-1_5',
'microsoft/Phi-3-mini-128k-instruct',
'microsoft/Phi-3-mini-4k-instruct',
'mistralai/Mistral-7B-Instruct-v0.1',
'mistralai/Mistral-7B-Instruct-v0.2',
'mistralai/Mistral-7B-v0.1',
'mistralai/Mixtral-8x7B-Instruct-v0.1',
'model-attribution-challenge/bloom-350m',
'model-attribution-challenge/codegen-350M-multi',
'model-attribution-challenge/distilgpt2',
'model-attribution-challenge/gpt2',
'model-attribution-challenge/gpt2-xl',
'model-attribution-challenge/xlnet-base-cased',
'mywateriswet/ShuanBot',
'NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO',
'openai-community/gpt2-medium',
'openai-gpt',
'OpenAssistant/oasst-sft-1-pythia-12b',
'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
'Pi3141/DialoGPT-medium-elon-2',
'rachiteagles/cover_letter',
'stabilityai/stablelm-2-1_6b',
'succinctly/text2image-prompt-generator',
'tiiuae/falcon-7b',
'timdettmers/guanaco-33b-merged',
'TinyLlama/TinyLlama-1.1B-Chat-v0.6',
'togethercomputer/RedPajama-INCITE-Chat-3B-v1',
'xlnet-base-cased',
'zrowt/test-deere'],

Yes, it's available with Transformers; however, the Inference API still doesn't support custom remote code.

And here is the list with all models that are “warm”
https://api-inference.huggingface.co/framework/text-generation-inference

If you need to deploy Phi-3-mini-128k-instruct as an Inference Endpoint, you'll need a custom handler to support trust_remote_code=True.
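
A minimal handler sketch (untested, and assuming the standard EndpointHandler convention for custom handlers, i.e. a handler.py at the root of the repo) could look like this:

# handler.py — minimal sketch, untested; assumes the standard Inference
# Endpoints custom handler convention (an EndpointHandler class in handler.py).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class EndpointHandler:
    def __init__(self, path: str = ""):
        self.tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,  # required for the Phi-3 custom code
        )

    def __call__(self, data: dict) -> list:
        prompt = data.get("inputs", "")
        params = data.get("parameters", {}) or {}
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, **params)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]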

This list doesn't contain some models that are listed as deployed and are working with the API, e.g. command-r-plus.

I only just signed up for the pro subscription so I’m not sure how it’s supposed to work. But I would have assumed the list returned by

client.list_deployed_models()

would represent the models that are available for inference with the API. If that isn’t the case, is there a way to get an authoritative list of deployed + working models?

Oops I was wrong, your list does contain command-r-plus.

Thanks for the info! I was also trying to deploy phi3 models on a dedicated endpoint, and the custom handler seems to be the only current solution.

Is there a similar list for which models are currently supported on dedicated inference endpoints (without requiring a custom handler)?
Trying to deploy microsoft/phi-3-mini-4k-instruct on one gets me a similar error about trust_remote_code in the logs.

Also, according to the model page,

Phi-3 has been integrated in the development version (4.40.0.dev) of transformers.

Is it then a reasonable assumption that once dedicated inference endpoints use a transformers version > 4.40.0, then microsoft/phi-3-mini-4k-instruct will be deployable on dedicated inference endpoints through TGI? Is it possible to see which version is currently used?

Hi,

Yes, the list of default libraries can be found here: https://huggingface.co/docs/inference-endpoints/others/runtime (it includes everything except the TGI version; the team is going to fix that).

Phi-3 has indeed been integrated natively in the Transformers library (see src/transformers/models/phi3/modeling_phi3.py in the huggingface/transformers repository on GitHub), which means that you can now load it without having to specify trust_remote_code=True.
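
For example, with a recent enough Transformers release this should load without any extra flags (a quick sketch, using the 4k model id from this thread):

# With transformers >= 4.40 the Phi-3 modeling code ships with the library,
# so trust_remote_code is no longer needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)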


Hi, @nielsr,

I'm a bit confused by your answer. I tried to create a dedicated Inference Endpoint with microsoft/Phi-3-mini-4k-instruct, but it failed with an error saying I need to specify trust_remote_code=True.

Hi,

That's probably because the Transformers version currently used by Inference Endpoints is 4.38.2, as per the doc here: https://huggingface.co/docs/inference-endpoints/others/runtime. Hence it will only be possible once this updates to Transformers v4.40.
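
As for seeing which version is actually in use at runtime, a quick way (e.g. from a custom handler's startup logs) is simply:

# Quick check of the Transformers version available in the environment;
# Phi-3 without trust_remote_code requires >= 4.40.0.
import transformers
print(transformers.__version__)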


Hey, can you share the code you used to create the custom handler? I haven't been able to make it work.

Essentially this, just with these params on the model initialization (deployed on a T4 GPU):

# `path` is the model repo path passed into the handler by Inference Endpoints.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,   # 16-bit weights so the model fits on the T4
    device_map="cuda",            # load directly onto the GPU
    trust_remote_code=True,       # required for the Phi-3 custom code
)