Phi-3-mini-128k-instruct not working with the Pro Inference API

Using this endpoint:

https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-128k-instruct/v1/

I get this error:

Error code: 500 - {'error': 'The repository for microsoft/Phi-3-mini-128k-instruct contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/microsoft/Phi-3-mini-128k-instruct.\nPlease pass the argument trust_remote_code=True to allow custom code to be run.'}

client.list_deployed_models() shows that the model is deployed. The 4k version works fine.
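
For reference, this is roughly how I'm calling it, via the OpenAI-compatible route (a minimal sketch of my setup; MYTOKEN stands in for my HF token and the prompt is just an example):

# Minimal repro sketch using the OpenAI-compatible chat route of the
# serverless Inference API (MYTOKEN is a placeholder for a real HF token).
from openai import OpenAI

client = OpenAI(
    base_url="https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-128k-instruct/v1/",
    api_key=MYTOKEN,
)
response = client.chat.completions.create(
    model="microsoft/Phi-3-mini-128k-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)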

Thanks!

Hi @sam-paech,
Looking at the models available on TGI, currently only microsoft/Phi-3-mini-4k-instruct is available. I can check whether we have plans to support the other models.

Running this code:

from huggingface_hub import InferenceClient
client = InferenceClient(token=MYTOKEN)
client.list_deployed_models()

Returns this list for deployed text-generation models, which includes the 128k model:

'text-generation': ['b3ck1/gpt-neo-125M-finetuned-beer-recipes',
'bigcode/octocoder',
'bigcode/santacoder',
'bigcode/starcoder',
'bigcode/starcoder2-15b',
'bigcode/starcoder2-3b',
'bigcode/starcoderplus',
'bigscience/bloom',
'bigscience/bloom-560m',
'blockblockblock/smol_llama-220M-GQA-bpw2.5',
'codellama/CodeLlama-13b-hf',
'codellama/CodeLlama-34b-Instruct-hf',
'codellama/CodeLlama-7b-hf',
'CohereForAI/c4ai-command-r-plus',
'dh-unibe/gpt2-larger-walser',
'dh-unibe/luther-xl',
'EleutherAI/pythia-14m',
'flax-community/gpt-neo-125M-apps',
'google/flan-t5-xxl',
'google/gemma-1.1-2b-it',
'google/gemma-1.1-7b-it',
'google/gemma-2b',
'google/gemma-2b-it',
'google/gemma-7b',
'google/gemma-7b-it',
'gpt2-large',
'Gustavosta/MagicPrompt-Stable-Diffusion',
'h2oai/h2o-danube2-1.8b-chat',
'hsramall/hsramall-70b-chat-placeholder',
'HuggingFaceFW/ablation-model-fineweb-v1',
'HuggingFaceH4/starchat-beta',
'HuggingFaceH4/starchat2-15b-v0.1',
'HuggingFaceH4/tiny-random-LlamaForCausalLM',
'HuggingFaceH4/zephyr-7b-alpha',
'HuggingFaceH4/zephyr-7b-beta',
'HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1',
'HuggingFaceM4/idefics-80b-instruct',
'HuggingFaceM4/idefics-9b-instruct',
'huggingtweets/angelinacho-stillconor-touchofray',
'huggingtweets/itspublu',
'huggingtweets/j_j_j_j_j_jones',
'huggingtweets/jackieracc_',
'huggingtweets/jaguarunlocked',
'huggingtweets/lulaoficial-ptbrasil',
'IlyaGusev/saiga_7b_lora',
'ismaelfaro/gpt2-poems.en',
'ismaelfaro/gpt2-poems.es',
'jppaolim/homerGPT2',
'kashif/stack-llama-2',
'MBZUAI/LaMini-Neo-1.3B',
'MBZUAI/LLaVA-Phi-3-mini-4k-instruct',
'meta-llama/Llama-2-13b-chat-hf',
'meta-llama/Llama-2-13b-hf',
'meta-llama/Llama-2-70b-chat-hf',
'meta-llama/Llama-2-7b-chat-hf',
'meta-llama/Llama-2-7b-hf',
'meta-llama/Meta-Llama-3-70B-Instruct',
'meta-llama/Meta-Llama-3-8B-Instruct',
'microsoft/biogpt',
'microsoft/BioGPT-Large',
'microsoft/BioGPT-Large-PubMedQA',
'microsoft/DialoGPT-large',
'microsoft/DialoGPT-medium',
'microsoft/phi-1_5',
'microsoft/Phi-3-mini-128k-instruct',
'microsoft/Phi-3-mini-4k-instruct',
'mistralai/Mistral-7B-Instruct-v0.1',
'mistralai/Mistral-7B-Instruct-v0.2',
'mistralai/Mistral-7B-v0.1',
'mistralai/Mixtral-8x7B-Instruct-v0.1',
'model-attribution-challenge/bloom-350m',
'model-attribution-challenge/codegen-350M-multi',
'model-attribution-challenge/distilgpt2',
'model-attribution-challenge/gpt2',
'model-attribution-challenge/gpt2-xl',
'model-attribution-challenge/xlnet-base-cased',
'mywateriswet/ShuanBot',
'NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO',
'openai-community/gpt2-medium',
'openai-gpt',
'OpenAssistant/oasst-sft-1-pythia-12b',
'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
'Pi3141/DialoGPT-medium-elon-2',
'rachiteagles/cover_letter',
'stabilityai/stablelm-2-1_6b',
'succinctly/text2image-prompt-generator',
'tiiuae/falcon-7b',
'timdettmers/guanaco-33b-merged',
'TinyLlama/TinyLlama-1.1B-Chat-v0.6',
'togethercomputer/RedPajama-INCITE-Chat-3B-v1',
'xlnet-base-cased',
'zrowt/test-deere'],

Yes, it's available with Transformers; however, the Inference API still doesn't support custom remote code.

And here is the list with all models that are “warm”
https://api-inference.huggingface.co/framework/text-generation-inference

If you need to deploy Phi-3-mini-128k-instruct as an Inference Endpoint, you'll need a custom handler to support trust_remote_code=True.
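
A minimal handler sketch (untested, and assuming the standard EndpointHandler convention for custom handlers, i.e. a handler.py at the root of the repo) could look like this:

# handler.py — minimal sketch, untested; assumes the standard Inference
# Endpoints custom handler convention (an EndpointHandler class in handler.py).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class EndpointHandler:
    def __init__(self, path: str = ""):
        self.tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,  # required for the Phi-3 custom code
        )

    def __call__(self, data: dict) -> list:
        prompt = data.get("inputs", "")
        params = data.get("parameters", {}) or {}
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, **params)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]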

This list doesn't contain some models that are listed as deployed and are working with the API, e.g. command-r-plus.

I only just signed up for the pro subscription so I’m not sure how it’s supposed to work. But I would have assumed the list returned by

client.list_deployed_models()

would represent the models that are available for inference with the API. If that isn’t the case, is there a way to get an authoritative list of deployed + working models?

Oops I was wrong, your list does contain command-r-plus.

Thanks for the info! I was also trying to deploy phi3 models on a dedicated endpoint, and the custom handler seems to be the only current solution.

Is there a similar list for which models are currently supported on dedicated inference endpoints (without requiring a custom handler)?
Trying to deploy microsoft/phi-3-mini-4k-instruct on one gets me a similar error about trust_remote_code in the logs.

Also, according to the model page,

Phi-3 has been integrated in the development version (4.40.0.dev) of transformers.

Is it then a reasonable assumption that once dedicated inference endpoints use a transformers version > 4.40.0, then microsoft/phi-3-mini-4k-instruct will be deployable on dedicated inference endpoints through TGI? Is it possible to see which version is currently used?

Hi,

Yes, the list of default libraries can be found here: https://huggingface.co/docs/inference-endpoints/others/runtime (it includes everything except the TGI version; the team is going to fix that).

Phi-3 has indeed been integrated natively in the Transformers library (see src/transformers/models/phi3/modeling_phi3.py in the huggingface/transformers repository on GitHub), which means that you can now load it without having to specify trust_remote_code=True.
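
For example, with a recent enough Transformers release this should load without any extra flags (a quick sketch, using the 4k model id from this thread):

# With transformers >= 4.40 the Phi-3 modeling code ships with the library,
# so trust_remote_code is no longer needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)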


Hi, @nielsr,

I'm a bit confused by your answer. I tried to create a dedicated Inference Endpoint with microsoft/Phi-3-mini-4k-instruct, but it failed with an error saying I need to specify trust_remote_code=True.

Hi,

That's probably because the Transformers version currently used by Inference Endpoints is 4.38.2, as per the doc here: https://huggingface.co/docs/inference-endpoints/others/runtime. Hence it will only be possible once this updates to Transformers v4.40.
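
As for seeing which version is actually in use at runtime, a quick way (e.g. from a custom handler's startup logs) is simply:

# Quick check of the Transformers version available in the environment;
# Phi-3 without trust_remote_code requires >= 4.40.0.
import transformers
print(transformers.__version__)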


Hey, can you share the code you used to create the custom handler? I haven't been able to make it work.

Essentially this, just with these params on the model initialization (deployed on a T4 GPU):

# `path` is the model repo path passed into the handler by Inference Endpoints.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,   # 16-bit weights so the model fits on the T4
    device_map="cuda",            # load directly onto the GPU
    trust_remote_code=True,       # required for the Phi-3 custom code
)