Phi-3-mini-128k-instruct not working with Pro Inference API

Using this endpoint:

https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-128k-instruct/v1/

I get this error:

Error code: 500 - {'error': 'The repository for microsoft/Phi-3-mini-128k-instruct contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/microsoft/Phi-3-mini-128k-instruct.\nPlease pass the argument trust_remote_code=True to allow custom code to be run.'}
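For context, this is roughly the kind of call that triggers it (a minimal sketch via the OpenAI-compatible chat route; MYTOKEN and the prompt are placeholders):

from openai import OpenAI

# Point the OpenAI-compatible client at the model's /v1/ route on the serverless Inference API
client = OpenAI(
    base_url="https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-128k-instruct/v1/",
    api_key=MYTOKEN,  # Hugging Face access token (placeholder)
)

response = client.chat.completions.create(
    model="microsoft/Phi-3-mini-128k-instruct",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=64,
)
print(response.choices[0].message.content)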

client.list_deployed_models() shows that the model is deployed. The 4k version works fine.

Thanks!

Hi @sam-paech,
Looking at the available models on TGI, currently only microsoft/Phi-3-mini-4k-instruct is available. I can check whether we have plans to support the other models.

Running this code:

from huggingface_hub import InferenceClient
client = InferenceClient(token=MYTOKEN)
client.list_deployed_models()

Returns this list of deployed text-generation models, which includes the 128k model:

'text-generation': ['b3ck1/gpt-neo-125M-finetuned-beer-recipes',
'bigcode/octocoder',
'bigcode/santacoder',
'bigcode/starcoder',
'bigcode/starcoder2-15b',
'bigcode/starcoder2-3b',
'bigcode/starcoderplus',
'bigscience/bloom',
'bigscience/bloom-560m',
'blockblockblock/smol_llama-220M-GQA-bpw2.5',
'codellama/CodeLlama-13b-hf',
'codellama/CodeLlama-34b-Instruct-hf',
'codellama/CodeLlama-7b-hf',
'CohereForAI/c4ai-command-r-plus',
'dh-unibe/gpt2-larger-walser',
'dh-unibe/luther-xl',
'EleutherAI/pythia-14m',
'flax-community/gpt-neo-125M-apps',
'google/flan-t5-xxl',
'google/gemma-1.1-2b-it',
'google/gemma-1.1-7b-it',
'google/gemma-2b',
'google/gemma-2b-it',
'google/gemma-7b',
'google/gemma-7b-it',
'gpt2-large',
'Gustavosta/MagicPrompt-Stable-Diffusion',
'h2oai/h2o-danube2-1.8b-chat',
'hsramall/hsramall-70b-chat-placeholder',
'HuggingFaceFW/ablation-model-fineweb-v1',
'HuggingFaceH4/starchat-beta',
'HuggingFaceH4/starchat2-15b-v0.1',
'HuggingFaceH4/tiny-random-LlamaForCausalLM',
'HuggingFaceH4/zephyr-7b-alpha',
'HuggingFaceH4/zephyr-7b-beta',
'HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1',
'HuggingFaceM4/idefics-80b-instruct',
'HuggingFaceM4/idefics-9b-instruct',
'huggingtweets/angelinacho-stillconor-touchofray',
'huggingtweets/itspublu',
'huggingtweets/j_j_j_j_j_jones',
'huggingtweets/jackieracc_',
'huggingtweets/jaguarunlocked',
'huggingtweets/lulaoficial-ptbrasil',
'IlyaGusev/saiga_7b_lora',
'ismaelfaro/gpt2-poems.en',
'ismaelfaro/gpt2-poems.es',
'jppaolim/homerGPT2',
'kashif/stack-llama-2',
'MBZUAI/LaMini-Neo-1.3B',
'MBZUAI/LLaVA-Phi-3-mini-4k-instruct',
'meta-llama/Llama-2-13b-chat-hf',
'meta-llama/Llama-2-13b-hf',
'meta-llama/Llama-2-70b-chat-hf',
'meta-llama/Llama-2-7b-chat-hf',
'meta-llama/Llama-2-7b-hf',
'meta-llama/Meta-Llama-3-70B-Instruct',
'meta-llama/Meta-Llama-3-8B-Instruct',
'microsoft/biogpt',
'microsoft/BioGPT-Large',
'microsoft/BioGPT-Large-PubMedQA',
'microsoft/DialoGPT-large',
'microsoft/DialoGPT-medium',
'microsoft/phi-1_5',
'microsoft/Phi-3-mini-128k-instruct',
'microsoft/Phi-3-mini-4k-instruct',
'mistralai/Mistral-7B-Instruct-v0.1',
'mistralai/Mistral-7B-Instruct-v0.2',
'mistralai/Mistral-7B-v0.1',
'mistralai/Mixtral-8x7B-Instruct-v0.1',
'model-attribution-challenge/bloom-350m',
'model-attribution-challenge/codegen-350M-multi',
'model-attribution-challenge/distilgpt2',
'model-attribution-challenge/gpt2',
'model-attribution-challenge/gpt2-xl',
'model-attribution-challenge/xlnet-base-cased',
'mywateriswet/ShuanBot',
'NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO',
'openai-community/gpt2-medium',
'openai-gpt',
'OpenAssistant/oasst-sft-1-pythia-12b',
'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
'Pi3141/DialoGPT-medium-elon-2',
'rachiteagles/cover_letter',
'stabilityai/stablelm-2-1_6b',
'succinctly/text2image-prompt-generator',
'tiiuae/falcon-7b',
'timdettmers/guanaco-33b-merged',
'TinyLlama/TinyLlama-1.1B-Chat-v0.6',
'togethercomputer/RedPajama-INCITE-Chat-3B-v1',
'xlnet-base-cased',
'zrowt/test-deere'],

Yes, it shows up there because it's available with transformers; however, this is still incomplete, as the Inference API doesn't support custom remote code.
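For reference, this is what the trust_remote_code requirement looks like when loading the model locally with transformers (a rough sketch; assumes a recent transformers release, accelerate for device_map, and enough GPU memory):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"

# trust_remote_code=True allows the custom modeling code shipped in the repo to be executed
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",  # requires accelerate
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))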

And here is the list of all models that are “warm”:
https://api-inference.huggingface.co/framework/text-generation-inference

If you need to deploy Phi-3-mini-128k-instruct as an Inference Endpoint, you'll need a custom handler to support trust_remote_code=True; a rough handler.py sketch follows.
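Something along these lines should work as handler.py in the endpoint repository (a sketch only; the dtype and the shape of the returned payload are assumptions, not a tested implementation):

from typing import Any, Dict, List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class EndpointHandler:
    def __init__(self, path: str = ""):
        # path points at the model repository snapshot; trust_remote_code lets the custom Phi-3 code run
        self.tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            trust_remote_code=True,
            torch_dtype=torch.bfloat16,
            device_map="auto",
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Inference Endpoints pass the request body as a dict with "inputs" and optional "parameters"
        prompt = data["inputs"]
        parameters = data.get("parameters", {})
        tokens = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output = self.model.generate(**tokens, **parameters)
        text = self.tokenizer.decode(output[0], skip_special_tokens=True)
        return [{"generated_text": text}]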

This list doesn't contain some models that are listed as deployed and are working with the API, e.g. command-r-plus.

I only just signed up for the Pro subscription, so I'm not sure how it's supposed to work. But I would have assumed the list returned by

client.list_deployed_models()

would represent the models that are available for inference through the API. If that isn't the case, is there a way to get an authoritative list of deployed and working models?
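In the meantime, checking one model at a time seems possible with get_model_status (a minimal sketch; I'm assuming the reported state reflects whether the serverless API will actually serve the model):

from huggingface_hub import InferenceClient

client = InferenceClient(token=MYTOKEN)

# Per-model status; state is e.g. "Loadable" or "Loaded" when the serverless API can serve it
status = client.get_model_status("microsoft/Phi-3-mini-128k-instruct")
print(status.state, status.loaded, status.framework)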

Oops, I was wrong: your list does contain command-r-plus.