I’m experiencing an issue with the inference API for my Vision Transformer (ViT) model, rshrott/vit-base-renovation2.
When I attempt to use the API, I receive the following error:
```json
{
  "error": "HfApiJson(Deserialize(Error(\"unknown variant image-feature-extraction, expected one of audio-classification, audio-to-audio, audio-source-separation, automatic-speech-recognition, feature-extraction, text-classification, token-classification, question-answering, translation, summarization, text-generation, text2text-generation, fill-mask, zero-shot-classification, zero-shot-image-classification, conversational, table-question-answering, image-classification, image-segmentation, image-to-text, text-to-speech, … visual-question-answering, video-classification, document-question-answering, image-to-image, depth-estimation\", line: 1, column: 318)))"
}
```
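The message is easier to read once it is split apart. It looks like a serde-style (Rust) deserialization error: the API backend appears to receive the task name `image-feature-extraction` and fail to match it against its fixed list of known tasks. In other words, `image-feature-extraction` seems to be the value being rejected, not one the API expects. A minimal sketch, assuming only the message format shown above (the helper name `parse_unknown_variant` is hypothetical), that pulls out the rejected task and the accepted ones:

```python
from __future__ import annotations

import re


def parse_unknown_variant(message: str) -> tuple[str, list[str]]:
    """Split an 'unknown variant X, expected one of A, B, ...' message.

    Hypothetical helper: it assumes the message format shown in the error above.
    """
    match = re.search(r"unknown variant `?([\w-]+)`?, expected one of (.*)", message)
    if match is None:
        raise ValueError("not an 'unknown variant' error message")
    rejected = match.group(1)
    # Drop the trailing position info (", line: 1, column: 318)...").
    tail = match.group(2).split(", line:")[0]
    # Strip spaces, backticks, quotes and escape backslashes around each task name.
    accepted = [t.strip(' `"\\') for t in tail.split(",") if t.strip(' `"\\')]
    return rejected, accepted


if __name__ == "__main__":
    msg = ('Error("unknown variant image-feature-extraction, expected one of '
           'audio-classification, feature-extraction, image-classification", '
           'line: 1, column: 318)')
    task, known = parse_unknown_variant(msg)
    print(task, task in known)
```

On the full message above, `image-classification` is among the accepted tasks while `image-feature-extraction` is not.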
Interestingly, when I use the transformers pipeline directly in Python, the model works as expected:
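```python
from transformers import pipeline
from PIL import Image
import requests

# No task argument: transformers infers the task for this model
pipe = pipeline(model="rshrott/vit-base-renovation2")

url = "https://example.com/image.jpeg"  # placeholder image URL
response = requests.get(url, stream=True)
response.raise_for_status()  # fail early if the image download fails
image = Image.open(response.raw)

preds = pipe(image)
print(preds)
```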
This code runs without any issues and returns the expected predictions. However, the same model encounters an error when used through the inference API. I suspect there might be a configuration issue related to the expected task type, but I’m not sure how to resolve it.
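For comparison with the pipeline call above, a minimal sketch of how an image can be sent to the Inference API with only the standard library. The endpoint URL pattern is an assumption (the serverless Inference API URL as used at the time), and `build_request`, `query` and the token value are hypothetical placeholders, not the exact client code that produced the error:

```python
import json
import urllib.error
import urllib.request

# Assumption: serverless Inference API endpoint pattern for this model.
API_URL = "https://api-inference.huggingface.co/models/rshrott/vit-base-renovation2"


def build_request(image_bytes: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the raw image bytes."""
    return urllib.request.Request(
        API_URL,
        data=image_bytes,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def query(image_bytes: bytes, token: str):
    """Send the request and decode the JSON reply (predictions or an error)."""
    try:
        with urllib.request.urlopen(build_request(image_bytes, token)) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # Non-2xx replies raise HTTPError; the error JSON is in the response body.
        return json.loads(err.read())
```

Sending the same image bytes through `query` and through `pipe(image)` is a quick way to check that the failure is on the API side rather than in the model weights.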
Could you please help me understand why this error occurs and how I can fix it? I’ve checked the model card and configuration, but I can’t find where “image-feature-extraction” comes from or why the API is treating it as the model’s task.
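One place a task name can come from is the `pipeline_tag` field in the YAML front matter of the model card (README.md). A minimal sketch, assuming a simple front matter with top-level `key: value` lines, that reports the declared tag; the helper name is hypothetical and it deliberately avoids a full YAML parser:

```python
from __future__ import annotations


def front_matter_pipeline_tag(readme_text: str) -> str | None:
    """Return the top-level pipeline_tag declared in a model card's front matter.

    Hypothetical helper: scans only unindented 'key: value' lines between the
    leading '---' fences, which is enough for a simple metadata block.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front matter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        key, sep, value = line.partition(":")
        if sep and not line[:1].isspace() and key.strip() == "pipeline_tag":
            return value.strip().strip("'\"") or None
    return None
```

If the card declares no tag, the Hub infers one, and `huggingface_hub.model_info("rshrott/vit-base-renovation2").pipeline_tag` reports the value the Hub currently holds for the repo. If that value turns out to be `image-feature-extraction`, one thing to try is declaring `pipeline_tag: image-classification` explicitly in the front matter.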
Thank you for your assistance!