Hi, I’d like to convert the CLIP model to Core ML, but I’m running into some errors. Has anyone managed to do this? Any pointers in the right direction would be great.
I thought I’d start by converting the vision model. Here’s my code so far:
import coremltools as ct
import torch
from transformers import CLIPProcessor, CLIPModel
model_version = "openai/clip-vit-base-patch32"  # e.g. the ViT-B/32 checkpoint
model = CLIPModel.from_pretrained(model_version)
visual_model = model.vision_model
visual_model.eval()
# Trace the model with random data.
example_input_image = torch.rand(1, 3, 224, 224)
traced_model = torch.jit.trace(visual_model, example_input_image)
out = traced_model(example_input_image)
There are a couple of issues:
- RuntimeError: Encountering a dict at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module’s inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
- Warning: site-packages/transformers/models/clip/modeling_clip.py:222: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
- Warning: /site-packages/transformers/models/clip/modeling_clip.py:262: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
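From the RuntimeError, my understanding is that the tracer fails because the Hugging Face vision model returns a ModelOutput dict rather than a plain tensor. Here’s a rough sketch of what I think the fix might look like (untested; the checkpoint name and the choice of the pooled output are my assumptions): wrap the vision model so it returns a tensor, trace the wrapper, then convert with coremltools.

import coremltools as ct
import torch
from transformers import CLIPModel

model_version = "openai/clip-vit-base-patch32"  # assumption: the ViT-B/32 checkpoint
model = CLIPModel.from_pretrained(model_version)

class VisionWrapper(torch.nn.Module):
    """Returns a plain tensor so torch.jit.trace never sees a dict output."""
    def __init__(self, vision_model):
        super().__init__()
        self.vision_model = vision_model

    def forward(self, pixel_values):
        # return_dict=False makes the model return a tuple:
        # (last_hidden_state, pooled_output)
        outputs = self.vision_model(pixel_values, return_dict=False)
        return outputs[1]  # pooled image embedding (assumption: this is the output I want)

wrapper = VisionWrapper(model.vision_model).eval()

example_input = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(wrapper, example_input)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="pixel_values", shape=example_input.shape)],
    convert_to="mlprogram",
)
mlmodel.save("clip_vision.mlpackage")

As for the two TracerWarnings, they come from the shape-assertion if statements in modeling_clip.py; as far as I can tell they should be harmless as long as the input shape stays fixed at 1×3×224×224, since those checks just get baked into the trace as constants.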