Conversion to CoreML for On-Device Use

Hi Community - I’ve been playing around with converting HF models to CoreML for native, on-device use.

I’ve been able to convert GPT2 and basic BERT models but am having issues with BigBird-Pegasus.

I’m getting a host of errors, from TracerWarnings to PyTorch deprecation warnings. I’ve gone through the original paper, but it has scant information on the implementation, not enough for me to get a solid sense of the input shape and conversion requirements. I’m not even sure what to script, because sparse attention is new to me. I’m hesitant to fork the HF implementation to knock out the obvious errors because there are so many of them.

Given the model’s dynamic input, I’m not sure what the input should be, so I experimented with various random tokens and tokenized inputs. I can’t even get past the trace.
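For reference, some of the input-shape rules can be read off the warnings in the log below. Sequence lengths must be a multiple of the block size (`assert from_seq_length % from_block_size == 0`), the model pads internally (`if padding_len > 0`), and short inputs skip block-sparse attention (`input_shape[1] <= max_tokens_to_attend`). Here is a minimal sketch of that arithmetic. It assumes `block_size=64` and `num_random_blocks=3` (what I believe the BigBird-Pegasus config defaults are; check `torch_model.config` to confirm). It also assumes `max_tokens_to_attend = (5 + 2 * num_random_blocks) * block_size`, which matches the `(2 * num_rand_blocks + 5)` check in the log. The helper names are hypothetical.

```python
def padding_needed(seq_len: int, block_size: int = 64) -> int:
    """Tokens to append so that seq_len becomes a multiple of block_size (hypothetical helper)."""
    return (block_size - seq_len % block_size) % block_size


def uses_block_sparse(seq_len: int, block_size: int = 64, num_random_blocks: int = 3) -> bool:
    """True if a sequence this long would take the block-sparse attention path.

    Assumes the fallback rule: sequences of at most
    (5 + 2 * num_random_blocks) * block_size tokens use full attention instead.
    """
    max_tokens_to_attend = (5 + 2 * num_random_blocks) * block_size
    return seq_len > max_tokens_to_attend


for n in (512, 1000, 4096):
    print(n, padding_needed(n), uses_block_sparse(n))
# 512 0 False
# 1000 24 True
# 4096 0 True
```

Under these assumptions, the (1, 4096) dummy input in the script needs no padding and takes the block-sparse path. That path is where the trace fails (`bigbird_block_sparse_attention` in the traceback).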

Any help would be great. My current script and the full error are below (yes, I am aware of the Torch version warning).

Conversion Script

import coremltools
import os
import torch
from torchsummary import summary
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Use $TRANSFORMERS_CACHE if it is set, otherwise fall back to a local directory.
transformers_cache_dir = os.environ.get("TRANSFORMERS_CACHE") or "/Users/m/transformers_cache/"

# Load pre-trained model.
torch_model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/bigbird-pegasus-large-pubmed", torchscript=True, cache_dir=transformers_cache_dir
)

# Load tokenizer.
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed", cache_dir=transformers_cache_dir)

# Set model to evaluation mode.
torch_model.eval()

# Create dummy input: one sequence of 4096 random token IDs.
random_tokens = torch.randint(10000, (1, 4096))

traced_model = torch.jit.trace(torch_model, random_tokens)  # <---- Errors out here.

Error

WARNING:root:Torch version 1.10.1 has not been tested with coremltools. You may run into unexpected errors. Torch 1.9.1 is the most recent version that has been tested.
{'input_ids': tensor([[ 8783, 47694, 15934, …, 110, 105, 1]]), 'attention_mask': tensor([[1, 1, 1, …, 1, 1, 1]])}
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:1855: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if self.attention_type == "block_sparse" and input_shape[1] <= max_tokens_to_attend:
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:2019: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if padding_len > 0:
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:1979: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert (
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:2004: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
blocked_encoder_mask = attention_mask.view(batch_size, seq_length // block_size, block_size)
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:277: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert from_seq_length % from_block_size == 0, "Query sided sequence length must be multiple of block size"
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:278: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert to_seq_length % to_block_size == 0, "Key/Value sided sequence length must be multiple of block size"
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:374: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
if from_seq_len // from_block_size != to_seq_len // to_block_size:
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:374: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if from_seq_len // from_block_size != to_seq_len // to_block_size:
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:383: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if from_seq_len in [1024, 3072, 4096]: # old plans used in paper
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:857: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
if (2 * num_rand_blocks + 5) < (from_seq_length // from_block_size):
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:857: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (2 * num_rand_blocks + 5) < (from_seq_length // from_block_size):
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:971: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
from_seq_length // from_block_size == to_seq_length // to_block_size
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:970: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert (
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:974: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert from_seq_length in plan_from_length, "Error from sequence length not in plan!"
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:977: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
num_blocks = from_seq_length // from_block_size
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:979: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
plan_block_length = np.array(plan_from_length) // from_block_size
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:981: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
max_plan_idx = plan_from_length.index(from_seq_length)
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:407: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
rand_attn = torch.tensor(rand_attn, device=query_layer.device, dtype=torch.long)
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:834: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
num_windows = from_seq_length // from_block_size - 2
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:835: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won’t change the number of iterations executed (and might lead to errors or silently give incorrect results).
rand_mask = torch.stack([p1[i1.flatten()] for p1, i1 in zip(to_blocked_mask, rand_attn)])
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:415: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
blocked_query_matrix = query_layer.view(bsz, n_heads, from_seq_len // from_block_size, from_block_size, -1)
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:416: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
blocked_key_matrix = key_layer.view(bsz, n_heads, to_seq_len // to_block_size, to_block_size, -1)
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:417: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
blocked_value_matrix = value_layer.view(bsz, n_heads, to_seq_len // to_block_size, to_block_size, -1)
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:781: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if params.shape[:2] != indices.shape[:2]:
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:790: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
torch.arange(indices.shape[0] * indices.shape[1] * num_indices_to_gather, device=indices.device)
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:422: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
bsz, n_heads, to_seq_len // to_block_size - 2, n_rand_blocks * to_block_size, -1
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:426: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the ‘trunc’ function NOT ‘floor’). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=‘trunc’), or for actual floor division, use torch.div(a, b, rounding_mode=‘floor’).
bsz, n_heads, to_seq_len // to_block_size - 2, n_rand_blocks * to_block_size, -1
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:1950: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if padding_len > 0:
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:2082: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1:
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:1290: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:1296: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py:1327: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
Traceback (most recent call last):
File "/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/jit/_trace.py", line 443, in run_mod_and_filter_tensor_outputs
outs = wrap_retval(mod(*_clone_inputs(inputs)))
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(408): bigbird_block_sparse_attention
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(284): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(1190): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(1382): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(1928): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(2391): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(2520): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/jit/trace.py(958): trace_module
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/jit/trace.py(741): trace
/Users/m/Open Source/bigbird-pegasus-large-pubmed/torch_to_coreml.py(122): <module>
RuntimeError: set_storage_offset is not allowed on a Tensor created from .data or .detach().
If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)
without autograd tracking the change, remove the .data / .detach() call and wrap the change in a with torch.no_grad(): block.
For example, change:
    x.data.set_(y)
to:
    with torch.no_grad():
        x.set_(y)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/m/Open Source/bigbird-pegasus-large-pubmed/torch_to_coreml.py", line 122, in <module>
traced_model = torch.jit.trace(torch_model, inputs['input_ids'])
File "/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/jit/_trace.py", line 741, in trace
return trace_module(
File "/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/jit/_trace.py", line 983, in trace_module
_check_trace(
File "/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/jit/_trace.py", line 516, in _check_trace
traced_outs = run_mod_and_filter_tensor_outputs(traced_func, inputs, "trace")
File "/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/jit/_trace.py", line 449, in run_mod_and_filter_tensor_outputs
raise TracingCheckError(
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
encountered an exception while running the trace with test inputs.
Exception:
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(408): bigbird_block_sparse_attention
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(284): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(1190): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(1382): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(1928): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(2391): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py(2520): forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/nn/modules/module.py(1102): _call_impl
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/jit/trace.py(958): trace_module
/Users/m/venvs/pytorch-coremltools-transformers/lib/python3.9/site-packages/torch/jit/trace.py(741): trace
/Users/m/Open Source/bigbird-pegasus-large-pubmed/torch_to_coreml.py(122): <module>
RuntimeError: set_storage_offset is not allowed on a Tensor created from .data or .detach().
If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)
without autograd tracking the change, remove the .data / .detach() call and wrap the change in a with torch.no_grad(): block.
For example, change:
    x.data.set_(y)
to:
    with torch.no_grad():
        x.set_(y)

Wait, what? You converted GPT-2 to “.mlmodel” using coremltools without any errors and successfully tested it on an iOS device?

Tagging @Matthijs just in case!

@mlo @atomicai are you still exploring CoreML conversion of HF-hosted models? We are exploring this subject a bit right now too.

Wow, that means these models get Neural Engine (NE) and GPU support straight out of the script.

I tried coremltools today, and with simple models it seems to work and is pretty self-explanatory, but I would love to learn about your experience too.
Could you also share details of your approach and the scripts you used to convert the networks you mentioned here?

I could convert basic models but could never get transformers with more cutting-edge attention to work (e.g. BigBird-Pegasus).

That said, it’s been a long time since I played around with this, and Apple has been actively working on improving conversion significantly.

Yes, if it’s set up properly with CoreML, you can technically get Neural Engine (NE) execution.

It’s been a long time since I’ve played with this, but basic transformers always converted for me. If I remember correctly, not many people were doing these conversions, but Apple has been pushing to broaden conversion support for various models.

If I have time, I’ll write up some of my notes, but I’m rusty, and like you said, the standard conversion is relatively straightforward. If you are converting larger, more complex models, my advice would be to look at coremltools’ new weight compression utilities. They didn’t exist when I was playing around.
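As a rough illustration of what weight compression does, here is a sketch of linear symmetric 8-bit quantization in plain Python. It is not coremltools’ implementation, and the function names are hypothetical. Each float weight is stored as an integer in [-127, 127] plus one shared scale factor. That cuts weight storage to about a quarter of float32, at the cost of a rounding error of at most half a scale step per weight. In recent coremltools releases, the real utilities live under `coremltools.optimize.coreml` (for example `linear_quantize_weights`). Check the documentation for the version you have installed.

```python
def quantize_int8_symmetric(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes in [-127, 127] that share one scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero input
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.0031]
codes, scale = quantize_int8_symmetric(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, f"scale={scale:.6f}", f"max_error={max_error:.6f}")
```

Using a single scale per tensor keeps the sketch short. Real compression tools can also use per-channel scales or other schemes that trade a little extra metadata for lower error.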

Like most things, it’s the optimization and tweaking that can be difficult and opinionated.

In short, yes.

Tagging once more @Matthijs and @osanseviero to this interesting thread!

Hi all, I’m currently adding an automated way to do Core ML export from HF Transformers, similar to how the ONNX exports work. I didn’t attempt to convert the BigBird-Pegasus model yet, but I’ll add it to the list of models to try.

Core ML conversion problems typically happen because the original model does something the converter (coremltools or the PyTorch JIT trace) doesn’t understand, or because there are bugs or limitations in coremltools. It might require modifying the original HF implementation, which is not trivial. Unfortunately, conversion can be a (lengthy) process of trial-and-error.
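To make the TracerWarnings in the log above concrete, here is a minimal toy sketch. It is not BigBird code, and `ShapeBranch` is a hypothetical module. The module branches in plain Python on an input length, much like `if self.attention_type == "block_sparse" and input_shape[1] <= max_tokens_to_attend:` does. `torch.jit.trace` records only the branch taken for the example input. The traced module then keeps running that branch for inputs of other shapes.

```python
import warnings

import torch


class ShapeBranch(torch.nn.Module):
    """Toy module: picks its computation with a Python `if` on the input length."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.shape[1] <= 4:  # the `if` itself is not recorded by the tracer
            return x * 2
        return x * 3


model = ShapeBranch().eval()

with warnings.catch_warnings():
    # Tracing the shape comparison typically emits a TracerWarning like the ones in the log.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(model, torch.ones(1, 4))

longer = torch.ones(1, 8)
eager_out = model(longer)    # eager mode takes the `x * 3` branch
traced_out = traced(longer)  # the trace still runs the recorded `x * 2` branch
print(eager_out)
print(traced_out)
```

The same effect applies to BigBird-Pegasus. Its full vs. block-sparse attention decision and its numpy-generated random attention indices (`rand_attn = torch.tensor(...)`, flagged in the log as a trace constant) are fixed at trace time. It is therefore safest to assume a traced model is only valid for the input shape it was traced with.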

If anyone has any models they want to see converted to Core ML and are running into problems with it, let me know and I’ll put them on my list.


@Matthijs and everyone playing around with this: in addition to the limitations in coremltools, I’ve also run into real-world issues. When running converted models on an iPhone 12 Pro (and likely older devices), there are unpredictable runtime failures (the app crashes).

For example, on real-world devices with varying amounts of free storage, the app running the model can suddenly crash on some runs. From an initial investigation, it seems to be memory-related (it happens when the device is low on storage or many apps are running). This should be easy to test by running the model on a “fresh” device with plenty of free storage. Just something to keep in mind.

I’ve only been spending a few minutes here and there during breaks, so this isn’t as rigorous as I’d like it to be.

It is indeed very likely that this is due to running out of memory. Most of the models in Transformers are rather large, so they need a lot of RAM. When an app tries to use more RAM than is available, iOS terminates it. I’m also not entirely sure that Core ML itself is free of memory issues, but unfortunately there’s nothing we can do about that.
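To put “rather large” in numbers, here is a back-of-the-envelope sketch. The weights alone need roughly parameters × bits per weight / 8 bytes. That is before counting activations, the Core ML runtime, or the app itself, so real usage is higher. The 500M parameter count below is a hypothetical round number, not the measured size of any particular model.

```python
def weight_memory_mib(num_params: int, bits_per_weight: int) -> float:
    """Approximate MiB needed just to hold the weights (activations not included)."""
    return num_params * bits_per_weight / 8 / 2**20


num_params = 500_000_000  # hypothetical model size, for illustration only
for bits in (32, 16, 8):
    print(f"{bits:>2}-bit weights: ~{weight_memory_mib(num_params, bits):,.0f} MiB")
```

Halving the bits per weight halves this lower bound. That is one reason float16 or compressed weights matter for on-device use.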

Hello @Matthijs @mlo, I’ve been playing around with TorchScript and Core ML conversion for large models too. Have you made any further progress on this topic? I haven’t had much success with models like Pegasus yet.

I’ve been able to make some progress but it’s still not as good as I want it to be. It will take some time to work out all the kinks.

Using huggingface/exporters on GitHub (“Export Hugging Face models to Core ML and TensorFlow Lite”), it should be possible to convert the PegasusForConditionalGeneration model (I haven’t tried PegasusForCausalLM yet), but you’ll need to build coremltools yourself from the main branch to get all the latest patches.


I’m trying to convert BloomZ right now, but I keep getting errors.
Maybe there has been some progress on it, though.
I saw that Apple built its own pipeline for Stable Diffusion;
maybe that is helpful for someone:

I already tried it, and image generation runs very smoothly.
Maybe some of you can find something helpful in there, even though it’s a diffusion model.

Is there a way to convert GPT4All / GPT-J (fine-tuned) models to the *.mlmodel format for iOS use?