python3 -m optimum.commands.optimum_cli export tflite --model merged_gemma3_aie_finetuned_hf --task question-answering --sequence_length 1024 gemma_tflite/
We are using this command, but the error we are facing is that Transformers doesn't recognize the Gemma 3 text config.
Gemma 3 is maybe not yet supported except by the dev versions of Transformers and Optimum, which can be installed from the GitHub source.
I think the conversion itself can be done by installing these dev versions, but there would probably still be a lot of problems.
If you have any questions about ONNX, the best way to get a reliable answer is to contact the ONNX Community on Hugging Face.
### Feature request

### Motivation
I'm new to optimising for inference; this would benefit me and other beginners greatly in evaluating the runtimes and costs to expect for many models.
### Your contribution
- I could, given time and more familiarity going forward, but I would like this to be integrated quickly!
### System Info
```shell
My system currently is
python = 3.8
optimum-intel … : optimum-1.18.0.dev0
```
### Who can help?
@JingyaHuang @echarlaix
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
### Reproduction (minimal, reproducible, runnable)
I am following the provided examples in #1714 but am running into some issues.
When I run
`optimum-cli export onnx -m google/gemma-2b gemma_onnx`
I get the following error

When I execute the python script provided I get the following error

### Expected behavior
The expected behavior is for the model to compile and be stored in ONNX form.
pip uninstall transformers optimum
pip install git+https://github.com/huggingface/optimum git+https://github.com/huggingface/transformers
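Once those are installed, it is worth confirming the dev builds are actually the ones being imported before retrying the export (just a version check, nothing Gemma-specific):

```python
import transformers
import optimum

# Dev builds installed from GitHub report a ".dev0"-style version suffix.
# If a stable release shows up here, the Gemma 3 config will still be unrecognized.
print(transformers.__version__)
print(optimum.__version__)
```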
Thank you for your response. I fine-tuned a Gemma 3 model and now I need to convert it to .tflite. I used the dev versions of Transformers and Optimum, but it still throws an error like "Gemma3 Text config is not recognized by the transformer".
What are all the ways to convert it?
Also, if I convert it to ONNX, can I then convert that to .tflite? If so, how do I convert it to ONNX?
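What I had in mind for the ONNX route is something like the sketch below (untested; it assumes the dev builds of Transformers and Optimum actually add Gemma 3 support, and it uses the merged checkpoint path from my command above). From what I understand, ONNX would not convert directly to .tflite anyway; it would need a separate ONNX-to-TensorFlow step (a tool like onnx2tf) before the TFLite converter.

```python
# CLI form:
#   optimum-cli export onnx --model merged_gemma3_aie_finetuned_hf --task text-generation gemma_onnx/
# Equivalent Python API (as documented by Optimum; Gemma 3 support is assumed):
from optimum.exporters.onnx import main_export

main_export(
    "merged_gemma3_aie_finetuned_hf",  # merged fine-tuned checkpoint
    output="gemma_onnx/",
    task="text-generation",
)
```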
This is the error I am getting:
```
Traceback (most recent call last):
  File "C:\Users\aiehy\OneDrive\Desktop\training1\tflite.py", line 46, in
    tf_model = TFAutoModelForCausalLM.from_pretrained(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiehy\OneDrive\Desktop\training1.venv\Lib\site-packages\transformers\models\auto\auto_factory.py", line 576, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.gemma3.configuration_gemma3.Gemma3TextConfig'> for this kind of AutoModel: TFAutoModelForCausalLM.
Model type should be one of BertConfig, CamembertConfig, CTRLConfig, GPT2Config, GPT2Config, GPTJConfig, MistralConfig, OpenAIGPTConfig, OPTConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoFormerConfig, TransfoXLConfig, XGLMConfig, XLMConfig, XLMRobertaConfig, XLNetConfig.
```
Hmm…
or perhaps:
#python3 -m optimum.commands.optimum_cli export tflite --model merged_gemma3_aie_finetuned_hf --task question-answering --sequence_length 1024 gemma_tflite/
python3 -m optimum.commands.optimum_cli export tflite --model merged_gemma3_aie_finetuned_hf --task text-generation --sequence_length 1024 gemma_tflite/
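The traceback itself is specific to TFAutoModelForCausalLM: Transformers has no TensorFlow implementation of Gemma 3, so the TF auto class rejects the config even on the dev builds. A quick sketch to confirm that the PyTorch side does recognize the checkpoint (path taken from your first post):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Recent/dev Transformers resolve this checkpoint to Gemma3TextConfig for the
# PyTorch auto classes; only the TF* auto classes lack a Gemma 3 port.
config = AutoConfig.from_pretrained("merged_gemma3_aie_finetuned_hf")
print(type(config).__name__)

model = AutoModelForCausalLM.from_pretrained("merged_gemma3_aie_finetuned_hf")
print(model.config.model_type)
```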
Hello, I was able to overcome this problem by making some changes to the code Google prepared in Colab. You can convert the model directly to .tflite, and later to the .task format if desired. Instead of fine-tuning from scratch, I used an already trained model.
```python
import os
from google.colab import userdata

os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')

!pip3 install --upgrade -q -U bitsandbytes
!pip3 install --upgrade -q -U peft
!pip3 install --upgrade -q -U trl
!pip3 install --upgrade -q -U accelerate
!pip3 install --upgrade -q -U datasets
!pip3 install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
!pip install git+https://github.com/google-ai-edge/ai-edge-torch
!pip install ai-edge-litert
!pip install mediapipe
!pip install huggingface_hub

from huggingface_hub import snapshot_download
import shutil

# 🔧 Settings
model_name = "username/model_repo"  # ← fine-tuned model
local_dir = "/content/merged_model"

# 💾 Download from Hugging Face
snapshot_download(
    repo_id=model_name,
    local_dir=local_dir,
    local_dir_use_symlinks=False,  # don't bother with symlinks, just copy directly
)
print(f"Model downloaded: {local_dir}")
```
```python
!git clone https://github.com/google-ai-edge/ai-edge-torch.git

# Rebuild the environment so the pinned ai-edge-torch requirements take effect.
!pip uninstall -y numpy
!pip uninstall -y torch torchvision torchaudio
!pip uninstall -y ai-edge-torch ai-edge-litert ai-edge-quantizer torch-xla2 safetensors
!pip install numpy
!pip install torch torchvision torchaudio
!pip install -r https://raw.githubusercontent.com/google-ai-edge/ai-edge-torch/main/requirements.txt
!pip install --upgrade numpy
!pip install --upgrade --force-reinstall ai-edge-torch
!pip install --upgrade --force-reinstall ai-edge-litert
!pip install --upgrade --force-reinstall ai-edge-quantizer
!pip install --upgrade --force-reinstall torch-xla2
!pip install --upgrade --force-reinstall safetensors
```
```python
from ai_edge_torch.generative.examples.gemma3 import gemma3
from ai_edge_torch.generative.utilities import converter
from ai_edge_torch.generative.utilities.model_builder import ExportConfig
from ai_edge_torch.generative.layers.experimental import kv_cache
import torch


def _create_mask(mask_len, kv_cache_max_len):
    # Causal attention mask: -inf above the diagonal, 0 elsewhere.
    mask = torch.full((mask_len, kv_cache_max_len), float('-inf'), dtype=torch.float32)
    return torch.triu(mask, diagonal=1).unsqueeze(0).unsqueeze(0)


def _create_export_config(prefill_seq_lens: list[int], kv_cache_max_len: int) -> ExportConfig:
    export_config = ExportConfig()
    # One prefill mask per supported prefill sequence length, plus a single decode mask.
    export_config.prefill_mask = [_create_mask(i, kv_cache_max_len) for i in prefill_seq_lens]
    decode_mask = torch.full((1, kv_cache_max_len), float('-inf'), dtype=torch.float32)
    export_config.decode_mask = torch.triu(decode_mask, diagonal=1).unsqueeze(0).unsqueeze(0)
    export_config.kvcache_cls = kv_cache.KVCacheTransposed
    return export_config


with torch.inference_mode(True):
    checkpoint_path = "/content/merged_model"
    pytorch_model = gemma3.build_model_1b(
        checkpoint_path, kv_cache_max_len=2048
    )
    export_config = _create_export_config([1024], 2048)
    converter.convert_to_tflite(
        pytorch_model,
        output_path="/content/",
        output_name_prefix="gemma3_1b_finetune",
        prefill_seq_len=[1024],
        quantize=True,
        lora_ranks=None,
        export_config=export_config,
    )
```
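As a quick sanity check after conversion, you can open the generated .tflite and list its signatures (a sketch; I am assuming the LiteRT interpreter import path here, and tf.lite.Interpreter should work the same way):

```python
from ai_edge_litert.interpreter import Interpreter  # assumed import path from the ai-edge-litert package

# The converter derives the filename from output_name_prefix, quantization and KV cache length.
interpreter = Interpreter(model_path="/content/gemma3_1b_finetune_q8_ekv2048.tflite")

# A generative export typically exposes separate prefill and decode signatures.
print(interpreter.get_signature_list())
```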
```python
from mediapipe.tasks.python.genai.bundler import llm_bundler


def build_task_bundle():
    config = llm_bundler.BundleConfig(
        tflite_model="/content/gemma3_1b_finetune_q8_ekv2048.tflite",
        tokenizer_model="/content/merged_model/tokenizer.model",
        start_token="<bos>",
        stop_tokens=["<eos>", "<end_of_turn>"],
        output_filename="/content/gemma3-1b-it.task",
        enable_bytes_to_unicode_mapping=False,
        prompt_prefix="<start_of_turn>user\n",
        prompt_suffix="<end_of_turn>\n<start_of_turn>model\n",
    )
    llm_bundler.create_bundle(config)


build_task_bundle()
```
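The resulting .task file is what the MediaPipe LLM Inference API loads on-device. A trivial check that both artifacts were written (paths taken from the config above):

```python
import os

# Both the quantized .tflite and the bundled .task should now exist under /content.
for path in ["/content/gemma3_1b_finetune_q8_ekv2048.tflite", "/content/gemma3-1b-it.task"]:
    print(path, round(os.path.getsize(path) / 1e6), "MB")
```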