installed packages:
python=3.10.0
onnxruntime-gpu=1.17.0
pytorch=2.2.2
pytorch-cuda=11.8
pytorch-lightning=1.9.3
transformers=4.41.2
I have trained a SegFormer model from the pretrained weights, with a custom classifier head on top:
import torch
from torch import nn
from pytorch_lightning import LightningModule
from transformers import SegformerForSemanticSegmentation

class SegformerFinetuner(LightningModule):
    def __init__(self, model_name="nvidia/segformer-b0-finetuned-ade-512-512", learning_rate=1e-4):
        super().__init__()
        self.model = SegformerForSemanticSegmentation.from_pretrained(model_name)  # input channels: 3; output channels: 150
        self.classifier = nn.Sequential(
            nn.Conv2d(150, 64, kernel_size=3, padding=1),
            nn.Dropout(0.1),
            nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),
        )
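For completeness, the forward pass (omitted above) simply feeds the backbone logits through the classifier head; a minimal sketch, assuming the head runs directly on the backbone's quarter-resolution logits:

    def forward(self, pixel_values):
        # SegFormer returns logits at 1/4 of the input resolution: (B, 150, H/4, W/4)
        logits = self.model(pixel_values).logits
        return self.classifier(logits)  # (B, 1, H/4, W/4)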
After training, I exported it to ONNX with:
inference_model = SegformerFinetuner()
inference_model.load_state_dict(torch.load(".\\pt_model\\trained_model.pt"))
inference_model.cuda()
inference_model.eval()

dummy_input = torch.randn(1, 3, 384, 384, requires_grad=True).cuda()
dynamic_axes = {
    "input": {0: "batch", 2: "height", 3: "width"},
    "output": {0: "batch", 2: "height", 3: "width"},
}

# export the model to ONNX
torch.onnx.export(inference_model, dummy_input, f"model_{model_id}.onnx",
                  input_names=['input'], output_names=['output'],
                  do_constant_folding=True,
                  opset_version=16, dynamic_axes=dynamic_axes, verbose=False,
                  export_params=True)
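As a sanity check that the export itself is sound (rather than the runtime being the problem), I compare the two outputs on the same dummy input; a minimal sketch:

import numpy as np
import onnxruntime as onnxrt

# run the PyTorch model and the exported graph on the same input
with torch.no_grad():
    torch_out = inference_model(dummy_input).cpu().numpy()
check_session = onnxrt.InferenceSession(f"model_{model_id}.onnx",
                                        providers=['CUDAExecutionProvider'])
onnx_out = check_session.run(None, {"input": dummy_input.detach().cpu().numpy()})[0]
np.testing.assert_allclose(torch_out, onnx_out, rtol=1e-3, atol=1e-4)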
# start inference
import time
import onnxruntime as onnxrt
from onnxruntime import SessionOptions, GraphOptimizationLevel

options = SessionOptions()
options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL
onnx_session = onnxrt.InferenceSession(f"model_{model_id}.onnx", options,
                                       providers=['CUDAExecutionProvider'])
onnx_inputs = {onnx_session.get_inputs()[0].name: img_torch.unsqueeze(0).cpu().numpy()}

# perform inference and measure the time taken
t_0 = time.perf_counter()
onnx_output = onnx_session.run(None, onnx_inputs)[0]
t_1 = time.perf_counter()
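To make sure the session is really running on CUDA and has not silently fallen back to CPU (onnxruntime-gpu does fall back when the CUDA/cuDNN versions don't match), I check the active providers:

# confirm CUDA is actually in use; a CPU-only list would explain the slowdown
print(onnx_session.get_providers())
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']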
Inference time output (second run, input torch.Size([155, 3, 384, 384]), RTX 4090):
PyTorch on GPU: 0.02 seconds
ONNX Runtime on GPU: 5.06 seconds
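Since the feed is a NumPy array, every run pays a host-to-device copy for the input and a device-to-host copy for the output. In case those copies dominate, here is a sketch of keeping the input on the GPU with onnxruntime's IOBinding (img_torch and the 'input'/'output' names are from the code above):

import numpy as np

# bind an input tensor that already lives on the GPU, so ORT reads it in place
img_gpu = img_torch.unsqueeze(0).contiguous().cuda()
io_binding = onnx_session.io_binding()
io_binding.bind_input(name='input', device_type='cuda', device_id=0,
                      element_type=np.float32, shape=tuple(img_gpu.shape),
                      buffer_ptr=img_gpu.data_ptr())
io_binding.bind_output('output')  # let ORT allocate the output, copied back below
onnx_session.run_with_iobinding(io_binding)
onnx_output = io_binding.copy_outputs_to_cpu()[0]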
I also tried the 🤗 Transformers ONNX export path, thinking it might reduce the inference time, but couldn't get past a tokenizer issue:
from transformers import AutoTokenizer, AutoModelForSemanticSegmentation

# load tokenizer and PyTorch weights from the Hub
pt_model = AutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
tokenizer = AutoTokenizer.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

local_pt_checkpoint = ".\\segformer_trained_checkpoint"
# save
pt_model.save_pretrained(local_pt_checkpoint)

# export to ONNX
!python -m transformers.onnx --model={local_pt_checkpoint} onnx/
error:

KeyError                                  Traceback (most recent call last)
Cell In[38], line 6
---> 6 tokenizer = AutoTokenizer.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

KeyError: <class 'transformers.models.segformer.configuration_segformer.SegformerConfig'>
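My guess is that the KeyError comes from asking for a tokenizer at all: SegFormer is a vision model, so no tokenizer is mapped to SegformerConfig. A sketch of what I believe the intended calls are (the --feature value is my assumption from the transformers.onnx docs):

from transformers import AutoImageProcessor

# images are prepared by an image processor, not a tokenizer
image_processor = AutoImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

# the CLI needs the actual path plus the task/feature spelled out
!python -m transformers.onnx --model=.\segformer_trained_checkpoint --feature=semantic-segmentation onnx/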
Is there any way to reduce the ONNX inference time, or is there a flag I'm missing in the torch.onnx.export call? Any help would be appreciated. Thank you.