Hi,
I wanted to try XLA for inference.
```python
import os
# environment configs
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel
from time import time

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

# Wrap the model call in a tf.function compiled with XLA
xla_generate = tf.function(model, jit_compile=True)

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="tf")

start_time = time()
for i in range(1):
    output = xla_generate(encoded_input)
end_time = time()

time_taken = end_time - start_time
print("average time (seconds) for bert Inference: ", time_taken)
```
With this code I am getting a higher timing than without XLA, and if I look into the perf report I do not see any XLA ops; I only see oneDNN kernels. So I wanted to understand how XLA works with oneDNN.
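To check whether XLA is actually compiling the model (rather than everything falling back to the stock oneDNN-backed kernels), I believe the compiler IR can be inspected. `experimental_get_compiler_ir` is the API I found on jit-compiled `tf.function`s, though I am not sure it is available in every TF version:

```python
# Sketch: print (part of) the HLO module XLA generated for this call.
# If this succeeds, the model really is being compiled by XLA.
hlo_text = xla_generate.experimental_get_compiler_ir(encoded_input)(stage="hlo")
print(hlo_text[:1000])

# Alternative: have XLA dump its compiled modules to disk. This flag
# must be set before XLA compiles anything (safest: before importing
# tensorflow), then check the directory for the dumped HLO files.
# os.environ["XLA_FLAGS"] = "--xla_dump_to=/tmp/xla_dump"
```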
Thanks