Intel OpenVINO backend

Hi! We would like to start a discussion about adding an Intel OpenVINO backend to the Transformers library.

If you have not heard of OpenVINO before, it's a library that accelerates deep learning inference (inference of pretrained models, not training) on Intel architectures (CPU, GPU, VPU and others). The library is open source and distributed on PyPI.

There is currently an issue open for discussion on GitHub: Intel OpenVINO inference backend · Issue #13987 · huggingface/transformers · GitHub. The latest proposal is here: Intel OpenVINO backend by dkurt · Pull Request #1 · dkurt/transformers · GitHub

Example (QA):

from transformers import AutoTokenizer, OVAutoModelForQuestionAnswering

tok = AutoTokenizer.from_pretrained("dkurt/bert-large-uncased-whole-word-masking-squad-int8-0001")
model = OVAutoModelForQuestionAnswering.from_pretrained("dkurt/bert-large-uncased-whole-word-masking-squad-int8-0001")

context = """
Soon her eye fell on a little glass box that
was lying under the table: she opened it, and
found in it a very small cake, on which the
words “EAT ME” were beautifully marked in
currants. “Well, I’ll eat it,” said Alice, “and if
it makes me grow larger, I can reach the key;
and if it makes me grow smaller, I can creep
under the door; so either way I’ll get into the
garden, and I don’t care which happens!”
"""

question = "Where should Alice go?"

# Encode question and context as one sequence, separated by the [SEP] token
input_ids = tok.encode(question + " " + tok.sep_token + " " + context, return_tensors="pt")

outputs = model(input_ids)

# Most likely start and end token positions (end is made exclusive for slicing)
start_pos = outputs.start_logits.argmax()
end_pos = outputs.end_logits.argmax() + 1

answer_ids = input_ids[0, start_pos:end_pos]
answer = tok.convert_tokens_to_string(tok.convert_ids_to_tokens(answer_ids))

print("Question:", question)
print("Answer:", answer)
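In the example above, the start and end positions are picked independently, so nothing stops the end position from landing before the start position, which would produce an empty answer. A common refinement is to score every span with start <= end, up to some maximum length, and keep the highest-scoring one. The sketch below shows that idea in plain Python; the function name `best_span` and the 30-token default for `max_answer_len` are hypothetical choices for illustration, not part of the proposed OpenVINO backend or the Transformers API. It also does not mask out question or special tokens, which a fuller implementation would do.

```python
from typing import List, Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,  # hypothetical cap on answer length, in tokens
) -> Tuple[int, int]:
    """Return (start, end) token indices, end exclusive, maximizing
    start_logits[start] + end_logits[end - 1] subject to start <= end - 1
    and a span length of at most max_answer_len tokens."""
    if len(start_logits) != len(end_logits):
        raise ValueError("start_logits and end_logits must have the same length")
    if not start_logits:
        raise ValueError("logits must not be empty")

    best_score = float("-inf")
    best = (0, 1)
    for s, s_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:  # strict '>' keeps the earliest span on ties
                best_score = score
                best = (s, e + 1)
    return best


if __name__ == "__main__":
    # Toy logits: independent argmax would give start=3, end=1, i.e. an empty slice
    start_logits: List[float] = [0.1, 2.0, 0.3, 5.0]
    end_logits: List[float] = [4.0, 0.2, 3.5, -1.0]
    print(best_span(start_logits, end_logits))
```

To plug this into the example, the two argmax lines could be replaced with `start_pos, end_pos = best_span(outputs.start_logits[0].tolist(), outputs.end_logits[0].tolist())`, assuming both outputs have shape (1, sequence length) and support `.tolist()`, as PyTorch tensors and NumPy arrays do. The double loop costs O(n · max_answer_len), which is negligible next to the model forward pass.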

I have opened a pull request: Intel OpenVINO backend (inference only) by dkurt · Pull Request #14203 · huggingface/transformers · GitHub. Any feedback is welcome!