Not able to run all nodes on DML with optimum

I’m new to Hugging Face/Optimum; in fact, I only started working on AI/ML recently.

I tried running Optimum models with the DML EP (on my Windows PC), for example optimum/vit-base-patch16-224:

model = ORTModelForImageClassification.from_pretrained(model_name, provider="DmlExecutionProvider")

onnx 1.16.1
onnxruntime 1.18.0
onnxruntime-directml 1.18.0
optimum 1.20.0
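
For anyone trying to reproduce this, here is a minimal sketch of how the verbose placement logs below can be produced. It assumes that Optimum's `from_pretrained` accepts a `session_options` argument and passes it through to onnxruntime, and that verbose logging (`log_severity_level = 0`) is what prints the per-node placement lines. The helper name `load_vit_on_dml` is made up for illustration.

```python
def load_vit_on_dml(model_name="optimum/vit-base-patch16-224"):
    """Load the model on the DirectML EP with verbose onnxruntime logging.

    Hypothetical helper: the imports are local so this file can be imported
    on a machine that does not have onnxruntime-directml/optimum installed.
    """
    import onnxruntime as ort
    from optimum.onnxruntime import ORTModelForImageClassification

    session_options = ort.SessionOptions()
    # 0 = VERBOSE. At this level onnxruntime logs which nodes were placed
    # on which execution provider when the session is created.
    session_options.log_severity_level = 0

    # Assumption: optimum forwards `session_options` to the InferenceSession.
    return ORTModelForImageClassification.from_pretrained(
        model_name,
        provider="DmlExecutionProvider",
        session_options=session_options,
    )

# Usage (on a Windows machine with onnxruntime-directml installed):
# model = load_vit_on_dml()
```

The placement lines are printed while the session is being created, so nothing needs to be run through the model to see them.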

I see that the nodes are distributed between the CPU EP and the DML EP. I also noticed that different instances of the same operator are placed on both DML and CPU.

From the verbose logs:

2024-06-05 11:11:22.1833502 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335
2024-06-05 11:11:22.1936262 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Reshape (Reshape_8)
2024-06-05 11:11:22.2061078 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_25)

2024-06-05 11:11:22.8286509 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [CPUExecutionProvider]. Number of nodes: 9
2024-06-05 11:11:22.8322004 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_7)
2024-06-05 11:11:22.8387018 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Reshape (Reshape_17)
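
To see at a glance which operator types end up on both providers, the verbose lines above can be tallied with a small script. This is only a sketch: it assumes the log format is exactly as pasted here (a "Node(s) placed on [...]" header followed by one "OpType (node_name)" line per node), and the function names `ops_by_provider` and `split_ops` are made up for illustration.

```python
import re
from collections import defaultdict

# Header line: "... Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335"
HEADER = re.compile(r"Node\(s\) placed on \[(\w+)\]")
# Node line: "... VerifyEachNodeIsAssignedToAnEp] Reshape (Reshape_8)"
NODE = re.compile(r"VerifyEachNodeIsAssignedToAnEp\]\s+([\w.]+)\s+\((.+)\)\s*$")

def ops_by_provider(log_text):
    """Return {provider: {op_type: [node_name, ...]}} from verbose ORT logs."""
    placement = defaultdict(lambda: defaultdict(list))
    current = None
    for line in log_text.splitlines():
        header = HEADER.search(line)
        if header:
            current = header.group(1)
            continue
        node = NODE.search(line)
        if node and current is not None:
            op_type, node_name = node.groups()
            placement[current][op_type].append(node_name)
    return placement

def split_ops(placement):
    """Return {op_type: [providers]} for op types seen on more than one provider."""
    seen = defaultdict(set)
    for provider, ops in placement.items():
        for op_type in ops:
            seen[op_type].add(provider)
    return {op: sorted(eps) for op, eps in seen.items() if len(eps) > 1}

# The excerpt from this post, used as sample input.
SAMPLE = """\
2024-06-05 11:11:22.1833502 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335
2024-06-05 11:11:22.1936262 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Reshape (Reshape_8)
2024-06-05 11:11:22.2061078 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_25)

2024-06-05 11:11:22.8286509 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [CPUExecutionProvider]. Number of nodes: 9
2024-06-05 11:11:22.8322004 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_7)
2024-06-05 11:11:22.8387018 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Reshape (Reshape_17)
"""

if __name__ == "__main__":
    for op_type, providers in split_ops(ops_by_provider(SAMPLE)).items():
        print(op_type, "->", ", ".join(providers))
```

On the excerpt above this reports both Reshape and Concat as appearing under the CPU and DML providers, which is the split being asked about.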

Take the “Reshape” operator, for example. I believe it is supported on DML (Reshape_8 is placed on DML), so why is the Reshape_17 instance placed on CPU?

Why are a few node instances placed on CPU even though DML supports those operators?

With the provider="DmlExecutionProvider" option, I expect all nodes to be placed on DML (the only exception being operators that DML does not natively support). But in the case above, DML does support every operator that was placed on CPU.

How can I force all nodes to be placed on DML? If the nodes are distributed between CPU and DML, I expect some overhead from data transfers between the two.

Thanks,

Looks like a duplicate of “Not able to run on DML with pipeline”.

This is automatic behavior in onnxruntime:

Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.

There was an issue in onnxruntime repo that was closed without giving a solution, maybe you’d wanna track it there.
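
If the goal is just to check whether any CPU fallback happens at all, one thing that may be worth trying is onnxruntime's `session.disable_cpu_ep_fallback` session config entry. This is only a sketch under assumptions: that the installed onnxruntime build honors this key for the DML EP, and that Optimum forwards `session_options` to the session. It does not force nodes onto DML; as far as I understand it, it turns a CPU fallback into an error at session creation, so with the shape-related ops mentioned above it may simply refuse to load. The helper names `forbid_cpu_fallback` and `load_strict_dml` are made up for illustration.

```python
def forbid_cpu_fallback(session_options):
    """Ask onnxruntime to fail instead of silently assigning nodes to CPU.

    `session_options` is expected to be an onnxruntime.SessionOptions.
    Assumption: the config key below is supported by the installed build.
    """
    session_options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")
    return session_options

def load_strict_dml(model_name):
    """Hypothetical helper: load on DML and error out on any CPU fallback."""
    # Local imports so this file can be imported without the packages installed.
    import onnxruntime as ort
    from optimum.onnxruntime import ORTModelForImageClassification

    session_options = forbid_cpu_fallback(ort.SessionOptions())
    # Assumption: optimum forwards `session_options` to the InferenceSession.
    return ORTModelForImageClassification.from_pretrained(
        model_name,
        provider="DmlExecutionProvider",
        session_options=session_options,
    )

# Usage (expect an exception if any node would be placed on the CPU EP):
# model = load_strict_dml("optimum/vit-base-patch16-224")
```

Treat this as a diagnostic rather than a fix: if loading fails, the error should point at the nodes that onnxruntime wanted to place on CPU.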

No, it is different. As I mentioned in the other post, in that case the pipeline forces everything onto the CPU EP, and I don’t see that issue with the latest packages.

Thanks @IlyasMoutawwakil for your response.

It is not just shape-related operators. Simple ops like Add, Sub, and Equal are also being placed on the CPU EP with the provider="DmlExecutionProvider" option.

With optimum/gpt2:

2024-06-06 08:58:16.7010989 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [CPUExecutionProvider]. Number of nodes: 502
2024-06-06 08:58:16.7177866 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Add (Add_241)
2024-06-06 08:58:16.7669495 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Pow (Pow_363)
2024-06-06 08:58:16.7753360 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Sub (Sub_380)
:
2024-06-06 08:58:18.0929380 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [DmlExecutionProvider]. Number of nodes: 526
2024-06-06 08:58:18.1281232 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Sub (Sub_261)
2024-06-06 08:58:18.1397490 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Add (Add_266)