I have the following code:
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_name = "databricks/dolly-v2-3b"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
# Export the model to ONNX and run it on the GPU for faster inference.
model = ORTModelForCausalLM.from_pretrained(model_name, export=True, provider="CUDAExecutionProvider")
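Would pinning the session to the idle GPU help? A variant I'm considering (assuming provider_options is passed through to the CUDA execution provider; I haven't confirmed this):

model = ORTModelForCausalLM.from_pretrained(
    model_name,
    export=True,
    provider="CUDAExecutionProvider",
    # device_id is a standard CUDAExecutionProvider option; GPU 1 is idle per nvidia-smi below
    provider_options={"device_id": 1},
)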
I get the following error:
2023-05-05 04:39:44.132458586 [W:onnxruntime:, session_state.cc:1138 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-05 04:39:44.653800957 [E:onnxruntime:, inference_session.cc:1532 operator()] Exception during initialization: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:368 void*
onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 78643200
It breaks when nvidia-smi shows the following:
Fri May  5 13:24:59 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-PCIE-16GB            On | 00000000:04:01.0 Off |                    0 |
| N/A   35C    P0               44W / 250W|  15874MiB / 16384MiB |     45%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE-16GB            On | 00000000:04:02.0 Off |                    0 |
| N/A   32C    P0               24W / 250W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     33981      C   python3                                   9234MiB  |
+---------------------------------------------------------------------------------------+
My machine has two V100 16GB GPUs. I'm monitoring nvidia-smi, but memory consumption never reaches 16GB on both GPUs, so I strongly doubt the session is using both.
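My understanding (an assumption on my part) is that a single ONNX Runtime session with the CUDA execution provider runs on one device, defaulting to GPU 0. If that's right, would hiding GPU 0 so the session lands on the free card avoid the allocation failure? Something like:

import os
# Hide GPU 0 so ONNX Runtime's CUDA provider uses the idle V100.
# This must run before anything initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b", export=True, provider="CUDAExecutionProvider"
)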