Run ONNX Runtime for an InsightFace model

I am trying to use the pretrained buffalo_l model with InsightFace. I am able to run it on CPU, but I want to run it using CUDA and I am getting the error below:

2024-02-22 16:38:41.819222612 [E:onnxruntime:Default, provider_bridge_ort.cc:1546 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1209 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
I have an NVIDIA GeForce RTX 3060 Ti.
Can anyone guide me on how to install the right version of CUDA for this model?
Can I run the above model on CUDA 12.3, the latest version available?
Thanks and regards,


I am also facing a similar issue. Please help, I have a deadline.


It doesn’t seem to be a model-specific error, so with a bit of luck, reinstalling the ONNX Runtime might fix it?


I will answer your questions directly first, then go through the details and concrete steps.

  1. The error is not caused by the InsightFace buffalo_l model. It is caused by a mismatch between:

    • the ONNX Runtime GPU wheel you installed, and
    • the CUDA / cuBLAS libraries actually present on your system.
  2. Yes, you can run buffalo_l with CUDA 12.3 on an RTX 3060 Ti. The model is just an ONNX graph; it does not depend on a specific CUDA version. What must match is:

    • onnxruntime-gpu build ↔ CUDA major version ↔ cuDNN major version.(ONNX Runtime)

1. What your error means, line by line

Error:

Failed to load library libonnxruntime_providers_cuda.so with error:
libcublasLt.so.11: cannot open shared object file: No such file or directory

Interpretation:

  • libonnxruntime_providers_cuda.so is the CUDA Execution Provider plugin for ONNX Runtime.
  • That plugin was compiled against CUDA 11.x, which exposes the cuBLAS Lt library as libcublasLt.so.11.(ONNX Runtime)
  • The dynamic linker searches your library paths (LD_LIBRARY_PATH, system paths).
  • It cannot find libcublasLt.so.11, so CUDA EP fails to initialize.
  • ONNX Runtime then falls back to CPU only.

This error appears in many other projects (not InsightFace-specific) when a CUDA-11 ONNX Runtime wheel is used on a machine that only has CUDA-12 libraries (libcublasLt.so.12).(Stack Overflow)

Your GPU (RTX 3060 Ti, compute capability 8.6) is fully supported by CUDA 11 and 12; the GPU hardware is not the problem.
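If you want to confirm this diagnosis on your own machine, you can inspect the CUDA provider library's dynamic dependencies directly. This is only a diagnostic sketch: the exact site-packages path depends on your environment, and on a Linux wheel the provider .so normally sits in the onnxruntime/capi/ subdirectory.

# locate the installed onnxruntime package
python -c "import onnxruntime, os; print(os.path.dirname(onnxruntime.__file__))"

# then check which shared libraries the CUDA provider cannot resolve
ldd /path/to/site-packages/onnxruntime/capi/libonnxruntime_providers_cuda.so | grep "not found"

A line such as libcublasLt.so.11 => not found confirms that the installed wheel is a CUDA-11 build looking for libraries your system does not provide.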


2. How InsightFace actually uses CUDA

InsightFace ≥ 0.2 uses ONNX Runtime as its backend:(PyPI)

  • InsightFace itself does not talk to CUDA directly.
  • It loads ONNX models (such as buffalo_l) and creates an onnxruntime.InferenceSession.
  • It enables CUDA by using the CUDA Execution Provider, which is implemented in libonnxruntime_providers_cuda.so.

The InsightFace PyPI page states clearly:(PyPI)

“You have to install onnxruntime-gpu manually to enable GPU inference, or install onnxruntime to use CPU only inference.”

So, GPU support is controlled entirely by:

  • which ONNX Runtime package you installed, and
  • whether that package can successfully load its CUDA dependencies.
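To see this decoupling in practice, you can open one of the buffalo_l ONNX files with ONNX Runtime directly, without going through InsightFace at all. This is a minimal sketch: it assumes the models were already downloaded to the default ~/.insightface/models/buffalo_l directory, and the recognition model file name (w600k_r50.onnx) may differ between InsightFace releases.

import os
import onnxruntime as ort

# Assumed default download location used by InsightFace; adjust if needed
model_path = os.path.expanduser("~/.insightface/models/buffalo_l/w600k_r50.onnx")

sess = ort.InferenceSession(
    model_path,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # which providers were actually activated for this session

If CUDAExecutionProvider is missing from the printed list, the problem lies entirely in the ONNX Runtime / CUDA setup, not in InsightFace or the model.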

3. Version requirements from ONNX Runtime’s own docs

The official CUDA Execution Provider page for ONNX Runtime says:(ONNX Runtime)

  • ONNX Runtime GPU builds are compiled against specific CUDA and cuDNN major versions.

  • Thanks to NVIDIA’s minor-version compatibility:

    • A build compiled with CUDA 11.8 works with any CUDA 11.x runtime.
    • A build compiled with CUDA 12.x works with any CUDA 12.x runtime.
  • However, cuDNN 8 and cuDNN 9 are not ABI-compatible:

    • “ONNX Runtime built with cuDNN 8.x is not compatible with cuDNN 9.x, and vice versa.”

Another ONNX Runtime note clarifies:

  • Starting with ORT 1.19, CUDA 12 becomes the default for GPU packages.
  • Starting with ORT 1.22, only CUDA 12 GPU packages are released.(ONNX Runtime)

So, in today’s ecosystem:

  • Modern onnxruntime-gpu wheels on PyPI are primarily CUDA-12-based.
  • Older wheels (1.18 and below, or special builds) may still be CUDA-11-based and will look for libcublasLt.so.11.

Your error mentions .so.11, so your current onnxruntime-gpu wheel is almost certainly a CUDA-11 build, running on a system that only exposes CUDA-12 style libraries.
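A quick way to check what you currently have installed (the version threshold is only a rule of thumb based on the release notes above):

pip show onnxruntime-gpu | grep -i version
python -c "import onnxruntime as ort; print(ort.__version__)"

# Roughly: PyPI wheels >= 1.19 default to CUDA 12 / cuDNN 9;
# wheels <= 1.18 are typically CUDA-11 builds that look for libcublasLt.so.11.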


4. Direct answers to your two explicit questions

4.1 “How do I install the right CUDA version for this model?”

Strictly speaking, there is no “CUDA version for this model.” buffalo_l is an ONNX model; it will run on any device that ONNX Runtime supports.

What needs to be aligned is:

  1. Your installed CUDA runtime (11.x or 12.x).
  2. Your installed cuDNN (major version 8 or 9).
  3. The onnxruntime-gpu wheel you install (built for CUDA-11+cuDNN-8 or CUDA-12+cuDNN-9).(ONNX Runtime)

You have two practical options:

  • Either:

    • Keep your current CUDA 12.3 installation, and
    • Make sure you install a CUDA-12 onnxruntime-gpu wheel with matching cuDNN.
  • Or:

    • Install CUDA 11.8 + cuDNN 8, and
    • Install a CUDA-11 onnxruntime-gpu wheel that expects libcublasLt.so.11.

Using CUDA 12.3 + a modern ORT GPU build is the more future-proof and simpler choice for a fresh setup.
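Before choosing, it is worth checking what your driver can serve. The “CUDA Version” field printed in the nvidia-smi header is the highest CUDA runtime the installed driver supports, not a toolkit that is necessarily installed:

nvidia-smi | head -n 5
# Look for "CUDA Version: 12.x" in the header; any 12.x value means a
# CUDA-12 onnxruntime-gpu wheel can work with this driver.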

4.2 “Can I run this model on CUDA 12.3?”

Yes.

From the ONNX Runtime docs (paraphrased):(ONNX Runtime)

“ONNX Runtime builds compiled with CUDA 12.x are compatible with any CUDA 12.x version.”

That includes CUDA 12.3, 12.4, etc.

So:

  • If your GPU driver supports CUDA 12.x (it does, for a 3060 Ti with a recent driver).
  • And if you install a CUDA-12 onnxruntime-gpu wheel (or use the extra that bundles CUDA-12 and cuDNN-9).

Then buffalo_l will run on GPU without any model changes.


5. Recommended solution: let onnxruntime-gpu bring its own CUDA + cuDNN

The cleanest and least fragile solution today is:

Use the onnxruntime-gpu wheel that bundles a matching CUDA and cuDNN runtime, instead of relying on your system CUDA installation.

Recent ONNX Runtime releases provide such wheels, installable with pip extras:(ONNX Runtime)

Step 1 – Start from a clean Python environment

Use a virtualenv or conda environment to avoid conflicts:

python -m venv ort_env
source ort_env/bin/activate        # Linux/macOS
# or: .\ort_env\Scripts\activate   # Windows PowerShell

Step 2 – Remove conflicting ONNX Runtime packages

pip uninstall -y onnxruntime onnxruntime-gpu

InsightFace’s own guidance (and several community posts) recommends not keeping both onnxruntime (CPU) and onnxruntime-gpu installed, because some code paths may accidentally choose the CPU package.(PyPI)

Step 3 – Install GPU build + CUDA + cuDNN via pip

pip install --upgrade pip
pip install "onnxruntime-gpu[cuda,cudnn]"

What this does:

  • Installs a recent onnxruntime-gpu (typically 1.19+).

  • Automatically installs matching:

    • nvidia-cuda-runtime-cu12 (or similar CUDA-12 runtime),
    • nvidia-cudnn-cu12 (cuDNN-9 runtime), etc.(ONNX Runtime)

As a result:

  • ONNX Runtime now uses its own CUDA and cuDNN libraries from your Python environment.
  • You no longer depend on system-wide libcublasLt.so.* at all for ONNX Runtime.
  • The libcublasLt.so.11 error disappears, because the wheel is a CUDA-12 build and will look for .so.12 in its own packaged runtime.
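You can confirm that the bundled runtime components were actually pulled in; the package names below are the usual NVIDIA pip wheels, but they may vary between ONNX Runtime releases:

pip list | grep -i -E "nvidia|onnxruntime"
# Expect onnxruntime-gpu plus CUDA-12 runtime wheels such as nvidia-cudnn-cu12.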

Step 4 – Verify ONNX Runtime sees the GPU

In Python:

import onnxruntime as ort

print("ORT version:", ort.__version__)
print("Device:", ort.get_device())
print("Available providers:", ort.get_available_providers())

Expected output:

  • Device: GPU
  • 'CUDAExecutionProvider' appears in get_available_providers().(Odysseys.)

If you only see CPUExecutionProvider, then something is still wrong (e.g. driver too old, environment mismatch).

Step 5 – Wire this into InsightFace

Install InsightFace into the same environment:

pip install insightface

Then in code:

from insightface.app import FaceAnalysis

app = FaceAnalysis(
    name="buffalo_l",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# ctx_id=0 means "use GPU 0"; det_size can be tuned as needed
app.prepare(ctx_id=0, det_size=(640, 640))

Key points:

  • providers=["CUDAExecutionProvider", "CPUExecutionProvider"] tells InsightFace to prefer CUDA but fall back to CPU if something is wrong.
  • With the CUDA-12 ORT GPU wheel properly installed, you should see that the sessions for buffalo_l are created with CUDAExecutionProvider active.(PyPI)
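As a quick end-to-end sanity check, you can run the pipeline on any image; this continues from the app object created above, and the file name here is just a placeholder:

import cv2

img = cv2.imread("test.jpg")   # any BGR image readable by OpenCV
faces = app.get(img)           # detection + alignment + recognition in one call

for face in faces:
    # bbox is the detected face rectangle; embedding is the identity vector
    # (512-dimensional for the buffalo_l recognition model)
    print(face.bbox, face.embedding.shape)

If this runs noticeably faster than your CPU-only setup and nvidia-smi shows GPU utilization while it executes, the CUDA path is working.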

6. Alternative: use your existing system CUDA 12.3 instead of bundled runtime

If you explicitly want to rely on a system-installed CUDA 12.3 toolkit, instead of pip-managed CUDA:

6.1 Confirm what is installed

On Linux:

nvcc --version          # toolkit version, if installed
nvidia-smi              # driver version & CUDA support
ls /usr/local           # check for cuda-12.*, cuda-11.* directories

And check libraries:

ldconfig -p | grep libcublasLt
ldconfig -p | grep libcudnn

Typical modern output for CUDA 12.x:

  • libcublasLt.so.12
  • libcublas.so.12
  • libcudnn.so.9 (for cuDNN 9) or libcudnn.so.8 (for cuDNN 8).(Qualiteg Blog)

You will probably see only .so.12, confirming why .so.11 cannot be found.

6.2 Install a CUDA-12 ONNX Runtime GPU wheel

From the docs and release notes: current PyPI onnxruntime-gpu versions ≥ 1.19 are CUDA-12 + cuDNN-9 by default.(ONNX Runtime)

So, after uninstalling any old packages:

pip uninstall -y onnxruntime onnxruntime-gpu
pip install "onnxruntime-gpu==1.20.0"  # or a recent stable version

(Replace 1.20.0 with the latest stable version if needed.)

Then ensure:

  • Your system has CUDA 12.x, and
  • Your system has cuDNN 9.x libraries (if this wheel depends on cuDNN 9).(ONNX Runtime)

If you have CUDA 12.3 but cuDNN 8, you may see a different error: libcudnn.so.9: cannot open shared object file. That indicates a cuDNN mismatch and must be solved by installing cuDNN 9 or selecting a wheel built against cuDNN 8 instead.(GitHub)

6.3 Make sure the libraries are on the search path

Set LD_LIBRARY_PATH (Linux example):

export CUDA_HOME=/usr/local/cuda-12.3
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

This ensures that when libonnxruntime_providers_cuda.so is loaded, it can find libcublasLt.so.12 and libcudnn.so.9 (or whichever versions match your ORT build).(Stack Overflow)

Then re-run the Python check from Step 4 in section 5 to confirm that CUDAExecutionProvider is available.


7. Why you see specifically libcublasLt.so.11

Putting everything together:

  • Your current onnxruntime-gpu build was compiled against CUDA 11.x.
  • CUDA 11.x’s cuBLAS Lt library name is libcublasLt.so.11.
  • Your machine environment exposes CUDA 12.x libraries instead, which are named libcublasLt.so.12.
  • Because there is no libcublasLt.so.11 file anywhere in the search path, the dynamic loader fails and ONNX Runtime logs your error.(ONNX Runtime)

This pattern is repeated in many GitHub issues, Stack Overflow questions, and blog posts; the common conclusion is always “CUDA / cuDNN version mismatch between the ORT-GPU build and the system libraries”.


8. Quick checklist for you to follow

  1. Create a fresh virtualenv or conda env.

  2. pip uninstall -y onnxruntime onnxruntime-gpu inside it.

  3. Preferred path:

    • pip install "onnxruntime-gpu[cuda,cudnn]"
    • This gives you a CUDA-12 + cuDNN-9 runtime embedded inside ONNX Runtime.
  4. Verify:

    import onnxruntime as ort
    print(ort.get_device())
    print(ort.get_available_providers())
    

    Expect: GPU and 'CUDAExecutionProvider' present.

  5. Install insightface and use:

    from insightface.app import FaceAnalysis
    app = FaceAnalysis(name="buffalo_l",
                       providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
    app.prepare(ctx_id=0, det_size=(640, 640))
    
  6. If you must use system CUDA 12.3 instead:

    • Confirm CUDA 12.x and cuDNN major version.
    • Install a CUDA-12 onnxruntime-gpu wheel that matches cuDNN.
    • Ensure library paths include the CUDA / cuDNN directories.

9. Curated links for deeper reading

  • ONNX Runtime – CUDA Execution Provider requirements
    Explains the CUDA / cuDNN compatibility matrix and minor version rules.(ONNX Runtime)

  • ONNX Runtime – TensorRT / CUDA notes
    States that from ORT 1.19 CUDA 12 is the default, and from 1.22 only CUDA-12 GPU packages are released.(ONNX Runtime)

  • StackOverflow – “ONNX Runtime Inference using GPU: libcublasLt.so.11 not found”
    Real-world discussion of the same error and how to select the correct onnxruntime-gpu build and set LD_LIBRARY_PATH.(Stack Overflow)

  • Qualiteg blog (JP) – “Resolving the ONNX Runtime CUDA error libcublasLt.so.11”
    Step-by-step breakdown of this exact error, including environment checks and fixes.(Qualiteg Blog)

  • ONNX Runtime GitHub issue #21684 – “Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11…”
    Shows how changing ORT-GPU version exposes cuDNN 8 vs 9 mismatches when using CUDA 12.x.(GitHub)

  • InsightFace PyPI page
    Confirms that you must install onnxruntime-gpu manually for GPU inference and that InsightFace is just a front-end on top of ONNX Runtime.(PyPI)


Short bullet summary

  • The error is caused by ONNX Runtime GPU expecting libcublasLt.so.11 (CUDA 11), while your system only exposes CUDA-12 libraries.

  • buffalo_l is an ONNX model and is not tied to any particular CUDA version; it runs fine on CUDA 12.3 as long as onnxruntime-gpu and CUDA/cuDNN versions match.

  • Easiest, robust fix:

    • Use a clean env.
    • Uninstall any existing ONNX Runtime packages.
    • Install onnxruntime-gpu[cuda,cudnn] so ONNX Runtime brings its own CUDA-12 + cuDNN-9 runtime.
    • Confirm CUDAExecutionProvider is available, then configure InsightFace to use it.
  • Alternative: if you insist on system CUDA 12.3, install a CUDA-12 onnxruntime-gpu wheel and matching cuDNN 9, and ensure CUDA libraries are on the library search path.