I am unable to `docker run` the text-embeddings-inference images (I have tried several) in my local Docker environment.
- Running H100 GPUs,
- Ubuntu host,
- recent Docker Engine install, CUDA 12.4,
- verified `LD_LIBRARY_PATH` exists and contains the proper directory,
- verified the directory is on `PATH`,
- checked that the symbolic link `libcuda.so.1` was created and points to the current version `libcuda.so.550.127.05`,
- symbolic link and file confirmed to exist in the proper directory (`/usr/lib/x86_64-linux-gnu`),
- able to access the file through the symbolic link with a non-elevated account,
- permissions on `libcuda.so.550.127.05` are 644,
- used `text-embeddings-inference:89-1.2` and `1.5` (and others).
 
Error received: `error while loading shared libraries: libcuda.so.1: cannot open shared object file`
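In case it helps anyone hitting the same symptom: `libcuda.so.1` is normally mounted into the container by the NVIDIA Container Toolkit at run time rather than shipped in the image, so one quick check is whether any CUDA container sees the driver at all. A rough sketch (the image tags are examples, and this assumes the toolkit is installed on the host):

```shell
# 1. Confirm the toolkit can inject the driver into a plain CUDA image:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# 2. Check whether libcuda.so.1 is visible inside the failing TEI image
#    (override the entrypoint so the server itself never starts):
docker run --rm --gpus all --entrypoint ldconfig \
    ghcr.io/huggingface/text-embeddings-inference:1.5 -p | grep libcuda
```

If step 1 fails too, the problem is the host-side toolkit/runtime setup rather than the TEI image.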
This is an error that typically appears when the CUDA toolkit is not installed or the library path is not set up, but it sounds like it is installed…
Maybe the path reference is not resolving properly for a specific library.
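One way to see exactly which library fails to resolve is to run `ldd` on the server binary inside the container and look for "not found" entries. A sketch (the binary name `text-embeddings-router` and the image tag are assumptions on my part):

```shell
# List the dynamic dependencies of the server binary inside the image
# and surface any unresolved or CUDA-related entries:
docker run --rm --gpus all --entrypoint /bin/sh \
    ghcr.io/huggingface/text-embeddings-inference:1.5 \
    -c 'ldd "$(which text-embeddings-router)" | grep -iE "not found|cuda"'
```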
opened 04:58PM - 09 Feb 24 UTC
    # Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/blob/main/README.md).
- [x] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/abetlen/llama-cpp-python/discussions), and have a new bug or useful enhancement to share.
# Expected Behavior
Expected: Probably loading all necessary files as requested
# Current Behavior
```
File "/home/worker/app/.venv/lib/python3.11/site-packages/llama_cpp/llama_cpp.py", line 76, in _load_shared_library
2024-02-09 17:32:34     raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
2024-02-09 17:32:34 RuntimeError: Failed to load shared library '/home/worker/app/.venv/lib/python3.11/site-packages/llama_cpp/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory
```
# Environment and Context
I'm trying to set up privategpt in a Docker environment. In the Dockerfile, I specifically reinstalled the "newest" llama-cpp-python version, along with the necessary CUDA libraries, to enable GPU support. As this appears to be specifically a llama-cpp-python issue, I'm posting it here (too).
* Physical (or virtual) hardware you are using, e.g. for Linux:
`$ lscpu`
```
# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 9 5900X 12-Core Processor
    CPU family:          25
    Model:               33
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           1
    Stepping:            2
    BogoMIPS:            7386.18
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_g
                         ood nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy 
                         svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflu
                         shopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsav
                         e_vmload umip vaes vpclmulqdq rdpid
Virtualization features: 
  Virtualization:        AMD-V
  Hypervisor vendor:     Microsoft
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   384 KiB (12 instances)
  L1i:                   384 KiB (12 instances)
  L2:                    6 MiB (12 instances)
  L3:                    32 MiB (1 instance)
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Mitigation; safe RET
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
```
* Operating System, e.g. for Linux:
`$ uname -a` => Linux 1de939a0a313 5.15.133.1-microsoft-standard-WSL2
* SDK version, e.g. for Linux:
```
$ python3 --version => 3.11.6
$ make --version => 4.3
$ g++ --version => 12.2.0
``` 
Thank you for your response. I verified the CUDA toolkit is installed and the path works for a non-root account. I have two AI servers from Lambda Labs. The first server is running CUDA compilation tools 12.2.140 and the text-embeddings-inference container starts and runs fine. The second server is running 12.4.131 and fails to start with the error mentioned in the post above. Lambda Labs is also investigating.
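Since the same image works on the 12.2 server and fails on the 12.4 one, comparing the NVIDIA Container Toolkit setup on the two hosts may narrow it down. A sketch, assuming a standard toolkit install on both machines:

```shell
# Versions of the pieces responsible for mounting libcuda.so.1 into containers:
nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvidia-ctk --version
docker info | grep -i runtime

# If the nvidia runtime is missing from `docker info`, (re)register it:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

If the toolkit versions differ between the two servers, that would be the first thing I'd align.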

GTez
March 4, 2025, 12:29am

I was able to resolve this by using this in my docker compose:

```yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids:
                - "0"
              capabilities:
                - gpu
```

I believe this explicitly tells Docker to use the NVIDIA driver and reserve GPU 0 for the container.
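For anyone launching with plain `docker run` instead of Compose, the equivalent flag should be `--gpus` with a device selector (the model id below is just an example):

```shell
# Reserve GPU 0 via the nvidia driver, mirroring the compose stanza above.
# Note the nested quoting: Docker requires the device=... form to be quoted.
docker run --rm --gpus '"device=0"' \
    ghcr.io/huggingface/text-embeddings-inference:1.5 \
    --model-id BAAI/bge-large-en-v1.5
```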