I am unable to `docker run` the text-embeddings-inference Docker images (I have tried several) in my local Docker environment.

- H100 GPUs, Ubuntu host, recent Docker Engine install, CUDA 12.4.
- Verified `LD_LIBRARY_PATH` exists and contains the proper directory.
- Verified the directory is on `PATH`.
- Checked that the symbolic link `libcuda.so.1` was created and points to the current version, `libcuda.so.550.127.05`.
- Symbolic link and file confirmed to exist in the proper directory (`/usr/lib/x86_64-linux-gnu`).
- Able to access the file through the symbolic link with a non-elevated account.
- Permissions on `libcuda.so.550.127.05` are 644.
- Used `text-embeddings-inference:89-1.2` and `1.5` (and others).

Error received: `error while loading shared libraries: libcuda.so.1: cannot open shared object file.`
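This failure usually means the container itself cannot see the driver's `libcuda.so.1`: with the NVIDIA Container Toolkit, that library is injected into the container at run time, so host-side symlinks and paths do not help a container started without GPU access. A minimal check, assuming the toolkit is installed (image tags, port, and model below are illustrative):

```
# 1. Confirm the toolkit can expose the driver inside any container
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# 2. Start TEI the same way; without --gpus all (or --runtime=nvidia),
#    libcuda.so.1 is never made visible inside the container
docker run --rm --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 \
  --model-id BAAI/bge-large-en-v1.5
```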
This is the error that appears when the CUDA toolkit is not installed or the path is not set up, but it sounds like it is installed…
Maybe the path reference is not resolving properly in a specific library.
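One way to narrow that down is to run the loader query inside the container rather than on the host, since the two can disagree. A sketch, assuming the image ships `/bin/sh` (tag illustrative):

```
# Show which libcuda the container's dynamic linker resolves, and its
# LD_LIBRARY_PATH, from inside the image itself
docker run --rm --gpus all --entrypoint /bin/sh \
  ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 \
  -c 'ldconfig -p | grep libcuda; echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'
```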
GitHub issue (abetlen/llama-cpp-python), opened 09 Feb 2024 UTC:
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/blob/main/README.md).
- [x] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/abetlen/llama-cpp-python/discussions), and have a new bug or useful enhancement to share.
# Expected Behavior
Expected: all necessary files load as requested.
# Current Behavior
`File "/home/worker/app/.venv/lib/python3.11/site-packages/llama_cpp/llama_cpp.py", line 76, in _load_shared_library
2024-02-09 17:32:34 raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
2024-02-09 17:32:34 RuntimeError: Failed to load shared library '/home/worker/app/.venv/lib/python3.11/site-packages/llama_cpp/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory`
# Environment and Context
I'm trying to set up PrivateGPT in a Docker environment. In the Dockerfile, I specifically reinstalled the "newest" llama-cpp-python version, along with the necessary CUDA libraries, to enable GPU support. As this appears to be specifically a llama-cpp-python issue, I'm posting it here (too).
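A rebuild step of that kind typically looks roughly like the sketch below (`LLAMA_CUBLAS` was the flag llama-cpp-python documented for CUDA builds at the time; the exact Dockerfile from this setup is not shown):

```
# Force a from-source rebuild of llama-cpp-python with CUDA (cuBLAS) enabled
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install --no-cache-dir --force-reinstall llama-cpp-python
```

Note that even a correct CUDA build still loads the driver's `libcuda.so.1` at run time, which is the step failing in the traceback above.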
* Physical (or virtual) hardware you are using, e.g. for Linux:
`$ lscpu`
```
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 5900X 12-Core Processor
CPU family: 25
Model: 33
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
Stepping: 2
BogoMIPS: 7386.18
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid
Virtualization features:
Virtualization: AMD-V
Hypervisor vendor: Microsoft
Virtualization type: full
Caches (sum of all):
L1d: 384 KiB (12 instances)
L1i: 384 KiB (12 instances)
L2: 6 MiB (12 instances)
L3: 32 MiB (1 instance)
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec rstack overflow: Mitigation; safe RET
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Not affected
```
* Operating System, e.g. for Linux:
`$ uname -a` => Linux 1de939a0a313 5.15.133.1-microsoft-standard-WSL2
* SDK version, e.g. for Linux:
```
$ python3 --version => 3.11.6
$ make --version => 4.3
$ g++ --version => 12.2.0
```
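Worth noting: the `uname -a` output above shows a WSL2 kernel. Under WSL2 the driver library is surfaced by Windows under `/usr/lib/wsl/lib` rather than installed natively, so a quick host-side check (assuming the default WSL2 layout) is:

```
# On WSL2 the driver stub lives here, not in /usr/lib/x86_64-linux-gnu
ls -l /usr/lib/wsl/lib/libcuda.so.1

# Is that directory actually on the library path?
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -n wsl

# Direct load test with the same interpreter as in the traceback
python3 -c "import ctypes; ctypes.CDLL('libcuda.so.1'); print('ok')"
```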
Thank you for your response. I verified that the CUDA toolkit is installed and that the path works for a non-root account. I have two AI servers from Lambda Labs. The first server is running CUDA compilation tools 12.2.140, and the text-embeddings-inference container starts and runs fine. The second server is running 12.4.131 and fails to start with the error mentioned in the above post. Lambda Labs is also investigating.
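Since the same image behaves differently on the two servers, comparing the driver and container-runtime pairing on each host may narrow it down. A sketch of the comparison (base-image tag illustrative):

```
# Run on both servers and diff the output:
nvidia-smi --query-gpu=driver_version --format=csv,noheader  # host driver
docker info 2>/dev/null | grep -i -A 2 runtimes              # nvidia runtime registered?
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 \
  sh -c 'ldconfig -p | grep libcuda'                         # what the toolkit mounts
```

If the plain CUDA base image can resolve `libcuda.so.1` on the failing server, the problem is more likely in the TEI image or its startup; if it cannot, that points at the NVIDIA Container Toolkit or driver install on that host.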