QLoRA-trained LLaMA2 13B deployment error on SageMaker using the Text Generation Inference image

Following the example below, I’m trying to build 0.9.3 in the hope that it will solve my issue with deploying a model that has only .safetensors files.

How did you manage to build the Docker image, @malterei? I’ve tried building it on my M1 MacBook (which fails because the Dockerfile appears to have a hardcoded exit 1 when the architecture is arm64), as well as on several EC2 instance types. Every attempt eventually gets stuck at this stage because the build consumes all of the instance’s memory:

 => [vllm-builder 3/3] RUN make build-vllm                                             69.5s
 => => # python3.9/site-packages/torch/include/TH -I/opt/conda/lib/python3.9/site-packages/t
 => => # orch/include/THC -I/opt/conda/include -I/opt/conda/include/python3.9 -c -c /usr/src
 => => # /vllm/csrc/cache.cpp -o /usr/src/vllm/build/temp.linux-x86_64-cpython-39/csrc/cache
 => => # .o -g -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_API_INCLUDE_EXTENSION_H '-D
 => => # PYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_A
 => => # BI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=vllm_cache_ops -D_GLIBCXX_USE_CXX11_ABI=0
 => CACHED [planner 7/7] RUN cargo chef prepare --recipe-path recipe.json               0.0s
 => CACHED [builder  2/10] COPY --from=planner /usr/src/recipe.json recipe.json         0.0s
 => CACHED [builder  3/10] RUN cargo chef cook --release --recipe-path recipe.json      0.0s
 => CACHED [builder  4/10] COPY Cargo.toml Cargo.toml                                   0.0s
 => CACHED [builder  5/10] COPY rust-toolchain.toml rust-toolchain.toml                 0.0s
 => CACHED [builder  6/10] COPY proto proto                                             0.0s
 => CACHED [builder  7/10] COPY benchmark benchmark                                     0.0s
 => CACHED [builder  8/10] COPY router router                                           0.0s
 => CACHED [builder  9/10] COPY launcher launcher                                       0.0s
 => CACHED [builder 10/10] RUN cargo build --release                                    0.0s

I’ve tried building both 0.9.3 and 0.9.2, and each fails with an error when it reaches build-vllm. Am I missing something?