Commit 1d8e124

CI Fix GPTQmodel install in Docker build (#3188)
Building the GPU Docker image failed because GPTQModel could not be installed. The cause was the --no-build-isolation flag, which is no longer required according to the GPTQModel README, so it is dropped. The next issue was conflicting CUDA versions with EETQ, which I fixed by bumping the base image from nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 to nvidia/cuda:13.2.1-cudnn-devel-ubuntu24.04. Finally, Transformer Engine (TE) would not build until NVTE_BUILD_USE_NVIDIA_WHEELS=1 was set and /usr/local/cuda/include was added to CPATH. The Docker build now finishes, as can be seen in the CI.
1 parent 15af197 commit 1d8e124
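
The Transformer Engine fix mentioned in the commit message relies on variable assignments placed in front of a command, which the shell applies to that command only. The following is a minimal sketch of that behavior in plain bash, not part of the Dockerfile; `DEMO_FLAG` is a hypothetical variable name used purely for illustration.

```shell
#!/usr/bin/env bash
# DEMO_FLAG is a hypothetical name used for illustration only.
unset DEMO_FLAG CPATH

# An assignment prefixed to a command is exported to that command only.
inside=$(DEMO_FLAG=1 bash -c 'echo "${DEMO_FLAG:-unset}"')

# The calling shell is unaffected once the command has finished.
outside="${DEMO_FLAG:-unset}"

# Prepending to an unset CPATH leaves a trailing colon, i.e. an empty last element.
cpath_seen=$(CPATH="/usr/local/cuda/include:${CPATH}" bash -c 'echo "$CPATH"')

echo "inside=${inside} outside=${outside} cpath=${cpath_seen}"
```

Because the assignments are scoped to a single `RUN` command, they do not leak into later layers or into the final image environment. Note that GCC treats an empty `CPATH` element as the current working directory, so the trailing colon is harmless here but worth knowing about.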

1 file changed: 7 additions & 5 deletions

docker/peft-gpu/Dockerfile
@@ -26,7 +26,7 @@ RUN chsh -s /bin/bash
 SHELL ["/bin/bash", "-c"]
 
 # Stage 2
-FROM nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 AS build-image
+FROM nvidia/cuda:13.2.1-cudnn-devel-ubuntu24.04 AS build-image
 COPY --from=compile-image /opt/conda /opt/conda
 ENV PATH=/opt/conda/bin:$PATH
 
@@ -48,14 +48,16 @@ RUN conda run -n peft pip install --no-cache-dir bitsandbytes optimum
 # to have compute hardware available we use the information from the CI runner (which hosts
 # a NVIDIA L4). So we fix the compute capability to 8.9. In the future we might extend this
 # to a list of compute capabilities (separated by ;).
-RUN CUDA_ARCH_LIST=8.9 conda run -n peft pip install --no-build-isolation gptqmodel
+RUN CUDA_ARCH_LIST=8.9 conda run -n peft pip install gptqmodel
 
 RUN \
 # Add eetq for quantization testing; needs to run without build isolation since the setup
 # script directly imports torch from the environment which would fail with isolation.
-conda run -n peft pip install --no-build-isolation git+https://github.com/NetEase-FuXi/EETQ.git
+# Ninja should speed up build time.
+conda run -n peft pip install ninja && conda run -n peft pip install --no-build-isolation git+https://github.com/NetEase-FuXi/EETQ.git
 
-RUN \
+RUN NVTE_BUILD_USE_NVIDIA_WHEELS=1 \
+CPATH="/usr/local/cuda/include:${CPATH}" \
 conda run -n peft pip install --no-build-isolation "transformer_engine[pytorch]"
 
 # Activate the conda env and install transformers + accelerate from source
@@ -64,7 +66,7 @@ RUN conda run -n peft pip install -U --no-cache-dir \
 "soundfile>=0.12.1" \
 scipy \
 torchao \
-fbgemm-gpu-genai>=1.2.0 \
+"fbgemm-gpu-genai>=1.2.0" \
 git+https://github.com/huggingface/transformers \
 git+https://github.com/huggingface/accelerate \
 peft[test]@git+https://github.com/huggingface/peft \
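
The last hunk quotes the `fbgemm-gpu-genai>=1.2.0` requirement. Left unquoted, bash reads `>` as an output redirection, so the command receives only `fbgemm-gpu-genai` (no version constraint) and its standard output is written to a file named `=1.2.0`. A minimal sketch of the difference follows; `show_args` is a hypothetical helper standing in for `pip install`, and the demo runs in a temporary directory.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# show_args is a hypothetical stand-in for `pip install`: it prints each argument it receives.
show_args() { printf '[%s]' "$@"; echo; }

# Unquoted: `>` starts a redirection, so stdout lands in a file named "=1.2.0".
show_args fbgemm-gpu-genai>=1.2.0
unquoted_args=$(cat "=1.2.0")

# Quoted: the whole requirement specifier reaches the command as a single argument.
quoted_args=$(show_args "fbgemm-gpu-genai>=1.2.0")

echo "unquoted: ${unquoted_args}"
echo "quoted:   ${quoted_args}"
```

The same reasoning explains why `"soundfile>=0.12.1"` was already quoted in the surrounding context lines.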
