Inference just halts, no error, how to troubleshoot

Hello, I have a problem that I just don't know how to deal with.

I have a fine-tuned LayoutLM model for token classification. When I try to run prediction with model(input_ids=t_input_ids, bbox=t_bbox, attention_mask=t_attention_mask), the program just hangs. No error, no CPU or GPU usage.

I just don't know where to start troubleshooting this. Any suggestions?

Still trying to fix this problem. The model works fine on my computer, but when I run it in my Docker container it just halts. No resources used and no error. The input is the same in both cases.

The problem is I don't know how to troubleshoot this, since there is no error.

It might be an underlying CUDA installation error. Do commands like these work?

nvcc --version
>>> import torch
>>> t = torch.tensor([1,2,3]).to("cuda")
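A quick additional sanity check is whether PyTorch can see CUDA at all inside the container (these are standard PyTorch attributes):

```python
import torch

# If is_available() prints False inside the container, PyTorch is falling
# back to CPU (or is stuck waiting on a broken driver/runtime), and the
# problem is the CUDA setup rather than the model itself.
print(torch.__version__)          # PyTorch build version
print(torch.version.cuda)         # CUDA version PyTorch was built against (None on CPU-only builds)
print(torch.cuda.is_available())  # whether a usable GPU is visible
```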

Thanks for answering!

I'm running on the CPU on my development machine, so I don't have nvidia-smi or nvcc installed.


>>> import torch
>>> t = torch.tensor([1,2,3]).to("cpu")
>>> t
tensor([1, 2, 3])

Works fine…

My Dockerfile is:

FROM huggingface/transformers-pytorch-gpu
RUN apt-get update && apt-get install -y python3
COPY ./requirements.txt ./
RUN python3 -m pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
COPY ./ai .ai

I did try a basic transformer example and it works great inside the container.

I'm still working on this. After running debugpy in the Docker container, the program stops after this line:

return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

The debugger stops and no error message is produced, even when trying to step into the line.
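One thing I can try is reproducing the embedding lookup directly, outside the model. torch.embedding is the low-level op that nn.Embedding dispatches to; the vocab size, hidden size, and token IDs below are placeholders, not my real inputs:

```python
import torch
import torch.nn.functional as F

# Stand-in embedding table: 30522 tokens, 768-dim vectors.
weight = torch.randn(30522, 768)
input_ids = torch.tensor([[101, 2023, 102]])

# F.embedding is the public wrapper around torch.embedding.
# If this call hangs or crashes with my real inputs, the problem
# is the input IDs rather than the rest of the model.
out = F.embedding(input_ids, weight)
print(out.shape)  # torch.Size([1, 3, 768])
```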

Does anyone have ideas?

OK, in that case it seems like an out-of-bounds issue with your tokens. One of these two might be causing it:

  • You added special tokens to the tokenizer but didn't resize the model's embeddings, so there's a token with an ID outside the embedding layer's range, causing an index error
  • Maybe you're masking parts of your input_ids with -100 and those are being fed through the embedding layer (negative indices can't be passed through an embedding layer)
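You can check for both in one pass, sketched here with a bare nn.Embedding standing in for the model's embedding layer. With a real Hugging Face model you'd compare against model.get_input_embeddings().num_embeddings, and fix the first case with model.resize_token_embeddings(len(tokenizer)):

```python
import torch

# Stand-in for the model's input embedding layer (vocab size 100 here).
embedding = torch.nn.Embedding(num_embeddings=100, embedding_dim=8)

def find_bad_ids(input_ids: torch.Tensor, num_embeddings: int) -> torch.Tensor:
    """Return token IDs that would index outside the embedding table."""
    mask = (input_ids < 0) | (input_ids >= num_embeddings)
    return input_ids[mask]

ids = torch.tensor([1, 5, 150, -100])
bad = find_bad_ids(ids, embedding.num_embeddings)
print(bad)  # tensor([ 150, -100]) -- 150 is out of range, -100 is negative
```

If this comes back non-empty for your real input_ids, you've found the lookup that the debugger is dying on.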

Thanks, I'll check this out.