Deployed HF model from the hub and got an error: 'numpy.ndarray' object has no attribute 'pop'

pavel-nesterov · September 15, 2021, 7:09am

Hi, dear community. I am a new member here and I got stuck running inference with an HF model.
Here is what I’m trying to do:

there is a pre-trained HF model deployed as a SageMaker endpoint (code #1 below)
I am trying to access this endpoint from outside SageMaker, first from Lambda and then from Colab
both cases gave me the same error:

2021-09-15 06:41:14,481 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - AttributeError: 'numpy.ndarray' object has no attribute 'pop'

Here is code #1, the endpoint deployment (I deploy it from SageMaker Studio):

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
    # 'HF_MODEL_ID': 'distilbert-base-uncased-distilled-squad',  # model_id from hf.co/models
    'HF_MODEL_ID': 'distilbert-base-uncased',  # model_id from hf.co/models
    'HF_TASK': 'question-answering'  # NLP task you want to use for predictions
}

# create the Hugging Face Model object
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,  # IAM role with permissions to create an endpoint
    transformers_version="4.6",  # transformers version used
    pytorch_version="1.7",  # PyTorch version used
    py_version="py36",  # Python version of the DLC
)
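Code #1 as posted stops after constructing the HuggingFaceModel; the endpoint itself is only created once deploy() is called on it. A minimal sketch of that step follows. The helper name deploy_endpoint is hypothetical, and the default instance type ml.m5.xlarge is an assumption for illustration, not something stated in the post.

```python
from typing import Any


def deploy_endpoint(model: Any, instance_type: str = "ml.m5.xlarge") -> Any:
    """Create a real-time endpoint from a sagemaker Model such as HuggingFaceModel.

    deploy_endpoint is a hypothetical helper; the default instance_type is an
    assumption, not taken from the original post.
    """
    predictor = model.deploy(
        initial_instance_count=1,  # a single instance behind the endpoint
        instance_type=instance_type,
    )
    # This is the name that boto3's invoke_endpoint expects as EndpointName.
    print(predictor.endpoint_name)
    return predictor


# Usage with Code #1 (needs AWS credentials and the sagemaker SDK):
# predictor = deploy_endpoint(huggingface_model)
```

The printed endpoint name is the value to paste into ENDPOINT_NAME when calling the endpoint from Lambda or Colab.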

Code #2, for inference from Colab:

!pip install boto3
import json

import boto3
from botocore.config import Config

# client configuration (region)
my_config = Config(
    region_name='eu-central-1'
)
ENDPOINT_NAME = 'huggingface-pytorch-inference-2021-09-15-06-35-36-297'

runtime = boto3.client('runtime.sagemaker',
                       region_name='eu-central-1',
                       config=my_config,
                       aws_access_key_id='AK************************RW',
                       aws_secret_access_key='dZ*************************************o9I')


payload = {
    "inputs": {
        "question": "What is used for inference?",
        "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}
print(payload)

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=json.dumps(payload))
print(response)
result = json.loads(response['Body'].read().decode())
print(result)

Here is the full endpoint log for the invocation from Colab.

This is an experimental beta features, which allows downloading model from the Hugging Face Hub on start up. It loads the model defined in the env var `HF_MODEL_ID`
Downloading: 100%|██████████| 8.54k/8.54k [00:00<00:00, 7.39MB/s]
Downloading: 100%|██████████| 483/483 [00:00<00:00, 498kB/s]
Downloading: 100%|██████████| 268M/268M [00:03<00:00, 78.5MB/s]
Downloading: 100%|██████████| 466k/466k [00:00<00:00, 1.30MB/s]
Downloading: 100%|██████████| 28.0/28.0 [00:00<00:00, 16.8kB/s]
Downloading: 100%|██████████| 232k/232k [00:00<00:00, 862kB/s]
WARNING - Overwriting /.sagemaker/mms/models/distilbert-base-uncased ...
2021-09-15 06:40:02,907 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /opt/conda/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 3201 M
Python executable: /opt/conda/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2021-09-15 06:40:03,058 [WARN ] W-9000-distilbert-base-uncased com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-distilbert-base-uncased
2021-09-15 06:40:03,161 [INFO ] W-9000-distilbert-base-uncased-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_huggingface_inference_toolkit.handler_service --model-path /.sagemaker/mms/models/distilbert-base-uncased --model-name distilbert-base-uncased --preload-model false --tmp-dir /home/model-server/tmp
2021-09-15 06:40:03,162 [INFO ] W-9000-distilbert-base-uncased-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2021-09-15 06:40:03,162 [INFO ] W-9000-distilbert-base-uncased-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 38
2021-09-15 06:40:03,162 [INFO ] W-9000-distilbert-base-uncased-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2021-09-15 06:40:03,162 [INFO ] W-9000-distilbert-base-uncased-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.13
2021-09-15 06:40:03,163 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model distilbert-base-uncased loaded.
2021-09-15 06:40:03,171 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2021-09-15 06:40:03,188 [INFO ] W-9000-distilbert-base-uncased com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2021-09-15 06:40:03,278 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
Model server started.
2021-09-15 06:40:03,281 [INFO ] W-9000-distilbert-base-uncased-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
2021-09-15 06:40:03,283 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2021-09-15 06:40:04,972 [WARN ] W-9000-distilbert-base-uncased-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Some weights of the model checkpoint at /.sagemaker/mms/models/distilbert-base-uncased were not used when initializing DistilBertForQuestionAnswering: ['vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.bias']
2021-09-15 06:40:04,972 [WARN ] W-9000-distilbert-base-uncased-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - - This IS expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
2021-09-15 06:40:04,972 [WARN ] W-9000-distilbert-base-uncased-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - - This IS NOT expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2021-09-15 06:40:04,973 [WARN ] W-9000-distilbert-base-uncased-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at /.sagemaker/mms/models/distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
2021-09-15 06:40:04,973 [WARN ] W-9000-distilbert-base-uncased-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2021-09-15 06:40:05,199 [INFO ] W-9000-distilbert-base-uncased-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model distilbert-base-uncased loaded io_fd=da001afffe0afdd4-0000001a-00000000-e2ff1d120c2f0e8e-0bb0052c
2021-09-15 06:40:05,206 [INFO ] W-9000-distilbert-base-uncased com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1871
2021-09-15 06:40:05,209 [WARN ] W-9000-distilbert-base-uncased com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-distilbert-base-uncased-1
2021-09-15 06:40:11,097 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 41
2021-09-15 06:40:15,795 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 0
2021-09-15 06:40:20,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 0
2021-09-15 06:40:25,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 0
2021-09-15 06:40:30,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 0
2021-09-15 06:40:35,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 0
2021-09-15 06:40:40,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 0
2021-09-15 06:40:45,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 1
2021-09-15 06:40:50,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 1
2021-09-15 06:40:55,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 1
2021-09-15 06:41:00,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 0
2021-09-15 06:41:05,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 0
2021-09-15 06:41:10,794 [INFO ] pool-1-thread-3 ACCESS_LOG - /10.32.0.2:37696 "GET /ping HTTP/1.1" 200 1
2021-09-15 06:41:14,478 [WARN ] W-distilbert-base-uncased-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - /opt/conda/lib/python3.6/site-packages/sagemaker_inference/decoder.py:58: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
2021-09-15 06:41:14,478 [WARN ] W-distilbert-base-uncased-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   return np.genfromtxt(stream, dtype=dtype, delimiter=",")
2021-09-15 06:41:14,479 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error
2021-09-15 06:41:14,480 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2021-09-15 06:41:14,480 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 222, in handle
2021-09-15 06:41:14,480 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     response = self.transform_fn(self.model, input_data, content_type, accept)
2021-09-15 06:41:14,480 [INFO ] W-9000-distilbert-base-uncased com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 4
2021-09-15 06:41:14,480 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 181, in transform_fn
2021-09-15 06:41:14,480 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     predictions = self.predict(processed_data, model)
2021-09-15 06:41:14,481 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 142, in predict
2021-09-15 06:41:14,481 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     inputs = data.pop("inputs", data)
2021-09-15 06:41:14,481 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - AttributeError: 'numpy.ndarray' object has no attribute 'pop'
2021-09-15 06:41:14,481 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
2021-09-15 06:41:14,481 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - During handling of the above exception, another exception occurred:
2021-09-15 06:41:14,481 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
2021-09-15 06:41:14,482 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2021-09-15 06:41:14,482 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.6/site-packages/mms/service.py", line 108, in predict
2021-09-15 06:41:14,482 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     ret = self._entry_point(input_batch, self.context)
2021-09-15 06:41:14,482 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 231, in handle
2021-09-15 06:41:14,482 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise PredictionException(str(e), 400)
2021-09-15 06:41:14,482 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: 'numpy.ndarray' object has no attribute 'pop' : 400

philschmid · September 15, 2021, 7:19am

Hello @pavel-nesterov,

Welcome to the Community!

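First of all, I saw in Code #1 that you deployed 'distilbert-base-uncased' for question-answering, which is definitely not recommended: that checkpoint is not fine-tuned for question answering, so its QA head is randomly initialized (your log reports that qa_outputs.bias and qa_outputs.weight "are newly initialized").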
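The commented-out line in Code #1 already points at a checkpoint that was fine-tuned on SQuAD for extractive question answering. A sketch of the hub configuration using it instead, assuming the rest of Code #1 stays unchanged:

```python
# Hub configuration pointing at a checkpoint fine-tuned for extractive QA (SQuAD).
hub = {
    "HF_MODEL_ID": "distilbert-base-uncased-distilled-squad",  # model_id from hf.co/models
    "HF_TASK": "question-answering",  # pipeline task the inference toolkit will build
}
```

Only HF_MODEL_ID changes; the task stays the same, so the request payload format does not change either.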

The Error:

2021-09-15 06:41:14,481 [INFO ] W-distilbert-base-uncased-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - AttributeError: 'numpy.ndarray' object has no attribute 'pop'

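is raised because you are forcing ContentType='text/csv' for a JSON payload, and that's not going to work. With text/csv the inference toolkit decodes the body as CSV using np.genfromtxt (see the decoder.py warning in your log), so the handler receives a numpy.ndarray instead of a dict, and its data.pop("inputs", data) call fails.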
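The failure can be reproduced locally without an endpoint. The log shows that, for text/csv, the toolkit's decoder calls np.genfromtxt(stream, dtype=dtype, delimiter=","); the sketch below mirrors that call on the JSON body, assuming dtype=None, and compares it with the plain JSON decoding an application/json request would get.

```python
import io
import json

import numpy as np

payload = {
    "inputs": {
        "question": "What is used for inference?",
        "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference.",
    }
}
body = json.dumps(payload)

# text/csv path: the JSON string is parsed as comma-separated text.
# dtype=None is an assumption about the decoder's default.
as_csv = np.genfromtxt(io.StringIO(body), dtype=None, delimiter=",", encoding="utf-8")

# application/json path: the body becomes a plain dict.
as_json = json.loads(body)

print(type(as_csv).__name__, hasattr(as_csv, "pop"))    # ndarray False
print(type(as_json).__name__, hasattr(as_json, "pop"))  # dict True
```

An ndarray has no pop() method while a dict does, which is exactly the difference the handler trips over.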
Change ContentType in Code #2 to application/json and it should work. Additionally, instead of using boto3 with runtime.sagemaker, you could use the SageMaker SDK, which provides a HuggingFacePredictor to invoke your endpoints (see the Hugging Face page of the sagemaker 2.59.1.post0 documentation).

from sagemaker.huggingface import HuggingFacePredictor

predictor = HuggingFacePredictor(ENDPOINT_NAME)
response = predictor.predict(payload)
print(response)
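If you stay with boto3 (for example where the sagemaker SDK is not available), the only change Code #2 needs is the content type. A minimal sketch: query_endpoint is a hypothetical helper name, and the Accept header is an optional addition that was not in the original code.

```python
import json
from typing import Any, Dict


def query_endpoint(client: Any, endpoint_name: str, payload: Dict[str, Any]) -> Any:
    """Invoke a SageMaker endpoint with a JSON body and decode the JSON reply.

    query_endpoint is a hypothetical helper; `client` is a boto3
    "sagemaker-runtime" (alias "runtime.sagemaker") client.
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # the fix: JSON body, JSON content type
        Accept="application/json",  # optional: ask for a JSON response
        Body=json.dumps(payload),
    )
    # response["Body"] is a streaming body; read it fully and parse the JSON.
    return json.loads(response["Body"].read().decode("utf-8"))


# Usage in place of the invoke_endpoint call in Code #2:
# result = query_endpoint(runtime, ENDPOINT_NAME, payload)
# print(result)
```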

pavel-nesterov · September 15, 2021, 7:23am

Thank you, Philipp! @philschmid
I will try it right now!

P.S. Additional thanks for the tutorials you made; they are one of my main sources for making progress.

pavel-nesterov · September 15, 2021, 8:24am

Thank you!
I implemented all three recommendations and it works in Colab! I am pretty sure I can handle it with the Serverless Framework and Lambda.

It also looks like I definitely need to go through the Hugging Face Course (Transformer models).

The feeling when you get unstuck: priceless!

pavel-nesterov · September 15, 2021, 10:00am

It worked in Colab, but in Lambda I had to avoid HuggingFacePredictor and switch to boto3, because I tried but couldn't do any of the following:

add the sagemaker SDK as a Lambda layer, because it is not officially supported
find a public Lambda layer that can be added to the function via its ARN
create my own custom layer (too much learning required, so I switched to the boto3-based call).
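A boto3-only Lambda handler along these lines is one way to call the endpoint without the sagemaker SDK. This is a sketch under assumptions: the ENDPOINT_NAME environment variable is hypothetical, and the event is assumed to carry question and context either directly or as a JSON string in event["body"] (the API Gateway proxy shape). The optional runtime argument exists only so the client can be swapped out for testing.

```python
import json
import os

# Hypothetical: the endpoint name is supplied via the function's environment.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "")

_runtime = None


def _get_runtime():
    """Create the boto3 client once per Lambda container and reuse it."""
    global _runtime
    if _runtime is None:
        import boto3  # included in the AWS Lambda Python runtimes

        _runtime = boto3.client("sagemaker-runtime")
    return _runtime


def lambda_handler(event, context, runtime=None):
    # Accept a direct invocation event or an API Gateway proxy event whose
    # "body" is a JSON string (both event shapes are assumptions).
    data = json.loads(event["body"]) if isinstance(event.get("body"), str) else event
    payload = {"inputs": {"question": data["question"], "context": data["context"]}}

    client = runtime or _get_runtime()
    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",  # JSON body goes to the JSON decoder, not CSV
        Body=json.dumps(payload),
    )
    result = json.loads(response["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps(result)}
```

Creating the client lazily at module level means warm invocations reuse it instead of building a new client on every request.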

Thanks again, @philschmid

philschmid · September 15, 2021, 10:49am

Glad I could help you.

BTW, you said you are using the Serverless Framework for deployment? It is pretty easy to add additional Python dependencies with it.

pavel-nesterov · September 15, 2021, 12:00pm

Yes, I deploy it with Serverless (I feel uncomfortable without git tracking all my changes).

Thanks for pointing me to this approach. I already had this plugin installed, so I just added this to requirements.txt:

sagemaker == 2.59.1

It increased my package size from 40 MB to 70 MB, but so far that's fine, since it is still below Lambda's 250 MB limit.

You've helped me twice in a single day, @philschmid!

Thank you
