Install custom Python libraries in HuggingFace DLC

Hi,

I have the source code for a custom data-cleaning library; the code is private to my company. I usually install it from local source using something like ‘pip install ./cleanse’. I know the requirements.txt file can be used to specify package versions for public libraries like pandas and NumPy, but how can we install custom packages in the HuggingFace DLC?

You can put ./cleanse into your source_dir and then either add os.system("pip install ./cleanse") at the top of your training script or add the local path ./cleanse to a requirements.txt in the source_dir.
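For illustration, here is a minimal sketch of the first option. It assumes the ./cleanse directory has been copied into source_dir and that the training script runs with source_dir as its working directory. It uses subprocess with sys.executable rather than os.system, so pip runs under the same interpreter as the script and a failed install raises an error instead of passing silently. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(package_dir):
    """Build the pip command that installs a local package directory."""
    # Resolve to an absolute path so pip treats the argument as a local
    # directory rather than a package name to look up on PyPI.
    return [sys.executable, "-m", "pip", "install", str(Path(package_dir).resolve())]


def install_local_package(package_dir="./cleanse"):
    """Install the local package; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package_dir))


# At the top of the training script, before importing the library:
#   install_local_package("./cleanse")
#   import cleanse
```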

Hi, I’m actually using this library to clean data at inference time. Is there a place I could leverage the inference.py file inside the code/ directory in the model.tar.gz file? Or otherwise, can I somehow integrate it into the preprocessing script (using SKLearnProcessor)?

Sorry, I am not sure what you mean by “Is there a place I could leverage the inference.py file inside the code/ directory in the model.tar.gz file?”

I’m using an inference.py file right now, as per the instructions at the bottom of this page, to override defaults during model inference. I’m wondering how/if I can use the custom library within this file.

^like that

What is source_dir though? Like, where is that argument specified?

How are you deploying your model currently? Could you share the code snippet? source_dir is an argument of the HuggingFaceModel class in the sagemaker-sdk.

from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=model_uri,  # path to your trained sagemaker model
   role=role,  # iam role with permissions to create an Endpoint
   transformers_version="4.17",  # transformers version used
   pytorch_version="1.10",  # pytorch version used
   py_version="py38",  # python version of the DLC
   env={"HF_TASK": "text-classification"},  # task used by the default inference handler
)

Does the model_uri contain a code/inference.py? If so, you need to put the requirements.txt and the ./cleanse directory into the code/ directory.
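As a rough sketch of that layout, the snippet below repacks a local model directory so that inference.py, requirements.txt and the cleanse package all sit under code/ in the archive. It assumes the model files have already been extracted to a local folder and that the code/ folder has been filled in by hand. The function names and the folder name are hypothetical, and the sketch does not upload anything to S3.

```python
import tarfile
from pathlib import Path


def build_model_archive(model_dir, output_path="model.tar.gz"):
    """Pack the contents of model_dir at the root of a gzipped tar archive."""
    model_dir = Path(model_dir)
    with tarfile.open(str(output_path), "w:gz") as tar:
        for item in sorted(model_dir.iterdir()):
            # arcname drops the parent folder so entries sit at the archive root,
            # e.g. code/inference.py rather than model/code/inference.py.
            tar.add(str(item), arcname=item.name)
    return output_path


def list_code_entries(archive_path):
    """Return the archive entries that live under code/."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return sorted(
            name for name in tar.getnames()
            if name == "code" or name.startswith("code/")
        )


# Expected local layout before packing (hypothetical folder name "model"):
#   model/config.json, model/pytorch_model.bin, ...
#   model/code/inference.py
#   model/code/requirements.txt
#   model/code/cleanse/...
```

Calling list_code_entries on the result is a quick way to check that the custom library actually ended up under code/ before uploading the archive.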
