Install custom python libraries in HuggingFace DLC

bluePenguin · August 4, 2022, 12:12pm

Hi,

I have the source code for a custom data cleaning library; the code is private to my company. I usually install it from local source using something like ‘pip install ./cleanse’. I know the requirements.txt file can be used to specify package versions for public libraries like pandas and NumPy, but how can we install custom packages in the HuggingFace DLC?

philschmid · August 8, 2022, 6:21am

You can put ./cleanse into your source_dir and then either add os.system("pip install ./cleanse") at the top of your training script or add the local path ./cleanse to a requirements.txt in the source_dir.
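A minimal sketch of the first option, assuming the private package directory is named cleanse and was copied into source_dir next to the training script. It swaps os.system for subprocess.check_call so that a failed install raises an error instead of passing silently; the helper names pip_install_command and install_local_package are hypothetical.

```python
import os
import subprocess
import sys

# Assumption: the private package directory sits next to the training script.
PACKAGE_DIR = "./cleanse"


def pip_install_command(package_dir):
    """Build the pip command that installs a package from a local directory."""
    # sys.executable targets the interpreter that is running the script.
    return [sys.executable, "-m", "pip", "install", package_dir]


def install_local_package(package_dir):
    """Run pip and raise CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(package_dir))


# Place this at the top of the training script, before importing the package.
if os.path.isdir(PACKAGE_DIR):
    install_local_package(PACKAGE_DIR)
```

For the second option, the requirements.txt in source_dir would carry a line with the local path ./cleanse alongside any public packages.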

bluePenguin · August 8, 2022, 1:40pm

Hi, I’m actually using this library to clean data at inference time. Is there a place I could leverage the inference.py file inside the code/ directory in the model.tar.gz file? Or otherwise, can I somehow integrate it into the preprocess script (using SKLearnProcessor)?

philschmid · August 8, 2022, 2:03pm

Sorry, I am not sure what you mean by “Is there a place I could leverage the inference.py file inside the code/ directory in the model.tar.gz file?”

bluePenguin · August 8, 2022, 2:10pm

I’m using an inference.py file right now, as per the instructions at the bottom of this page, to override defaults during model inference. I’m wondering how/if I can use the custom library within this file.

philschmid · August 8, 2022, 2:17pm

^like that

bluePenguin · August 8, 2022, 2:24pm

What is source_dir though? Like, where is that argument specified?

philschmid · August 8, 2022, 2:35pm

How are you deploying your model currently? Could you share the code snippet? source_dir is an argument of the HuggingFaceModel class in the sagemaker SDK.
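As a rough illustration of where source_dir is specified, the sketch below collects the constructor arguments in a plain dictionary so it runs without the sagemaker SDK. The helper build_model_kwargs, the local directory name code, and the placeholder S3 URI and role are assumptions for illustration; entry_point and source_dir are keyword arguments that HuggingFaceModel accepts.

```python
def build_model_kwargs(model_uri, role, source_dir="code", entry_point="inference.py"):
    """Collect keyword arguments for HuggingFaceModel (hypothetical helper)."""
    return {
        "model_data": model_uri,         # S3 path to model.tar.gz
        "role": role,                    # IAM role allowed to create an endpoint
        "entry_point": entry_point,      # script name inside source_dir
        "source_dir": source_dir,        # local folder with script, requirements.txt, cleanse/
        "transformers_version": "4.17",
        "pytorch_version": "1.10",
        "py_version": "py38",
    }


# Placeholder values, not real resources.
kwargs = build_model_kwargs("s3://my-bucket/model.tar.gz", "my-sagemaker-role")

# With the sagemaker SDK installed and AWS credentials configured:
#   from sagemaker.huggingface import HuggingFaceModel
#   huggingface_model = HuggingFaceModel(**kwargs)
```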

bluePenguin · August 8, 2022, 2:43pm

from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=model_uri,  # path to your trained sagemaker model
   role=role,  # iam role with permissions to create an Endpoint
   transformers_version="4.17",  # transformers version used
   pytorch_version="1.10",  # pytorch version used
   py_version="py38",  # python version of the DLC
   env={'HF_TASK': 'text-classification'},  # task for the default inference pipeline
)

philschmid · August 8, 2022, 2:55pm

Does the model_uri contain a code/inference.py? If so, you need to put the requirements.txt and ./cleanse into the code/ directory.
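The layout described above could be assembled along these lines. This is a sketch under assumptions: the helper names and the local folder names are hypothetical, and the requirements.txt is assumed to list ./cleanse. How that relative path resolves at install time depends on the working directory inside the container, so it is worth verifying on a test endpoint.

```python
import os
import tarfile

# Target layout inside model.tar.gz:
#   config.json, model weights, ...   (model files at the archive root)
#   code/inference.py
#   code/requirements.txt             (assumed to contain the line ./cleanse)
#   code/cleanse/                     (private package source)


def add_code_to_model_archive(model_dir, code_dir, output_path):
    """Write model files at the archive root and code_dir as code/."""
    with tarfile.open(output_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            if name == "code":
                continue  # skip a stale code/ folder; code_dir replaces it
            tar.add(os.path.join(model_dir, name), arcname=name)
        tar.add(code_dir, arcname="code")  # directories are added recursively
    return output_path


def list_archive(path):
    """Return the sorted member names of a gzipped tar archive."""
    with tarfile.open(path, "r:gz") as tar:
        return sorted(tar.getnames())
```

The resulting archive would then be uploaded to S3 and its URI passed as model_data. Write output_path outside model_dir so the archive does not try to include itself.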
