Hi,
I have the source code for a custom data cleaning library; the code is private to my company. I usually install it from local source using something like pip install ./cleanse. I know the requirements.txt file can be used to specify package versions for public libraries like pandas and NumPy, but how can we install custom packages in the HuggingFace DLC?
You can put ./cleanse into your source_dir and then either add os.system("pip install ./cleanse") at the top of your training script, or add the local path ./cleanse to a requirements.txt in the source_dir.
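A minimal sketch of the first option, assuming the package folder is shipped in source_dir next to the training script under the name cleanse. It swaps os.system for subprocess.check_call with sys.executable, so pip installs into the same interpreter that runs the script and a failed install raises an error instead of being silently ignored. The helper names are hypothetical.

```python
import os
import subprocess
import sys
from typing import List, Optional


def pip_install_command(package_path: str) -> List[str]:
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package_path]


def resolve_package_path(package_path: str, base_dir: Optional[str] = None) -> str:
    """Resolve package_path against base_dir (default: this file's folder)."""
    if base_dir is None:
        # Anchor on the script's own folder so the install does not depend on
        # the working directory the container starts in.
        base_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base_dir, package_path)


def install_local_package(package_path: str = "cleanse",
                          base_dir: Optional[str] = None) -> None:
    """Install a package from a local directory shipped next to the script."""
    target = resolve_package_path(package_path, base_dir)
    # check_call raises CalledProcessError if pip exits with a non-zero status,
    # unlike os.system, whose return code is easy to ignore.
    subprocess.check_call(pip_install_command(target))
```

At the top of the training script you would call install_local_package() first and only then import cleanse, since the import fails until the install has run.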
Hi, I’m actually using this library for cleaning at inference time. Is there a place I could leverage the inference.py file inside the code/ directory in the model.tar.gz file? Or otherwise, can I somehow integrate the library into the preprocessing script (using SKLearnProcessor)?
Sorry, I am not sure what you mean by "Is there a place I could leverage the inference.py file inside the code/ directory in the model.tar.gz file?"
I’m using an inference.py file right now, as per the instructions at the bottom of this page, to override the defaults during model inference. I’m wondering how (or whether) I can use the custom library within this file.
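Once the package is installed in the container, inference.py can import it like any other module. A minimal sketch, assuming the SageMaker Hugging Face inference toolkit's predict_fn(data, model) override, where data is the deserialized request body (e.g. {"inputs": ...}) and model is whatever the model loader returns, such as a transformers pipeline. clean_text is a hypothetical stand-in for whatever function cleanse actually exposes; it is defined inline only so the sketch runs on its own.

```python
from typing import Any, Callable, Dict, List, Union

# In the real code/inference.py this would be an import from the private
# package, e.g. `from cleanse import clean_text` (hypothetical name).
def clean_text(text: str) -> str:
    """Stand-in cleaner: collapse runs of whitespace and lowercase."""
    return " ".join(text.split()).lower()


def predict_fn(data: Dict[str, Any], model: Callable[..., Any]) -> Any:
    """Clean the request inputs, then run the model on the cleaned text."""
    inputs: Union[str, List[str]] = data["inputs"]
    if isinstance(inputs, str):
        cleaned: Union[str, List[str]] = clean_text(inputs)
    else:
        cleaned = [clean_text(text) for text in inputs]
    # Optional "parameters" from the request are forwarded as keyword arguments.
    parameters = data.get("parameters") or {}
    return model(cleaned, **parameters)
```

Because predict_fn receives the model as an argument, it can be checked locally with any callable in place of a real pipeline, before the code is packaged for the endpoint.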
What is source_dir, though? Where is that argument specified?
How are you deploying your model currently? Could you share the code snippet? source_dir is an argument of the HuggingFaceModel class in the sagemaker-sdk.
```python
from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
# (model_uri and role are assumed to be defined earlier in the notebook)
huggingface_model = HuggingFaceModel(
    model_data=model_uri,  # path to your trained SageMaker model
    role=role,  # IAM role with permissions to create an Endpoint
    transformers_version="4.17",  # transformers version used
    pytorch_version="1.10",  # PyTorch version used
    py_version="py38",  # Python version of the DLC
    env={"HF_TASK": "text-classification"},  # task for the default handler
)
```
Does the model_uri contain a code/inference.py? If so, you need to put the requirements.txt and ./cleanse into the code/ directory.
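One way to produce that layout is to repack the archive. A minimal sketch, assuming the trained model files have been extracted to a local folder (model_dir) and that inference.py, requirements.txt and the cleanse source tree sit together in another local folder (code_dir); both folder names and the function name are hypothetical. Model files go at the archive root and everything from code_dir goes under code/. Note that pip may resolve a relative path such as ./cleanse in requirements.txt against the directory it is run from rather than the folder holding requirements.txt, so an absolute path (for example /opt/ml/model/code/cleanse, assuming the archive is extracted to /opt/ml/model as on SageMaker endpoints) may be the safer entry.

```python
import os
import tarfile


def build_model_archive(model_dir: str, code_dir: str, output_path: str) -> None:
    """Pack model files at the archive root and the code folder under code/."""
    with tarfile.open(output_path, "w:gz") as tar:
        # Weights, config and tokenizer files go at the root of the archive.
        for name in sorted(os.listdir(model_dir)):
            tar.add(os.path.join(model_dir, name), arcname=name)
        # inference.py, requirements.txt and the cleanse/ package go under
        # code/; tar.add recurses into directories by default.
        tar.add(code_dir, arcname="code")
```

The resulting model.tar.gz can then be uploaded to S3 and passed as model_data.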