I’m following the Fine-Tuning Llama 2 on SageMaker walkthrough.
It was going pretty smoothly until it was time to fit the estimator, where I ran into two exceptions.
The first was a CreateBucket permission-denied error. I added an output_path
parameter to the Hugging Face estimator pointing to my S3 bucket/prefix, and that error no longer appears.
However, I’m not sure it’s actually “fixed”, since I now get a ValueError about a missing “scripts” directory.
From the tutorial, this is the original estimator:
huggingface_estimator = HuggingFace(
    entry_point          = 'run_clm.py',      # training script
    source_dir           = 'scripts',         # directory containing all files needed for training
    instance_type        = 'ml.g5.4xlarge',   # instance type used for the training job
    instance_count       = 1,                 # number of instances used for training
    base_job_name        = job_name,          # name of the training job
    role                 = role,              # IAM role used by the training job to access AWS resources, e.g. S3
    volume_size          = 300,               # size of the EBS volume in GB
    transformers_version = '4.28',            # Transformers version used in the training job
    pytorch_version      = '2.0',             # PyTorch version used in the training job
    py_version           = 'py310',           # Python version used in the training job
    hyperparameters      = hyperparameters,   # hyperparameters passed to the training job
    environment          = { "HUGGINGFACE_HUB_CACHE": "/tmp/.cache" }, # env variable to cache models in /tmp
)
Updated with the output_path:
huggingface_estimator = HuggingFace(
    output_path          = 's3://mybucket/jkyle/sagemaker/output',
    entry_point          = 'run_clm.py',      # training script
    source_dir           = 'scripts',         # directory containing all files needed for training
    instance_type        = 'ml.g5.4xlarge',   # instance type used for the training job
    instance_count       = 1,                 # number of instances used for training
    base_job_name        = job_name,          # name of the training job
    role                 = role,              # IAM role used by the training job to access AWS resources, e.g. S3
    volume_size          = 300,               # size of the EBS volume in GB
    transformers_version = '4.28',            # Transformers version used in the training job
    pytorch_version      = '2.0',             # PyTorch version used in the training job
    py_version           = 'py310',           # Python version used in the training job
    hyperparameters      = hyperparameters,   # hyperparameters passed to the training job
    environment          = { "HUGGINGFACE_HUB_CACHE": "/tmp/.cache" }, # env variable to cache models in /tmp
)
And the exception:
ValueError: No file named "run_clm.py" was found in directory "scripts".
I’m not clear on what root path it looks for the scripts directory under. Do I need to upload or create it somewhere?
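If it helps, here’s a minimal check I’ve been using to see where the lookup happens (this assumes source_dir is resolved relative to the current working directory of the notebook/process creating the estimator; the helper name check_source_dir is just mine, not part of the SDK):

```python
from pathlib import Path

def check_source_dir(source_dir: str, entry_point: str) -> bool:
    """Return True if entry_point exists inside source_dir.

    Mirrors the kind of existence check that trips the
    'No file named ... was found in directory ...' ValueError.
    """
    script = Path(source_dir) / entry_point
    # Show the absolute path being inspected, resolved against cwd
    print(f"Looking for: {script.resolve()}")
    return script.is_file()

# This should print True before calling huggingface_estimator.fit(...)
print(check_source_dir("scripts", "run_clm.py"))
```

So if that prints False, the scripts/ directory (with run_clm.py inside it) needs to exist next to wherever the notebook is running, not in S3.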
Cheers & thanks for any tips!