How to specify the S3 bucket for training debug output

Hello,

I’m working on a project where we’re using AWS SageMaker to run training jobs. We specified the default_bucket parameter in the SageMaker Session, but when we run the code, a new S3 bucket named sagemaker-{region}-{aws-account-id} is created, and a folder with the prefix huggingface-pytorch-training- is created for each run. Each of those folders contains three subfolders: debug-output/, profiler-output/, and source/.

Looking at the SageMaker Session documentation, it says this bucket is only created if the default_bucket parameter is not specified:

default_bucket (str) – The default Amazon S3 bucket to be used by this session. This will be created the next time an Amazon S3 bucket is needed (by calling default_bucket()). If not provided, a default bucket will be created based on the following format: “sagemaker-{region}-{aws-account-id}”. Example: “sagemaker-my-custom-bucket”.
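For concreteness, the format quoted from the docs would produce a name like the following (plain-Python illustration; the region and account ID below are made-up placeholders, not our real values):

```python
# Illustration of the default bucket name format described in the docs.
# Placeholder values only -- not our actual region or account ID.
region = "us-east-1"
aws_account_id = "123456789012"

default_bucket_name = f"sagemaker-{region}-{aws_account_id}"
print(default_bucket_name)  # sagemaker-us-east-1-123456789012
```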

The predictions and checkpoint data do seem to be saved correctly to the S3 bucket specified in the SageMaker Session, but these extra items still appear. Can someone help me understand how to prevent the sagemaker-{region}-{aws-account-id} bucket from being created, and instead direct the outputs in the huggingface-pytorch-training- folders to an S3 bucket that we specify?

Here’s a snippet of our code:

    session = sagemaker.Session(default_bucket=DESTINATION_S3_BUCKET)
    role = sagemaker.get_execution_role(sagemaker_session=session)

    hf_estimator = HuggingFace(
        role=role,
        entry_point=TRAINER_FILE,
        instance_type=instance_type,
        instance_count=INSTANCE_COUNT,
        transformers_version=TRANSFORMERS_VERSION,
        pytorch_version=PYTORCH_VERSION,
        py_version=PY_VERSION,
        checkpoint_s3_uri=CHECKPOINT_S3_URI,
        hyperparameters=HYPERPARAMETERS,
    )
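One thing we were considering is passing our session (and an explicit output path) to the estimator, since we suspect it otherwise constructs its own default Session and hence its own default bucket. Below is a self-contained sketch of the two keyword arguments we think we may be missing; both parameter names (sagemaker_session, output_path) are assumptions on our part based on the base SageMaker estimator parameters, and the bucket name is a placeholder. Is this the right approach?

```python
# Sketch only (no AWS calls): the two estimator keyword arguments we are
# considering adding. Parameter names are assumed from the base sagemaker
# Estimator -- please correct us if the HuggingFace estimator differs.
DESTINATION_S3_BUCKET = "our-destination-bucket"  # placeholder

candidate_kwargs = {
    # Reuse the Session we created with default_bucket set, so the
    # estimator does not build its own default Session (and bucket).
    "sagemaker_session": None,  # would be our `session` object
    # Point the job's output artifacts at our bucket explicitly.
    "output_path": f"s3://{DESTINATION_S3_BUCKET}/huggingface-training",
}

print(candidate_kwargs["output_path"])
```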

and TRAINER_FILE contains the following:

    training_args = TrainingArguments(
        output_dir=args["OUTPUT_DIR"],
        optim=args["OPTIMIZER"],
        per_device_train_batch_size=16,
        num_train_epochs=args["TRAIN_EPOCHS"],
        learning_rate=args["LEARNING_RATE"],
        weight_decay=args["WEIGHT_DECAY"],
        warmup_ratio=args["WARMUP_RATIO"],
        per_device_eval_batch_size=16,
        save_strategy="epoch",
        logging_strategy="epoch",
        remove_unused_columns=False,
        fp16=True,
        push_to_hub=False,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_data,
        eval_dataset=validation_data,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )