I’m new to all of this, so I’m not sure exactly why; I’d have to find the source code that is invoked when you call inference. But I did get it working myself. Two things I noticed led me to a fix:
- In the NLP examples they pass the tokenizer into the trainer and then just call save once.
- I kept getting errors when calling my endpoint saying that `ResNetForImageClassification` wasn’t a supported model for the inference endpoint.
I had a working theory that maybe it wasn’t saving the tokenizer/processing_class out with the deployed model, and since I was also not pulling a model from the Hub via `HF_MODEL_ID`, I figured I might be doing my saving wrong.
I inspected the tar file artifact that got saved to S3 and saw that it had no `preprocessor_config.json`, which I needed: running locally, I saw loading fail if that file was not present in the directory of a model I had saved.
from pathlib import Path

from transformers import AutoImageProcessor, AutoModelForImageClassification

root_dir = Path(".")  # placeholder: wherever the training output lives locally

model = AutoModelForImageClassification.from_pretrained(root_dir / "output")
# This load fails if preprocessor_config.json is absent from the directory
processor = AutoImageProcessor.from_pretrained(root_dir / "output")
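As an aside, a quick way to check what actually ended up in the artifact (a minimal sketch, assuming you’ve pulled model.tar.gz down from S3):

import tarfile

# List the contents of the artifact; preprocessor_config.json should be there
with tarfile.open("model.tar.gz", "r:gz") as tar:
    for member in tar.getmembers():
        print(member.name)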
So, long preamble later, I set up the following estimator to fine-tune a ResNet. In the `train-resnet.py` file I had roughly the same things I saw in multiple tutorials:
import numpy as np
from datasets import load_from_disk
from transformers import (
    AutoImageProcessor,
    AutoModelForImageClassification,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

metric_name = "accuracy"

def compute_metrics(eval_pred):
    # metric_for_best_model needs a compute_metrics fn that reports this metric
    predictions, labels = eval_pred
    return {metric_name: float((np.argmax(predictions, axis=-1) == labels).mean())}

# Training function
def train(args: ProgramArgs):
    # Load the preprocessed datasets from the SageMaker input channels
    train_dataset = load_from_disk(args.training_dir)
    test_dataset = load_from_disk(args.test_dir)
    num_labels = len(train_dataset.features["label"].names)

    # Load model, swapping in a classification head sized to our labels
    model = AutoModelForImageClassification.from_pretrained(
        args.model_name,
        num_labels=num_labels,
        ignore_mismatched_sizes=True,
    )

    # Load processor
    processor = AutoImageProcessor.from_pretrained(args.model_name)

    # Define training args
    training_args = TrainingArguments(
        output_dir=args.model_dir,
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.train_batch_size,
        per_device_eval_batch_size=args.eval_batch_size,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        learning_rate=float(args.learning_rate),
        weight_decay=0.01,
        logging_dir=f"{args.model_dir}/logs",
        load_best_model_at_end=True,
        metric_for_best_model=metric_name,
    )

    # Create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        data_collator=default_data_collator,
        compute_metrics=compute_metrics,
    )

    # Train model
    trainer.train()

    # Save the model to the model dir, which SageMaker uploads to S3
    trainer.save_model(args.model_dir)

    # Save the processor config alongside it, so preprocessor_config.json
    # ends up in the deployed artifact
    processor.save_pretrained(args.model_dir)
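For completeness, `ProgramArgs` above is just my holder for the script arguments; it amounts to the usual SageMaker argparse boilerplate, roughly like this sketch (the `SM_*` environment variables are the standard ones SageMaker sets, and the flag names mirror the hyperparameters below):

import argparse
import os

def parse_args():
    # The resulting Namespace plays the role of ProgramArgs above
    parser = argparse.ArgumentParser()
    # Hyperparameters passed through the HuggingFace estimator
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--train_batch_size", type=int, default=32)
    parser.add_argument("--eval_batch_size", type=int, default=64)
    parser.add_argument("--model_name", type=str, default="microsoft/resnet-50")
    parser.add_argument("--learning_rate", type=str, default="1e-4")
    # Directories SageMaker provides via environment variables
    parser.add_argument("--model_dir", type=str, default=os.environ["SM_MODEL_DIR"])
    parser.add_argument("--training_dir", type=str, default=os.environ["SM_CHANNEL_TRAIN"])
    parser.add_argument("--test_dir", type=str, default=os.environ["SM_CHANNEL_TEST"])
    return parser.parse_args()

if __name__ == "__main__":
    train(parse_args())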
Setting `HF_TASK = "image-classification"` on the estimator (the idea being that we could then use the Hugging Face implementation of inference instead of building our own) does not seem to work here; it might be a bug, or it might be me not fully understanding the library (I have one week of experience with it at this point).
from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(
    entry_point="train-resnet.py",
    source_dir="./scripts",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role=role,
    # You have to match these versions locally if you're testing this.
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters={
        "epochs": 50,
        "train_batch_size": 32,
        "eval_batch_size": 64,
        "model_name": "microsoft/resnet-50",
        "learning_rate": 1e-4,
    },
)
With the estimator defined, I called fit and then found the model folder it saved to in S3.
# Start the training job with our uploaded datasets as input
huggingface_estimator.fit({
    "train": training_input_path,
    "test": test_input_path,
})
root_bucket = "<your bucket>"
model_bucket = "<your model>"
# model_data needs a full s3:// URI to the tarball
model_s3_path = f"s3://{root_bucket}/{model_bucket}/output/model.tar.gz"
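(As a shortcut, once fit() completes the estimator already knows this path, so you can skip assembling it by hand:)

# Equivalent: read the artifact location straight off the fitted estimator
model_s3_path = huggingface_estimator.model_data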
from sagemaker.huggingface import HuggingFaceModel

huggingface_pretrained = HuggingFaceModel(
    model_data=model_s3_path,
    role=role,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    # Setting HF_TASK here, on the model rather than the estimator, is
    # what got the default inference handler to pick the right pipeline
    env={
        "HF_TASK": "image-classification",
    },
)
from sagemaker.serializers import DataSerializer

# Send raw image bytes as the request body
image_serializer = DataSerializer(content_type="image/x-image")

predictor = huggingface_pretrained.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    serializer=image_serializer,
)
image_path = "path/to/image"
with open(image_path, "rb") as data_file:
    image_data = data_file.read()

res = predictor.predict(image_data)
print(res)
This seemed to work! Though you still won’t get the test inference view in SageMaker to accept image/jpeg; it still treats the payload as textual for me, so you have to test via code.
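For example, a bare-bones boto3 call against the endpoint works fine where the console view does not (the endpoint name is a placeholder; you can read it off `predictor.endpoint_name`):

import boto3

runtime = boto3.client("sagemaker-runtime")

with open("path/to/image", "rb") as data_file:
    payload = data_file.read()

# Invoke the endpoint directly with raw image bytes
response = runtime.invoke_endpoint(
    EndpointName="<your endpoint name>",  # placeholder
    ContentType="image/x-image",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))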