I have fine-tuned Florence-2 on Google Colab for an object detection task and saved the model weights there. To keep the process efficient, I used PEFT (Parameter-Efficient Fine-Tuning) with LoRA and mixed precision, which reduced storage usage and sped up training.
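To make the setup concrete, this is roughly how the LoRA and mixed-precision configuration looked; the rank, alpha, and target module names below are illustrative placeholders, not my exact values:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base Florence-2 checkpoint used as the starting point
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft",
    trust_remote_code=True,
)

# LoRA adapters on the attention projections (assumed hyperparameters)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base_model, lora_config)

# Mixed precision: autocast the forward pass and scale gradients
scaler = torch.cuda.amp.GradScaler()
# Inside the training loop (sketch):
#   with torch.autocast(device_type="cuda", dtype=torch.float16):
#       loss = model(**batch).loss
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()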
Here is the code I used to load my fine-tuned model:
import torch
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel, PeftConfig


def load_finetuned_model(checkpoint_path, base_model_name="microsoft/Florence-2-base-ft"):
    # Load the base model first
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        trust_remote_code=True
    )

    # Load the PEFT configuration
    peft_config = PeftConfig.from_pretrained(checkpoint_path)

    # Load the LoRA adapter weights and apply them on top of the base model
    model = PeftModel.from_pretrained(
        base_model,
        checkpoint_path,
        is_trainable=False  # Set to True if you want to continue training
    )

    # Load the processor
    processor = AutoProcessor.from_pretrained(
        checkpoint_path,
        trust_remote_code=True
    )

    return model, processor


# Usage
CHECKPOINT_PATH = "./model_checkpoints/epoch_99"
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model and processor
model, processor = load_finetuned_model(CHECKPOINT_PATH)
model = model.to(DEVICE)
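For context, this is roughly the inference code I run after loading. The image path is a placeholder, and the <OD> task prompt plus post_process_generation call follow the Florence-2 model card, so treat it as a sketch rather than my exact script:

from PIL import Image

# Open a test image (placeholder path)
image = Image.open("test_image.jpg").convert("RGB")

# Florence-2 uses task-prompt tokens; <OD> triggers object detection
inputs = processor(text="<OD>", images=image, return_tensors="pt").to(DEVICE)

with torch.no_grad():
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )

# Decode and parse the generated sequence into boxes and labels
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
detections = processor.post_process_generation(
    generated_text,
    task="<OD>",
    image_size=(image.width, image.height),
)
print(detections)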
When I run this code in Colab’s T4 environment, the model loads and works perfectly. However, when I try to load the model on my local system, I encounter the following error:
How can I load the model on my local device and perform inference?