Use a fine-tuned model for feature extraction

I am fine-tuning BLIP2 for captioning images of the H&M dataset. I fine-tuned my model with the peft library, using the following code to configure, train, and save it:

from peft import LoraConfig, get_peft_model
from transformers import Blip2ForConditionalGeneration

config = LoraConfig(
        use_rslora=True,
        r=r, # default 8
        lora_alpha=lora_alpha, # default 8
        lora_dropout=lora_dropout, # default 0
        bias=bias, # default none
        target_modules=target_modules
)

checkpoint = "Salesforce/blip2-opt-2.7b"
model = Blip2ForConditionalGeneration.from_pretrained(checkpoint)
model = get_peft_model(model, config)

# train model

model.save_pretrained("best_model.pt")

which saves the model's adapter_config.json and adapter_model.safetensors. I load the model via:

from peft import PeftModel
model = Blip2ForConditionalGeneration.from_pretrained(checkpoint)
peft_model = PeftModel.from_pretrained(model, "best_model.pt")
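
Since I would prefer not to carry the wrapper around, I also considered merging the adapters into the base weights. As far as I understand, merge_and_unload() is the PEFT helper for this, but I am not sure it is the right approach here (a sketch, with a placeholder output directory):

# Sketch: fold the LoRA weights into the base model and drop the PEFT wrapper
merged = peft_model.merge_and_unload()
merged.save_pretrained("best_model_merged")  # placeholder directory name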

I want to use the model I trained to extract word and image embeddings. I did not find a function that does this for the PeftModel class, so I use a pipeline, which currently looks like this (based on this documentation):

from transformers import AutoProcessor, pipeline

processor = AutoProcessor.from_pretrained(checkpoint)
extractor = pipeline(
    model=peft_model.base_model.model.vision_model,
    task="image-feature-extraction",
    tokenizer=processor.tokenizer,
    image_processor=processor,
    device=0,
)
result = extractor(test_ds[0]["image"], return_tensors=True)
result.shape  # a tensor of shape [1, sequence_length, hidden_dimension] representing the input image
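
Alternatively, I could imagine calling the vision tower directly instead of going through the pipeline; this is roughly what I have in mind (a sketch, assuming the processor returns pixel_values for an image as usual):

import torch

# Sketch: run the (hopefully fine-tuned) vision model directly on one image
inputs = processor(images=test_ds[0]["image"], return_tensors="pt")
with torch.no_grad():
    vision_out = peft_model.base_model.model.vision_model(pixel_values=inputs["pixel_values"])
image_embeds = vision_out.last_hidden_state  # shape [1, num_patches, hidden_dim]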

My question: is there a way I can verify that the vision model is fine-tuned? Or is the PeftModel wrapper needed to use the fine-tuned weights?
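
One check I could think of is inspecting the saved adapter file itself, though I am not sure this is the intended way (a sketch; the path follows from the save call above):

from safetensors.torch import load_file

# Sketch: do any of the saved adapter weights target the vision tower at all?
adapter_sd = load_file("best_model.pt/adapter_model.safetensors")
vision_keys = [k for k in adapter_sd if "vision_model" in k]
print(f"{len(vision_keys)} of {len(adapter_sd)} adapter tensors belong to vision_model")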

When I print my model, I see LoRA layers, but I am not sure what they mean:

OPTForCausalLM(
  (model): OPTModel(
    (decoder): OPTDecoder(
      (embed_tokens): Embedding(50272, 2560, padding_idx=1)
      (embed_positions): OPTLearnedPositionalEmbedding(2050, 2560)
      (final_layer_norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
      (layers): ModuleList(
        (0-31): 32 x OPTDecoderLayer(
          (self_attn): OPTAttention(
            (k_proj): lora.Linear(
              (base_layer): Linear(in_features=2560, out_features=2560, bias=True)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.1, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=2560, out_features=32, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=32, out_features=2560, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (v_proj): lora.Linear(
              (base_layer): Linear(in_features=2560, out_features=2560, bias=True)
...
    )
    (lora_embedding_A): ParameterDict()
    (lora_embedding_B): ParameterDict()
  )
)
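
To interpret this programmatically rather than from the printout, I imagine something like this sketch (peft_config and the lora_A attribute are what I understand PEFT exposes on the wrapper):

# Sketch: the adapter config records which modules LoRA was attached to
print(peft_model.peft_config["default"].target_modules)

# Every wrapped module can also be listed directly, e.g. the k_proj/v_proj layers above
lora_modules = [n for n, m in peft_model.named_modules() if hasattr(m, "lora_A")]
print(len(lora_modules), "modules carry LoRA adapters")
print([n for n in lora_modules if "vision_model" in n])  # any under the vision tower?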