Extracting logits from vision language models at inference time

Hello, is there a simple way to extract the logits of the output token(s) when running inference with a VLM that is natively integrated into transformers (e.g. via the pipeline or Auto classes)? For instance, I would expect to be able to access them through outputs.logits in the forward pass. Is this doable with currently integrated models such as LLaVA or CogVLM?
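
For reference, here is roughly what I have in mind (a minimal sketch, not verified end to end, assuming the llava-hf/llava-1.5-7b-hf checkpoint, a local example.jpg, and the standard generate() keyword arguments):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder image path
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# Option 1: a plain forward pass, which I would expect to expose
# logits for every input position.
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)?

# Option 2: during generation, ask generate() to return per-step scores.
generated = model.generate(
    **inputs,
    max_new_tokens=20,
    return_dict_in_generate=True,
    output_scores=True,
)
# generated.scores should be a tuple with one (batch, vocab_size)
# tensor per generated token.
print(len(generated.scores), generated.scores[0].shape)
```

Is this the intended way to do it, or is there a simpler/officially supported path for VLMs?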