Is it a good idea to finetune an LLM to predict a certain number?

I want a model that predicts the difficulty (i.e., the answer rate, expressed as a two-digit number) of a reading comprehension test question.
Is it a good idea to fine-tune an LLM (e.g., Llama 2, Mistral, or GPT) for this task?
I’m worried because, IMO, LLMs are generally bad at representing numbers in their embedding vectors…
Or should I change the last layer of those models?
Any tips would help a lot. Thanks!

Fine-tuning BERT for a regression task: is a description enough to predict a property’s list price? | by Anthony Galtier | ILB Labs publications | Medium

Consider using an encoder model rather than a generative model to perform regression on text.
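For example, Hugging Face's `BertForSequenceClassification` turns into a regression model when you set `num_labels=1` and `problem_type="regression"` (the loss becomes MSE). A minimal sketch, assuming the difficulty label is a percentage-style number; the tiny randomly initialized config here is only so the snippet runs without downloading weights — in practice you would load a pretrained checkpoint such as `bert-base-uncased` and fine-tune it:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny config so this sketch runs without downloading pretrained weights.
# For real use: BertForSequenceClassification.from_pretrained(
#     "bert-base-uncased", num_labels=1, problem_type="regression")
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=1,                  # single scalar output -> regression head
    problem_type="regression",     # makes the model use MSE loss
)
model = BertForSequenceClassification(config)

# Placeholder batch: 4 tokenized questions of length 16.
input_ids = torch.randint(0, config.vocab_size, (4, 16))
labels = torch.tensor([35.0, 72.0, 50.0, 18.0])  # hypothetical answer rates

out = model(input_ids=input_ids, labels=labels)
# out.logits has shape (4, 1); out.loss is the MSE against the labels.
```

Because the target is a scalar regressed from the pooled text embedding, this sidesteps the worry about LLMs representing numbers as tokens: the model never has to generate digits, only map text to a continuous value.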