How to determine the number of warm-up steps?

Hi everyone!
I’m fine-tuning a GPT-2 model on a dataset with a size of 500,000 steps. What proportion of the training steps would be logical as the start-up point to consider for warm-up steps?