Converting web-scraped data to a semi-structured JSON payload

I am trying to scrape scholarship information from multiple university websites and feed it to an LLM, hoping it will convert the data into a consistent JSON format. For example, when I scrape a site and pass the page to the LLM, it should search through the scraped page and generate a payload such as:

{"scholarship name": "Presidential", "gpa": "3.0", "sat": "1550"}

It should work for at least most of the sites.
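One common way to approach this is to strip the HTML down to its visible text, send that text to the model along with an explicit list of the keys you want, and then validate whatever JSON comes back. The sketch below assumes Python 3.9+ and uses only the standard library. `call_llm` is a hypothetical placeholder for whichever model client you use, and the field names (with underscores instead of spaces) are assumptions based on the example payload.

```python
import json
from html.parser import HTMLParser
from typing import Callable

# Hypothetical key names, adapted from the example payload.
FIELDS = ["scholarship_name", "gpa", "sat"]


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce a scraped page to newline-separated visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_prompt(page_text: str) -> str:
    """Describe the target keys in the instructions instead of giving examples."""
    return (
        "Extract every scholarship from the page below. Reply with only a JSON "
        f"array of objects with exactly these keys: {', '.join(FIELDS)}. "
        "Use null for any value the page does not state.\n\n" + page_text
    )


def extract_scholarships(html: str, call_llm: Callable[[str], str]) -> list[dict]:
    """call_llm is a hypothetical stand-in: it takes a prompt, returns the model's text."""
    data = json.loads(call_llm(build_prompt(html_to_text(html))))
    if isinstance(data, dict):  # tolerate a single object instead of a list
        data = [data]
    # Keep only the expected keys and fill missing ones with None.
    return [{k: item.get(k) for k in FIELDS} for item in data if isinstance(item, dict)]
```

Checking the parsed result against a fixed key list, rather than trusting the model's output as-is, keeps the payload shape consistent across sites even when the model adds or omits fields.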

How should I go about doing this, and which models work best for it? I tried OpenAI's GPT, but it limits the number of tokens it allows when fine-tuning. On top of that, with few-shot learning the examples alone take up far too many tokens.
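When a page is too long for the model's context window, one workaround is to split the text into chunks, extract from each chunk separately, and merge the results. Describing the target keys in the instructions (zero-shot) instead of including few-shot examples also keeps the prompt short. This minimal sketch assumes roughly 4 characters per token, which is only a rule of thumb for English text; `call_llm` is a hypothetical placeholder for your model client, and the key names are assumptions. The `max_tokens` budget should leave room for the instructions and the model's reply.

```python
import json
from typing import Callable

# Rough assumption: ~4 characters per token for English text. Use your
# model's real tokenizer for exact counts.
CHARS_PER_TOKEN = 4


def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whole lines into chunks that fit an approximate token budget."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    chunks: list[str] = []
    current = ""
    for line in text.splitlines():
        # Hard-split any single line that exceeds the budget on its own.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) > max_chars:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def extract_from_long_page(
    page_text: str, call_llm: Callable[[str], str], max_tokens: int = 3000
) -> list[dict]:
    """Run extraction per chunk, then merge results by scholarship name."""
    results: dict[str, dict] = {}
    for chunk in chunk_text(page_text, max_tokens):
        prompt = (
            "Return only a JSON array of scholarships found in this text, each with "
            "keys scholarship_name, gpa, sat (null if not stated). "
            "Return [] if there are none.\n\n" + chunk
        )
        for item in json.loads(call_llm(prompt)):
            name = item.get("scholarship_name")
            if not name:
                continue
            merged = results.setdefault(
                name, {"scholarship_name": name, "gpa": None, "sat": None}
            )
            # Fill in fields that earlier chunks left empty.
            for key in ("gpa", "sat"):
                if merged[key] is None and item.get(key) is not None:
                    merged[key] = item[key]
    return list(results.values())
```

Merging by scholarship name is a simplifying assumption: if a requirement appears in a chunk that never mentions the scholarship's name, the model may not be able to attribute it, so overlapping chunks or splitting on section headings may work better for some sites.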

I am working on a similar problem. Did you manage to find a solution for this?