What model should I use for English to JSON?

For a project I’m working on, I want to create a machine learning model where I can describe a game engine asset in English and get JSON out, which an intermediate script can then use to create the asset within the game engine.
The assets I’m working with can easily be described using JSON or an equivalent format, and the JSON for these assets would all be fairly similar in structure, so I figured this might be a problem I could tackle with machine learning.
My problem is that I don’t quite know where to start. Would this kind of problem need a GPT or some other Transformer model to work properly? If so, what resources could I use to start building my own model and learn how to train it properly?
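To make "fairly similar JSON" concrete, here is a minimal sketch of the hand-off between the model and the intermediate script. The asset format and the field names (`name`, `type`, `position`, `components`) are hypothetical placeholders, not from any real engine. The idea is that the script parses the model's text output and checks it has the expected shape before trying to build anything, since a model can produce malformed or incomplete JSON.

```python
import json

# Hypothetical: keys every asset description is assumed to share, with their
# expected Python types after json.loads. Replace with your engine's real fields.
REQUIRED_KEYS = {"name": str, "type": str, "position": list, "components": list}


def parse_asset(model_output: str) -> dict:
    """Parse a model's raw text output as JSON and check its basic shape.

    Raises ValueError if the text is not valid JSON, is not a JSON object,
    or is missing (or mistypes) one of REQUIRED_KEYS.
    """
    try:
        asset = json.loads(model_output)
    except json.JSONDecodeError as err:
        raise ValueError(f"model output is not valid JSON: {err}") from err
    if not isinstance(asset, dict):
        raise ValueError("expected a JSON object at the top level")
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in asset:
            raise ValueError(f"missing key: {key!r}")
        if not isinstance(asset[key], expected_type):
            raise ValueError(f"key {key!r} should be {expected_type.__name__}")
    return asset


# Hypothetical model output for "a red crate at the origin with a box collider".
example_output = """
{
  "name": "RedCrate",
  "type": "prop",
  "position": [0, 0, 0],
  "components": [
    {"kind": "mesh", "model": "crate", "color": "red"},
    {"kind": "box_collider", "size": [1, 1, 1]}
  ]
}
"""

asset = parse_asset(example_output)
print(asset["name"], len(asset["components"]))
```

Because the assets share a structure, a fixed check like this (or a full JSON Schema) gives a simple way to reject bad model output and to measure how often a given model gets the format right.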