This forum post is about implementational details pertaining to conversational user interfaces (CUI) for interactive fiction (IF) and other videogames and pertaining to bidirectional synchronizations between game engines and large language models (LLMs).
How can LLMs be utilized as text parsers for existing or new works of IF?
One approach involves mapping spoken or written commands to semantic frames with slots that could be filled by nouns or by embedding vectors which represent those nouns. Perhaps the zero vector could be utilized to signify an empty slot, null or undefined.
Consider that commands like “take lamp” and “pick up the bronze lamp” could both utilize the typed semantic frame for “taking” (https://framenet2.icsi.berkeley.edu/fnReports/data/frame/Taking.xml).
A command like “take it” or “pick it up” could be interpreted using by LLMs using dialogue context after a command like “inspect lamp”.
Disjunction support for semantic frames’ slots could be useful for reporting multiple candidate nouns. A NLU component might want to output that for “pick it up” the “lamp” is 90% probably the resolution of the pronoun and “treasure chest” 10%. With disjunctive and potentially probabilistic outputs, CUI for IF or other videogames could ask players whether they meant “the lamp” or “the treasure chest” in a previous command.
Envisioned here are bidirectional synchronizations between game engines and LLMs. In these regards, let us consider that game engines could manage and maintain dynamic documents, transcripts, and logs and that these could be components of larger prompts to LLMs.
Consider, for example, an animate creature arriving on screen and that a player is desired to be able to use a CUI to refer to it. How did the LLM know that the creature, e.g., an “orc”, was on screen, that it had entered the dialogue context?
By managing dynamic documents, transcripts, or logs, game engines could provide synchronized contexts as components of prompts to LLMs.
This would be towards providing an illusion that the CUI AI also sees or understands the contexts of IF or other videogames.
Next, that creature, e.g., an “orc”, might enter view and then exit view. How would an LLM interpret a delayed command from the player to then respond that that creature was no longer in view? This suggests features of a dynamic transcript or log.
That is, a fuller illusion would be one that the AI sees or understands the present and recent past contexts of IF and other videogames.
Game engines, e.g., Unity and Unreal, could eventually come to support interoperation with LLMs’ dialogue contexts via features for the maintenance of dynamic documents, transcripts, or logs. These engines would then be of general use for creating CUI-enhanced IF and other videogames.
Also possible are uses of multimodal LLMs.
Instead of having to transmit the entirety of dynamic documents, transcripts, logs, or prompts for each spoken or written command to be interpreted by the LLM CUI, it is possible that “deltas” or “diffs” could be transmitted to synchronize between client-side and server-side copies of larger prompts or portions thereof.
Thank you. I hope that I expressed these ideas clearly. I look forward to discussing these ideas with you. Is anyone else thinking about or working on these or similar challenges?