Embedding structured data

Hi all, I'm trying to figure out how LLMs handle understanding and generating source code and/or structured data, but I'm struggling to find any starting points. I have questions like: do people treat code and data as modalities separate from natural language, or are they handled as a single modality with some kind of translation step? I have a fair idea of how multimodal models work in general, but what I'm looking at doesn't seem as clear-cut as the difference between natural language and images, for example, because natural language and source code both "are" text, in some sense.
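To make that concrete: as far as I can tell, a standard LLM tokenizer already treats markup and prose as the same kind of text. Here's a minimal sketch (assuming the Hugging Face transformers library and the GPT-2 tokenizer) of what I mean:

```python
# Sketch: a standard subword tokenizer encodes HTML markup and English
# prose with the same vocabulary -- no separate "code" modality or
# translation step is involved.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

prose = "The quick brown fox jumps over the lazy dog."
markup = '<div class="nav"><a href="/home">Home</a></div>'

# Both strings come out as token sequences from one shared vocabulary.
print(tokenizer.tokenize(prose))
print(tokenizer.tokenize(markup))
```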

The code/data I'm specifically interested in is HTML as rendered in the browser. I want a model to understand the structure of what the browser is displaying, so that it can build an understanding of the application generating that structure. I have a tokenizer that can extract/encode meaning from what the browser is displaying, but I don't know how to proceed from there. I understand replacing/extending output layers to fine-tune a model, but as far as I can tell, a different tokenizer would mean replacing/extending the input layers, and I don't see how you'd do that. The alternative would be a preprocessor that converts my tokenization into the model's tokenization, but that seems like the wrong approach.
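For what it's worth, here's a sketch of what I think "extending the input layer" would look like in practice (again assuming Hugging Face transformers; the token strings are hypothetical placeholders for whatever my encoder would emit). Adding tokens to an existing tokenizer and resizing the embedding matrix is, I believe, the standard way to do this, with the new rows learned during fine-tuning:

```python
# Sketch: extend an existing tokenizer with new structural tokens and grow
# the model's embedding matrix to match, so the new embeddings can be
# trained during fine-tuning.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical structural tokens for rendered-DOM elements.
new_tokens = ["<NODE_OPEN>", "<NODE_CLOSE>", "<ATTR>", "<TEXT>"]
tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix; the added rows are randomly initialized and
# get trained along with (or instead of) the rest of the model.
model.resize_token_embeddings(len(tokenizer))
```

But this still ties me to the base model's tokenizer, which is why I'm unsure it's the right direction.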

Does anybody know of any research or other good starting points I can look at to unstick myself?