Document Object Model (DOM) similarity learning

neo-benjamin · July 3, 2022, 10:45pm

I would like to know whether there is a model for generating an embedding of a Document Object Model (DOM). A DOM is a tree, so I suppose a model that handles trees would be a good choice.

The downstream task is to learn DOM similarity.

Given two DOM inputs, I am thinking of generating two DOM embeddings, emb_dom1 and emb_dom2, and then taking their cosine similarity for similarity matching.
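To make the comparison step concrete, here is a minimal sketch of the cosine-similarity idea. It is not a learned model: as a stand-in for a real embedding, it simply counts root-to-node tag paths (for example `html/body/div/p`) and treats those counts as a sparse vector. The names `TagPathCounter`, `embed_dom`, and `cosine_similarity` are hypothetical, and Python's standard-library `html.parser` is assumed as the HTML source.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCounter(HTMLParser):
    """Counts root-to-node tag paths such as 'html/body/div/p'."""

    def __init__(self):
        super().__init__()
        self.stack = []          # tags of the currently open elements
        self.paths = Counter()   # tag path -> number of occurrences

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def embed_dom(html):
    """Baseline 'embedding': a sparse vector of tag-path counts."""
    parser = TagPathCounter()
    parser.feed(html)
    parser.close()
    return parser.paths


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


emb_dom1 = embed_dom("<html><body><div><p>a</p><p>b</p></div></body></html>")
emb_dom2 = embed_dom("<html><body><div><p>x</p><p>y</p></div></body></html>")
print(cosine_similarity(emb_dom1, emb_dom2))
```

Two pages with the same structure but different text score close to 1 here, because only the tag paths are counted. A learned tree or graph model would replace `embed_dom`, while the cosine step would stay the same.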

Any feedback is welcome.

CpILL · July 9, 2022, 3:40am

Stanford have a course on Graph Neural Networks. Not sure if there is a framework out there for it (but there probably is).
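If a graph neural network is the route taken, the DOM first has to be turned into a graph. The sketch below is one possible way to do that, assuming HTML input and Python's standard-library `html.parser`; `DomGraphBuilder` and `dom_to_graph` are hypothetical names. The output is a list of node labels plus a two-row edge list, a layout similar to the `edge_index` convention used by libraries such as PyTorch Geometric.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomGraphBuilder(HTMLParser):
    """Collects element nodes and parent-child edges while parsing."""

    def __init__(self):
        super().__init__()
        self.node_tags = []  # node id -> tag name
        self.edges = []      # (parent_id, child_id) pairs
        self.stack = []      # ids of the currently open elements

    def handle_starttag(self, tag, attrs):
        node_id = len(self.node_tags)
        self.node_tags.append(tag)
        if self.stack:
            self.edges.append((self.stack[-1], node_id))
        if tag not in VOID_TAGS:
            self.stack.append(node_id)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray closing tags.
        open_tags = [self.node_tags[i] for i in self.stack]
        if tag in open_tags:
            while self.node_tags[self.stack.pop()] != tag:
                pass


def dom_to_graph(html):
    """Returns (node_tags, [sources, targets]) with edges in both directions."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    parents = [p for p, _ in builder.edges]
    children = [c for _, c in builder.edges]
    # Both directions, so messages can flow from parent to child and back.
    return builder.node_tags, [parents + children, children + parents]


node_tags, edge_index = dom_to_graph("<div><p>a</p><br><span>b</span></div>")
print(node_tags)
print(edge_index)
```

Node features (tag name, attributes, text) would still need to be encoded numerically before a GNN could consume them; this sketch only covers the structure.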

pallavJha · May 13, 2024, 12:21pm

Hey @neo-benjamin - Which approach did you take to generate the DOM embeddings? I’m working on a problem where I need to find similar pages based on the structure. Looking for a way to pass the structural info while creating the embedding.

Thanks!

gbenson · May 20, 2024, 6:32pm

@pallavJha, I’ve been trying to figure out DOM embeddings for the past couple of weeks. I looked to see whether anyone has done anything similar, but the only thing I found was a project called webui, which I think is aimed more at understanding mobile user interfaces (though it’s been useful to explore). Since I didn’t find anything, I’ve been working on a custom tokenizer that includes DOM info along with the text content, and I’ll probably try to generate embeddings from that once it works how I’d like. What I’ve done so far is on my GitHub. It currently takes a Chrome DevTools Protocol (CDP) structure as input, but I purposely wrote it so it could also accept HTML via BeautifulSoup or similar (all the data I have is in the CDP structure, so that’s what I wrote first!).
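For readers wondering what "DOM info along with the text content" could look like as a token stream, here is a rough sketch. It is not the tokenizer described in the post above: it assumes plain HTML input parsed with Python's standard-library `html.parser` (rather than a CDP structure or BeautifulSoup), and `DomTokenizer` and `tokenize_dom` are hypothetical names. Opening and closing tags become structural tokens, and text is split on whitespace.

```python
from html.parser import HTMLParser


class DomTokenizer(HTMLParser):
    """Linearizes a DOM into structure tokens interleaved with text tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        # Whitespace-only text between tags produces no tokens.
        self.tokens.extend(data.split())


def tokenize_dom(html):
    tokenizer = DomTokenizer()
    tokenizer.feed(html)
    tokenizer.close()
    return tokenizer.tokens


print(tokenize_dom("<div><h1>Hello world</h1><p>Hi</p></div>"))
# ['<div>', '<h1>', 'Hello', 'world', '</h1>', '<p>', 'Hi', '</p>', '</div>']
```

In practice the structural tokens would probably be registered as special tokens in a subword tokenizer, and the text would go through that tokenizer instead of a whitespace split.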

pranavcm · July 11, 2025, 7:13pm

Hey @gbenson, did you ever figure this out or implement it (either with CDP or with DOM)? I’d love to use something like that for a project I’m working on right now.
