HuggingChat has access to web search, which is great.
However, the way it parses webpages isn’t great. AFAIU, it looks only at <p> tags.
This causes the model to fail at some tasks, because of the parser’s output. Does anyone have an idea of a better way to parse webpages? Any ideas how OpenAI is doing it?
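To make the failure mode concrete, here is a minimal sketch of a parser that keeps only <p> text. I haven't checked HuggingChat's actual code, so this is an assumption about its behavior and not its implementation. The class name, the function name, and the HTML fragment (including the description sentence) are all made up for illustration.

```python
from html.parser import HTMLParser


class ParagraphOnlyExtractor(HTMLParser):
    """Collects text found inside <p> elements and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.p_depth = 0   # number of <p> elements we are currently inside
        self.chunks = []   # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth > 0:
            self.p_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.p_depth > 0 and text:
            self.chunks.append(text)


# Hypothetical product-page fragment; the markup is invented for this example.
SAMPLE_HTML = """
<div class="product">
  <h1>VISCOSE BLEND KNIT POLO SHIRT</h1>
  <span class="price">$59.90</span>
  <p>Short sleeve polo shirt made of a viscose blend.</p>
</div>
"""


def extract_paragraph_text(html):
    """Return the text of all <p> elements, joined with spaces."""
    parser = ParagraphOnlyExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


p_only_text = extract_paragraph_text(SAMPLE_HTML)
print(p_only_text)
```

With this fragment, the price in the <span> and the product name in the <h1> never make it into the extracted text, so the model has nothing to ground its answer on.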
If we just include additional tags, e.g. <span>, <div>, <li>, …, we can end up with a loooot of text.
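One possible middle ground, sketched under assumptions: instead of whitelisting tags, keep the text of every element, skip containers that are never shown to the reader, drop exact duplicate fragments (menus and footers tend to repeat), and cap the total length. The skip list, the `max_chars` default, and the sample page are my own choices for illustration, not anything HuggingChat or OpenAI is known to use.

```python
from html.parser import HTMLParser

# Elements whose content is not visible page text (assumed, non-exhaustive list).
SKIPPED_TAGS = {"script", "style", "noscript", "template", "svg"}


class VisibleTextExtractor(HTMLParser):
    """Collects text from all elements except those listed in SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.seen = set()    # fragments already emitted, used for de-duplication
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if self.skip_depth == 0 and text and text not in self.seen:
            self.seen.add(text)
            self.chunks.append(text)


def extract_visible_text(html, max_chars=4000):
    """Return de-duplicated visible text, one fragment per line, truncated to max_chars."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)[:max_chars]


# Hypothetical page; the markup is invented for this example.
SAMPLE_PAGE = """
<html><head><style>.price { color: red; }</style></head>
<body>
  <ul><li>Home</li><li>Home</li></ul>
  <div class="product">
    <h1>VISCOSE BLEND KNIT POLO SHIRT</h1>
    <span class="price">$59.90</span>
  </div>
  <script>var tracking = "ignored";</script>
</body></html>
"""

visible_text = extract_visible_text(SAMPLE_PAGE)
print(visible_text)
```

This keeps the price from the <span> while dropping the stylesheet, the script, and the repeated menu entry. Plain truncation is crude; on real pages you would probably want to rank or filter the fragments against the user's query before cutting, but that is beyond this sketch.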
Here’s an example:
The price of the item is $59.90.
- ChatGPT’s response - it gets the price correctly:
The price of the VISCOSE BLEND KNIT POLO SHIRT on the provided link is $59.90 USD [1].
Is there anything else you would like to know?
- HuggingChat’s response - it fails because it can’t extract the price, as it’s inside a <span>.
Note that the model ends up hallucinating: none of the links has a $29.90 price.