I’m a beginner who discovered Hugging Face a few days ago and I’m really impressed by what we can do here.
I was wondering if it’s possible to replicate the “domain search” feature (like in HuggingChat) for my own custom chatbots, essentially using it as a form of RAG (retrieval-augmented generation).
Is there a straightforward way to crawl or connect data from a website URL for that purpose? If so, could you please explain how or point me to any relevant tools or examples?
Hello. There are two broad methods. One is to run a normal web search (or fetch the pages yourself) from a programming language such as Python, process the results, and pass them to the LLM as context in the prompt. The other is a method usually called Function Calling (also known as tool calling or tool use, depending on the platform), in which you describe a search tool to the LLM, the LLM asks for that tool to be run, and your code returns the results to it.
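As a rough illustration of the first method, here is a minimal sketch that uses only the Python standard library. All function names (`fetch_html`, `html_to_text`, `chunk_text`, `top_chunks`, `build_prompt`) are made up for this example, and the ranking is plain keyword overlap, which stands in for the embedding-based similarity a real RAG setup would normally use.

```python
import re
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def fetch_html(url, timeout=10):
    """Download one page. Not called here; shown so the pipeline is complete."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def html_to_text(html):
    """Reduce an HTML document to its visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_text(text, max_words=80):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def _tokens(s):
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def top_chunks(question, chunks, k=3):
    """Rank chunks by how many distinct words they share with the question."""
    q = _tokens(question)
    return sorted(chunks, key=lambda c: len(q & _tokens(c)), reverse=True)[:k]


def build_prompt(question, chunks):
    """Assemble the text you would send to the LLM."""
    context = "\n\n".join(chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Offline demo with an inline page, so nothing is downloaded.
sample_html = (
    "<html><head><style>p{color:red}</style><script>var x=1;</script></head>"
    "<body><h1>Pricing</h1><p>The pro plan costs 20 dollars.</p></body></html>"
)
page_text = html_to_text(sample_html)
question = "How much does the pro plan cost?"
prompt = build_prompt(question, top_chunks(question, chunk_text(page_text)))
print(prompt)
```

In real use you would replace `sample_html` with `fetch_html(...)` for each page of the site you care about, and send `prompt` to whichever LLM you are using.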
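For the former, there are many useful Python libraries (for example, requests for downloading pages and Beautiful Soup for extracting text from HTML), so it is worth searching for one that fits your case. For the latter, support is usually built into the LLM execution environment or API you are using, so look for "function calling" or "tools" in its documentation.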
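To make the Function Calling idea more concrete, here is a small sketch of the part your own code is responsible for. It does not call a real LLM: `model_output` is a hand-written stand-in for what a model might return, and the "name" plus "arguments" JSON shape is an assumption that only loosely mirrors common APIs, so check your provider's documentation for the exact format. The `search_site` tool and its tiny `fake_index` are hypothetical.

```python
import json


def search_site(query, domain):
    """Hypothetical tool. A real one would query a search API or a crawled index."""
    fake_index = {
        "example.com": ["Example Domain is reserved for documentation."],
    }
    words = query.lower().split()
    hits = [
        page for page in fake_index.get(domain, [])
        if any(word in page.lower() for word in words)
    ]
    return {"domain": domain, "results": hits}


# Registry mapping tool names the model may request to Python functions.
TOOLS = {"search_site": search_site}

# Description you would send to the model so it knows the tool exists.
# The exact schema format depends on the provider (assumption).
TOOL_SPECS = [{
    "name": "search_site",
    "description": "Search the pages of a single website.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "domain": {"type": "string"},
        },
        "required": ["query", "domain"],
    },
}]


def run_tool_call(tool_call_json):
    """Execute the tool the model asked for and return the result as JSON text."""
    call = json.loads(tool_call_json)
    func = TOOLS.get(call["name"])
    if func is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    return json.dumps(func(**call["arguments"]))


# Stand-in for a model response that requests a tool call.
model_output = json.dumps({
    "name": "search_site",
    "arguments": {"query": "reserved documentation", "domain": "example.com"},
})
tool_result = run_tool_call(model_output)
print(tool_result)
```

The string in `tool_result` is what you would send back to the model as the tool's answer, and the model then writes its final reply using those results.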