Hi everyone!
I am a data scientist working on a real estate search portal in Italy (similar to Zillow). We are now implementing free-text search. For the demo, the only goal was to take user input and populate a JSON file which would then serve as the search query. For this, I have parallelised several calls to ChatGPT, each trained to extract a single feature.
It seems to me that for most of the binary features, regex would outperform any model. I tried adding a classifier head to an Italian BERT, dbmdz/bert-base-italian-cased, but it did not perform very well. All I’ve found on the internet are classification examples for sentiment-based text, not for feature extraction. Does anyone have any idea what would be the best way to convert a free-text search into a JSON-like object for a database query?
For converting free text search to a JSON object for a database query, regex is very effective for binary features. You might consider a hybrid approach if you’re dealing with more complex features. You could use regex for simple, predictable patterns and a machine-learning model for more nuanced text extraction. Training a custom NER model on your specific dataset might help improve accuracy.
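As a rough illustration of the hybrid idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the feature names, the Italian keyword patterns, and the `extract_with_model` placeholder, which stands in for whatever NER model or LLM call you end up using. Regex handles the binary and numeric features, and the model is only consulted for whatever is left.

```python
import re

# Hypothetical binary features with assumed Italian keywords.
BINARY_PATTERNS = {
    "garden": re.compile(r"\bgiardino\b", re.IGNORECASE),
    "balcony": re.compile(r"\b(?:balcone|balconi)\b", re.IGNORECASE),
    "elevator": re.compile(r"\bascensore\b", re.IGNORECASE),
}

# Hypothetical numeric features: room count and maximum price.
ROOMS = re.compile(r"\b(\d+)\s+(?:locali|stanze|camere)\b", re.IGNORECASE)
MAX_PRICE = re.compile(
    r"\b(?:fino a|massimo|max)\s+(\d[\d.]*)\s*(?:euro|€)", re.IGNORECASE
)


def extract_with_model(text: str) -> dict:
    """Placeholder for a model-based extractor (NER, LLM, ...).

    Hypothetical: returns nothing here so the sketch runs on its own.
    """
    return {}


def text_to_query(text: str) -> dict:
    """Build a JSON-like query dict from a free-text search string."""
    # Regex pass: set a binary feature only when its keyword appears.
    query = {name: True for name, pat in BINARY_PATTERNS.items() if pat.search(text)}

    rooms = ROOMS.search(text)
    if rooms:
        query["rooms"] = int(rooms.group(1))

    price = MAX_PRICE.search(text)
    if price:
        # Italian thousands separators ("250.000") are dropped before parsing.
        query["max_price"] = int(price.group(1).replace(".", ""))

    # Model pass: fill in the more nuanced fields (location, style, ...).
    query.update(extract_with_model(text))
    return query
```

Note that this sketch does not handle negation ("senza giardino" would still set `garden`), which is the kind of case where a model or an extra rule would be needed.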
When extracting structured data, I think the best strategy is to start by understanding exactly what kind of data you need and where it’s coming from. This can save a lot of headaches down the line. For me, tools like an identity validation API have been convenient. They help ensure that the data is accurate and trustworthy, which is super important.
Once you’ve got your tools, set up a straightforward process. Break down the extraction into manageable steps, and test each to ensure it works correctly before moving on. This way, you can catch any issues early and fix them without hassle.
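To make the step-by-step idea concrete, here is a small Python sketch in which each stage is its own function and is checked in isolation before the stages are chained. The function names and the single `rooms` feature are hypothetical, chosen only to keep the example short.

```python
import re


def normalize(text: str) -> str:
    """Step 1: collapse whitespace and lowercase the input."""
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_rooms(text: str):
    """Step 2: pull out a room count such as '2 locali', or None if absent."""
    match = re.search(r"\b(\d+)\s+locali\b", text)
    return int(match.group(1)) if match else None


def build_query(text: str) -> dict:
    """Step 3: chain the steps and keep only the fields that were found."""
    clean = normalize(text)
    fields = {"rooms": extract_rooms(clean)}
    return {key: value for key, value in fields.items() if value is not None}


def run_checks() -> None:
    """Check each step on its own, then the whole chain."""
    assert normalize("  Casa   2 LOCALI ") == "casa 2 locali"
    assert extract_rooms("casa 2 locali") == 2
    assert extract_rooms("casa con giardino") is None
    assert build_query("  Casa   2 LOCALI ") == {"rooms": 2}


if __name__ == "__main__":
    run_checks()
```

If a check fails, you know which step is responsible, which is much harder to see when the whole extraction is one opaque call.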