Seeking Guidance on Extracting Bidding Data from Procurement Documents

demebriel · April 23, 2024, 9:21am

Hello Hugging Face Community,

I’m working on a project that involves extracting company names and their bid amounts from diverse public procurement announcements. The aim is to organize this information into a structured format like this: {'announcement_id': xxx, 'bids': [{'company': 'A', 'bid_value': 500}, {'company': 'B', 'bid_value': 600}]}.

Each announcement can include zero, one, or several companies and their bids. Having the information in this structured form would help my research project a great deal. I’ve manually prepared a training dataset from a selection of these announcements. However, I’m quite new to NLP and unsure which methods to use.
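To make the target format concrete, here is a minimal sketch of the record I have in mind, written with Python dataclasses. The class names `Bid` and `Announcement` and the example IDs are placeholders I made up for illustration; only the field names come from the format above.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Bid:
    company: str
    bid_value: float


@dataclass
class Announcement:
    announcement_id: str
    # An announcement may carry no bids at all, so default to an empty list.
    bids: List[Bid] = field(default_factory=list)


# Hypothetical examples: one announcement without bids, one with two.
no_bids = Announcement(announcement_id="a-001")
two_bids = Announcement("a-002", [Bid("A", 500), Bid("B", 600)])

# asdict() converts nested dataclasses into plain dicts and lists.
print(asdict(no_bids))
print(asdict(two_bids))
```

The empty default list covers announcements that mention no bidders, so every document maps to the same shape.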

Could you please point me in the right direction and offer advice or resources on the following points?

Model Choice: Should I use Named Entity Recognition (NER), a generative model, or another approach for this task?
Data Preparation: What are best practices for preparing and structuring my data to handle multiple entries per document?
Model Training: How can I effectively fine-tune a model to recognize and extract this specific type of data, using my manually extracted sample of companies and bids from the announcements?
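On the data preparation point, this is a rough sketch of how I imagine turning my manual extractions into character-offset spans, which is the kind of labelling an NER-style approach would need. It is only an assumption about one possible workflow: the function name `find_spans`, the labels `COMPANY` and `BID_VALUE`, and the sample text are all invented for illustration, and matching is naive (first verbatim occurrence only).

```python
def find_spans(text, bids):
    """Locate each annotated company and bid value in the raw text.

    Returns a list of {"start", "end", "label"} dicts sorted by position.
    Values that do not appear verbatim in the text are skipped.
    """
    spans = []
    for bid in bids:
        pairs = (
            ("COMPANY", bid["company"]),
            ("BID_VALUE", str(bid["bid_value"])),
        )
        for label, value in pairs:
            start = text.find(value)  # first occurrence only
            if start == -1:
                continue  # e.g. the amount is written as "500.00" in the text
            spans.append({"start": start, "end": start + len(value), "label": label})
    return sorted(spans, key=lambda s: s["start"])


# Hypothetical announcement text and its manual annotation.
text = "Offers received: Acme Ltd bid 500 EUR; Beta GmbH bid 600 EUR."
bids = [
    {"company": "Acme Ltd", "bid_value": 500},
    {"company": "Beta GmbH", "bid_value": 600},
]

spans = find_spans(text, bids)
for s in spans:
    print(s["label"], repr(text[s["start"]:s["end"]]))
```

I suspect this breaks down when the same amount appears twice or when numbers are formatted differently in the text than in my annotations, and spans alone do not say which amount belongs to which company. That is part of why I am unsure whether NER is the right fit.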

I would greatly appreciate any insights or examples of similar projects, as I’m not sure which aspects to focus on more deeply.

Thank you!
