Hi,
Firstly, apologies if this is not the right section of the forum (I am quite new here).
I am working on a new project, or rather a new idea, to automate a process that is currently very manual:
Lots of emails come in and are processed manually by a human, who extracts data from them and copies it into a spreadsheet (Excel or similar).
I can clearly identify the attributes/fields I want to extract from the text/emails.
I also know the types of text/emails that come in.
The problem is extracting that information accurately.
Email Example:
"Hello my friends.
we have two guys arriving tomorrow 23/01/2024 around 1pm from Madrid, flight ABC123
Another one people leaving on sunday same week around 2am to Instanbul, flight CBA321
Name of the arrving ones: Jose Mateo Feliz y Ana Triste del Carmen.
Name of the leaving Matia Nodoyuna
Regards"
Basically, the required output is a JSON object with a fixed, predefined list of attributes, each one filled in (or left empty if not found).
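To make the target concrete, here is a rough sketch of the kind of output I have in mind for the email above. The field names (direction, date, time, city, flight, passengers) and the "movements" wrapper are placeholders I made up for illustration, not a final schema, and values are copied as they appear in the email rather than normalized.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Movement:
    # Hypothetical attribute list; an empty value means "not found in the email".
    direction: str = ""   # "arrival" or "departure"
    date: str = ""        # raw text as written in the email
    time: str = ""
    city: str = ""
    flight: str = ""
    passengers: list = field(default_factory=list)


def to_json(movements):
    """Serialize the extracted movements into the fixed JSON layout."""
    payload = {"movements": [asdict(m) for m in movements]}
    return json.dumps(payload, indent=2, ensure_ascii=False)


# Values taken verbatim from the example email (including its spelling).
example = [
    Movement("arrival", "23/01/2024", "1pm", "Madrid", "ABC123",
             ["Jose Mateo Feliz", "Ana Triste del Carmen"]),
    Movement("departure", "sunday same week", "2am", "Instanbul", "CBA321",
             ["Matia Nodoyuna"]),
]

print(to_json(example))
```

Every attribute has a default, so anything the model cannot find simply stays empty while the list of keys stays the same.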
I have been playing with Mistral and Llama. They are OK but a little slow (I tested different quantization levels and parameter counts). I believe my scope is quite small, so with the right small model and some training (I have thousands of example emails), I could get a much faster and more accurate model.
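For the training part, my assumption is that I would turn the already-processed emails into prompt/completion pairs, one JSON object per line (JSONL). This is only a sketch of that idea: the instruction text, the "prompt"/"completion" keys, and the helper names are my own guesses, and the exact format would depend on whichever fine-tuning tool ends up being used.

```python
import json

# Hypothetical instruction prepended to every email.
INSTRUCTION = "Extract the travel details from the email as JSON."


def build_record(email_text, labels):
    """One training example: the email as prompt, the labelled JSON as completion."""
    return {
        "prompt": INSTRUCTION + "\n\n" + email_text.strip(),
        "completion": json.dumps(labels, ensure_ascii=False, sort_keys=True),
    }


def to_jsonl(pairs):
    """Turn (email_text, labels) pairs into JSONL text, one record per line."""
    return "\n".join(
        json.dumps(build_record(email, labels), ensure_ascii=False)
        for email, labels in pairs
    )


# A single made-up pair based on the example email; real data would be
# the thousands of emails that were already processed by hand.
pairs = [
    (
        "we have two guys arriving tomorrow 23/01/2024 around 1pm from Madrid, flight ABC123",
        {"flight": "ABC123", "city": "Madrid"},
    ),
]

print(to_jsonl(pairs))
```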
Any thoughts?
Thanks in advance!