I’m looking for some help choosing which zero-shot classification model is most appropriate for my use case. My team has a handful of files that we need to load into a database each day. Each file’s field names need to conform to the database table specification; however, the files often come in with slightly different naming conventions (e.g. the field “Address” will show up as “Customer Address” in the file, and “Division” will show up as “Branch”).
My thought was to use a zero-shot classification model to predict which of the nonconforming field names (“Customer Address” and “Branch”) correspond to which of the missing field names (“Address” and “Division”). I’m also considering adding a layer on top of the model, trained on past labeled data, to make it more robust (potentially using field values in addition to field names).
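A minimal sketch of that mapping step, assuming the Hugging Face `transformers` zero-shot pipeline: each incoming header (optionally followed by a few sample values) is scored against the missing target names, and pairs are assigned greedily from the highest score down so each target is used at most once. The helper names, the hypothesis template "This column contains the {}.", and the 0.5 threshold are illustrative assumptions, not tuned values.

```python
from typing import Callable, Dict, Optional, Sequence

# A scorer takes one piece of text plus candidate labels and returns {label: score}.
Scorer = Callable[[str, Sequence[str]], Dict[str, float]]


def hf_zero_shot_scorer(model_name: str = "valhalla/distilbart-mnli-12-1",
                        template: str = "This column contains the {}.") -> Scorer:
    """Wrap a Hugging Face zero-shot pipeline as a Scorer (needs `transformers`)."""
    from transformers import pipeline  # lazy import: the rest runs without it

    clf = pipeline("zero-shot-classification", model=model_name)

    def score(text: str, labels: Sequence[str]) -> Dict[str, float]:
        out = clf(text, candidate_labels=list(labels), hypothesis_template=template)
        # The pipeline returns labels sorted by score; zip them back into a dict.
        return dict(zip(out["labels"], out["scores"]))

    return score


def describe_column(name: str, sample_values: Optional[Sequence[str]] = None,
                    k: int = 3) -> str:
    """Text fed to the classifier: the header, optionally with up to k sample values."""
    if not sample_values:
        return name
    return f"{name}: " + ", ".join(str(v) for v in list(sample_values)[:k])


def map_fields(incoming: Dict[str, Optional[Sequence[str]]],
               targets: Sequence[str],
               score: Scorer,
               threshold: float = 0.5) -> Dict[str, Optional[str]]:
    """Map each incoming header to at most one target; each target is used at most once.

    Pairs are assigned greedily from the highest score down. Headers that are not
    assigned any target scoring >= threshold map to None (for manual review).
    """
    pairs = []
    for header, values in incoming.items():
        for label, s in score(describe_column(header, values), targets).items():
            pairs.append((s, header, label))
    pairs.sort(reverse=True)

    mapping: Dict[str, Optional[str]] = {h: None for h in incoming}
    used = set()
    for s, header, label in pairs:
        if s < threshold:
            break  # everything after this is lower-scoring
        if mapping[header] is None and label not in used:
            mapping[header] = label
            used.add(label)
    return mapping
```

With the real model this would be called as something like `map_fields({"Customer Address": ["12 Main St"], "Branch": None}, ["Address", "Division"], hf_zero_shot_scorer())`. Passing the scorer in as a function keeps the assignment logic independent of the model, so a different zero-shot model or a trained layer can be swapped in later. Note that in single-label mode the pipeline's scores are normalized across the candidate labels, so the threshold is relative to the other targets rather than an absolute confidence.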
I’ve tested this idea a bit with valhalla/distilbart-mnli-12-1 from Hugging Face and had decent success, but I’m unclear on how to determine whether this is the best model to use versus another zero-shot model.
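One way to compare candidate models is to treat past labeled files as a held-out test set and measure how often each model’s top label matches the known correct field name. A minimal sketch, assuming the labeled history is available as (incoming header, correct target) pairs; the function names are hypothetical, and facebook/bart-large-mnli (the larger model that distilbart-mnli-12-1 is distilled from) is just one example of an alternative worth scoring alongside it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A scorer takes one piece of text plus candidate labels and returns {label: score}.
Scorer = Callable[[str, Sequence[str]], Dict[str, float]]


def top1_accuracy(examples: Sequence[Tuple[str, str]],
                  targets: Sequence[str],
                  score: Scorer) -> float:
    """Fraction of (header, correct_target) pairs whose top-scoring label is correct."""
    if not examples:
        raise ValueError("need at least one labeled example")
    hits = 0
    for header, gold in examples:
        scores = score(header, targets)
        if max(scores, key=scores.get) == gold:
            hits += 1
    return hits / len(examples)


def compare_models(examples: Sequence[Tuple[str, str]],
                   targets: Sequence[str],
                   scorers: Dict[str, Scorer]) -> List[Tuple[str, float]]:
    """Evaluate every candidate on the same labeled set; best accuracy first."""
    results = [(name, top1_accuracy(examples, targets, s)) for name, s in scorers.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Each scorer can be any function returning a label-to-score dict, for example one wrapping `pipeline("zero-shot-classification", model=...)` from `transformers` for each model under consideration. Running all candidates on the same held-out set makes the comparison fair; inference time per file is worth recording too, since the larger models are noticeably slower.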
Any help is greatly appreciated!