Some city names include the state/province/country to disambiguate cities with the same name located in different regions or countries. Examples:
Paris, TX
Moscow, ID
Syracuse, NY
Athens, United States
Perth, GB
Waterloo, Canada
There are some models capable of extracting locations. I tried these (and a few others):
Davlan/distilbert-base-multilingual-cased-ner-hrl
dslim/bert-base-NER
FacebookAI/xlm-roberta-large-finetuned-conll03-english
None of them can handle such composite city names.
They return one location for “Los Angeles” and “Frankfurt am Main”, but they return TWO separate locations for cities like Paris, TX.
I experimented with “aggregation_strategy” a bit, but that alone does not help.
It is possible to tweak the output so that “Paris, TX” comes back as one location.
BUT! In that case a string like “Bay Area, East Europe, South America” also comes back as one location.
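To make the trade-off concrete, here is a minimal sketch of one way such a tweak might look. It is an assumption, not the exact change I made: the helper names `merge_adjacent_locations` and `fake_entities` are hypothetical, and the entity dicts are hand-written stand-ins that mimic the aggregated output of a `transformers` token-classification pipeline (`entity_group`, `score`, `word`, `start`, `end`), so the snippet runs without downloading a model.

```python
def merge_adjacent_locations(text, entities, separators=(", ",)):
    """Naively merge consecutive LOC entities separated only by ', '."""
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["entity_group"] == "LOC"
            and ent["entity_group"] == "LOC"
            and text[prev["end"]:ent["start"]] in separators
        ):
            # Extend the previous span to cover the current entity.
            prev["end"] = ent["end"]
            prev["word"] = text[prev["start"]:prev["end"]]
            prev["score"] = min(prev["score"], ent["score"])
        else:
            merged.append(dict(ent))  # copy so the input is not mutated
    return merged


def fake_entities(text, names):
    """Hypothetical stand-in for pipeline output: one LOC entity per name.

    In practice the list would come from something like
    pipeline("ner", model=..., aggregation_strategy="simple")(text).
    """
    out, pos = [], 0
    for name in names:
        start = text.index(name, pos)
        end = start + len(name)
        out.append({"entity_group": "LOC", "score": 0.99,
                    "word": name, "start": start, "end": end})
        pos = end
    return out


text1 = "She moved to Paris, TX last year."
text2 = "We have offices in Bay Area, East Europe, South America."

ents1 = fake_entities(text1, ["Paris", "TX"])
ents2 = fake_entities(text2, ["Bay Area", "East Europe", "South America"])

result1 = [e["word"] for e in merge_adjacent_locations(text1, ents1)]
result2 = [e["word"] for e in merge_adjacent_locations(text2, ents2)]

print(result1)  # ['Paris, TX']
print(result2)  # ['Bay Area, East Europe, South America']
```

The same rule that joins “Paris, TX” also glues the three unrelated regions together, because at the span level a composite city name and a comma-separated list of places look identical.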
So, does anyone know of a pre-trained model that supports composite city names?