Parsing maritime location ranges

Hi folks

I’m attempting to train a model to parse maritime location ranges. These are strings that can be resolved into a geographical area or a list of shipping ports.

An example could be AG NSOBI JUBAIL EXCL I+I
This translates to: Arabian Gulf, not south of but including Jubail, excluding Iran and Iraq.

The ranges often include tons of abbreviations, acronyms, spelling mistakes, and different ways of representing the same thing, but essentially each range is a list of locations and operators (INCL, EXCL, NSOBI, etc.).

My goal with this model is to translate the ranges into a known and structured format. So the above would translate to [ARABIAN GULF|LOC] [NSOBI|OPR] [JUBAIL|LOC] [EXCL|OPR] [IRAN|LOC] [IRAQ|LOC], which I could then process with a deterministic program by looking up the locations and applying the operators.
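To show what I mean by deterministic post-processing, here is a rough sketch. The lookup table is a tiny placeholder, and the operator handling is simplified (NSOBI/NNOBI/NEOBI really imply geographic constraints rather than plain inclusion):

import re

# Placeholder lookup; the real table would map every known location to a
# canonical area/port entity.
LOCATIONS = {
    "ARABIAN GULF": "area:arabian_gulf",
    "JUBAIL": "port:jubail",
    "IRAN": "country:iran",
    "IRAQ": "country:iraq",
}

TOKEN_RE = re.compile(r"\[([^|\]]+)\|([A-Z]+)\]")

def parse_tagged(tagged):
    # Turn "[ARABIAN GULF|LOC] [NSOBI|OPR] ..." into (text, tag) pairs.
    return TOKEN_RE.findall(tagged)

def resolve(tagged):
    # Walk the tagged tokens, look up locations and apply the operators.
    included, excluded = [], []
    mode = "include"
    for text, tag in parse_tagged(tagged):
        if tag == "OPR":
            # Simplification: EXCL switches to exclusion; everything else
            # (INCL, NSOBI, NNOBI, NEOBI, ...) keeps adding to the inclusion.
            mode = "exclude" if text == "EXCL" else "include"
        else:
            (included if mode == "include" else excluded).append(
                LOCATIONS.get(text, "unknown:" + text))
    return {"include": included, "exclude": excluded}

print(resolve("[ARABIAN GULF|LOC] [NSOBI|OPR] [JUBAIL|LOC] [EXCL|OPR] [IRAN|LOC] [IRAQ|LOC]"))
# {'include': ['area:arabian_gulf', 'port:jubail'], 'exclude': ['country:iran', 'country:iraq']}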

I’ve created a few hundred training examples and fine-tuned the T5-small model, which initially looked good, but it’s like it’s struggling to learn any generalisations. If I take the above example (which is a well-known range) and just add something simple like “INCL FUJAIRAH” at the end, it fails, I assume because it has never seen that sequence before.
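For reference, my fine-tuning setup is roughly the sketch below (hyperparameters, column names and the output directory are placeholders, not my exact values):

from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# One illustrative pair; the real set is a few hundred of these.
pairs = [
    {"src": "AG NSOBI JUBAIL EXCL I+I",
     "tgt": "[ARABIAN GULF|LOC] [NSOBI|OPR] [JUBAIL|LOC] [EXCL|OPR] [IRAN|LOC] [IRAQ|LOC]"},
]

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def preprocess(batch):
    # Tokenize the raw range as the input and the tagged string as the target.
    model_inputs = tokenizer(batch["src"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["tgt"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

dataset = Dataset.from_list(pairs).map(preprocess, batched=True, remove_columns=["src", "tgt"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="t5-maritime-ranges",
        per_device_train_batch_size=8,
        learning_rate=3e-4,
        num_train_epochs=20,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()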

I’m looking for input on how to solve this problem. Other approaches/models I can try out?

I’ll add some more examples to explain the challenge:

  1. USG IF MISS RIVER NNOBIBR

    [US GULF|LOCATION] [IF|CONDITION] [MISSISSIPPI RIVER|LOCATION] [NNOBI|OPERATOR] [BATON ROUGE|LOCATION]

    This is an interesting example because NNOBI (not north of, but including) and BR (Baton Rouge) were concatenated. This is not a spelling mistake; shipping traders just use the range often enough that they know what it means. (See the tokenizer snippet after these examples.)

  2. EUROMED NEOBIG EXCL Y,FY,AL BUT INCL R+O

    [EUROMED|LOCATION] [NEOBI|OPERATOR] [GREECE|LOCATION] [EXCL|OPERATOR] [YUGOSLAVIA|LOCATION] [FORMER YUGOSLAVIA|LOCATION] [ALBANIA|LOCATION] [INCL|OPERATOR] [RIJEKA|LOCATION] [OMISALJ|LOCATION]

    Lots of stuff here that can only be understood in the context of the full sequence.
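Part of what makes example 1 tricky is how the subword tokenizer sees a fused token like NNOBIBR. A quick way to inspect that (I’m not relying on any particular split here, just showing how to check):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
for text in ["NNOBI BR", "NNOBIBR", "USG IF MISS RIVER NNOBIBR"]:
    print(text, "->", tokenizer.tokenize(text))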

Thanks!


The code below isn’t a practical solution by itself, but I feel that using two types of models in series like this would be easier to implement.

from transformers import pipeline

input_text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Model 1: a seq2seq "translation" model (EN->ES here; in your case it could rewrite a raw range into a normalised form)
translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
# Model 2: a token-classification (NER) model that tags each word as PER/ORG/LOC etc.
classifier = pipeline("token-classification", model="huggingface-course/bert-finetuned-ner", aggregation_strategy="simple")

print(classifier(input_text))
# [{'entity_group': 'PER', 'score': 0.9988506, 'word': 'Sylvain', 'start': 11, 'end': 18},
#  {'entity_group': 'ORG', 'score': 0.96476245, 'word': 'Hugging Face', 'start': 33, 'end': 45},
#  {'entity_group': 'LOC', 'score': 0.9986118, 'word': 'Brooklyn', 'start': 49, 'end': 57}]

print(translator(input_text))
# [{'translation_text': 'Me llamo Sylvain y trabajo en Hugging Face en Brooklyn.'}]

# Chaining the two models: classify the tokens of the translated text.
print(classifier(translator(input_text)[0]["translation_text"]))
# [{'entity_group': 'MISC', 'score': 0.4424469, 'word': '##yl', 'start': 10, 'end': 12},
#  {'entity_group': 'LOC', 'score': 0.99720335, 'word': 'Brooklyn', 'start': 46, 'end': 54}]