Fine-tuning a token classifier with very long sequences

I have a set of custom NER training data with some very long text sequences. Each training example is a raw string plus a list of entities given as character offsets, e.g. [{"start": 5, "end": 20, "label": "effective_date"}, …]. Most of the tutorials I've seen for training token classifiers assume the data is in CoNLL format, already pre-split into words with IOB tags. Converting my data into that format seems straightforward enough.

What I'm less sure about is sequence length. It's important that the sequences not be truncated, because some entity types only appear later in the sequence. Does anyone have suggestions on the best way to handle this?
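For reference, here's roughly the offset-to-IOB conversion I had in mind (the example text, offsets, and whitespace splitting below are just an illustrative sketch, not my real data or tokenizer):

```python
import re

# A made-up example in the same shape as my data:
# raw text plus character-offset entity spans.
example = {
    "text": "This agreement is effective as of January 1, 2024 between the parties.",
    "entities": [{"start": 34, "end": 49, "label": "effective_date"}],
}

def to_iob(text, entities):
    """Whitespace-split the text and assign an IOB tag to each word
    based on the character-offset entity spans."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        start, end, word = match.start(), match.end(), match.group()
        tag = "O"
        for ent in entities:
            # Word falls entirely inside this entity span.
            if start >= ent["start"] and end <= ent["end"]:
                tag = ("B-" if start == ent["start"] else "I-") + ent["label"]
                break
        tokens.append(word)
        tags.append(tag)
    return tokens, tags

tokens, tags = to_iob(example["text"], example["entities"])
for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```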