BERT for Word Segmentation

Hey there!

I’m currently comparing different ways of word segmentation.
I was wondering if I could simply fine-tune a pre-trained BERT with a classification layer on top, so that, given an expression, it gets decomposed into its base words by labeling each character with b (beginning), m (middle), or e (end):
thesunflower123 (WordPiece-tokenized as ['the', '##sun', '##flower', '##12', '##3']) → target labels bmebmmmmmmmebme (i.e. the | sunflower | 123)
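For reference, the labeling scheme I have in mind can be sketched in plain Python. The 's' tag for single-character words is my own addition, borrowed from the common BMES scheme, since pure b/m/e can't represent a one-character word:

```python
def bme_encode(words):
    """Turn a word segmentation into per-character b/m/e labels."""
    labels = []
    for w in words:
        if len(w) == 1:
            labels.append("s")  # single-char word gets its own tag (BMES-style assumption)
        else:
            labels.append("b" + "m" * (len(w) - 2) + "e")
    return "".join(labels)

def bme_decode(text, labels):
    """Recover the word list from the text plus predicted labels."""
    words, start = [], 0
    for i, tag in enumerate(labels):
        if tag in ("e", "s"):  # a word ends at this character
            words.append(text[start : i + 1])
            start = i + 1
    return words

print(bme_encode(["the", "sunflower", "123"]))        # bmebmmmmmmmebme
print(bme_decode("thesunflower123", "bmebmmmmmmmebme"))  # ['the', 'sunflower', '123']
```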

I found some papers on Chinese word segmentation and tried to adapt some tutorials, but I'm not sure which pre-trained model to start from or how to train it properly (on full sentences?).

I'd be thankful for any tips!