How to get metrics on unrecognized vocabulary?

I have a hunch that the tokenizer isn't recognizing pieces of the language we're feeding it. I'd love to get data on this while training, ideally as something like a Keras metric, or something visualized in TensorBoard.
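To make the ask concrete, here's roughly what I'm imagining: a hypothetical pass-through layer that reports the UNK rate of the incoming token ids as a regular Keras metric (which the TensorBoard callback would then pick up). This is just a sketch using tf.keras's `add_metric`; the `UNK_ID` value and the padding-is-zero assumption are mine, not anything from a specific tokenizer:

```python
import tensorflow as tf

UNK_ID = 1  # hypothetical: whatever id your tokenizer assigns to [UNK]

class UnkRateLogger(tf.keras.layers.Layer):
    """Pass-through layer that logs the fraction of UNK token ids.

    Assumes token ids arrive as an integer tensor with 0 used for
    padding; the reported 'unk_rate' shows up alongside other Keras
    metrics and in TensorBoard via the TensorBoard callback.
    """

    def call(self, token_ids):
        tokens = tf.math.count_nonzero(token_ids)           # non-padding tokens
        unks = tf.math.count_nonzero(token_ids == UNK_ID)   # UNK tokens
        rate = tf.cast(unks, tf.float32) / tf.cast(tf.maximum(tokens, 1), tf.float32)
        self.add_metric(rate, name="unk_rate")
        return token_ids
```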

Is there a way to get the number of UNK tokens seen during a run? Or should I just compute this manually as a separate step?
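For the manual route, this is the kind of offline pass I'd fall back to. It assumes a fitted Keras `TextVectorization` layer (where the OOV bucket is index 1 and padding is 0 by default) and an iterable of raw-string batches; both names here are placeholders:

```python
import tensorflow as tf

OOV_ID = 1  # TextVectorization's default OOV bucket

def unk_stats(vectorizer, text_batches):
    """Count OOV tokens over a corpus and return (count, rate)."""
    total, unks = 0, 0
    for batch in text_batches:
        ids = vectorizer(batch)
        total += int(tf.math.count_nonzero(ids))            # non-padding tokens
        unks += int(tf.math.count_nonzero(ids == OOV_ID))   # OOV tokens
    return unks, unks / max(total, 1)
```

That gives a one-off corpus-level number, but it obviously won't show up per-batch in TensorBoard the way an in-training metric would, which is why I'd prefer the first approach if it's supported.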