Tokenization: different results when tokenizing in one pass vs sample-by-sample

Anyone got an update on this?
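The thread doesn't include a repro, so as a point of reference, here is a minimal sketch of one common way this discrepancy shows up with Hugging Face tokenizers, assuming that's the library in question: batched tokenization with `padding=True` appends pad tokens to the shorter samples, so a naive comparison against per-sample output fails. The model name and sample texts below are placeholders, not taken from the thread.

```python
from transformers import AutoTokenizer

# Placeholder model; substitute the tokenizer actually in use.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

samples = ["first example sentence", "a much longer second example sentence"]

# One pass: tokenize the whole list at once (padding aligns lengths).
batch_ids = tokenizer(samples, padding=True)["input_ids"]

# Sample-by-sample: tokenize each string on its own (no padding applied).
single_ids = [tokenizer(s)["input_ids"] for s in samples]

# With padding enabled, the batched output carries trailing pad tokens,
# so equality fails even though the non-pad token ids match.
for b, s in zip(batch_ids, single_ids):
    print(b == s, b, s)
```

If padding isn't the cause in your case, posting the tokenizer, settings, and a sample pair that diverges would help narrow it down.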