Dynamic Programming for Byte-level BPE

  1. Could anyone explain the rationale behind equation (1) in "Neural Machine Translation with Byte-Level Subwords"?
  2. Also, what exactly is meant by "The design of UTF-8 encoding ensures the uniqueness of this recovery process: for a character UTF-8 encoded with multiple bytes, its trailing bytes will not make a valid UTF-8 encoded character"?
  3. How exactly are the hexadecimal digits in Figure 1 derived?
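
To make the three questions concrete, here is a minimal Python sketch of what I understand so far. It is not the paper's code. The euro sign is my own example and is not taken from Figure 1, the helper names (`utf8_hex`, `is_valid_char`, `max_valid_chars`) are hypothetical, and the recurrence is written from my reading of equation (1), assumed to be f(k) = max over t in {1, 2, 3, 4} of f(k - t) + g(k - t + 1, k), where g is 1 if the bytes in that span form one valid character and 0 otherwise.

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as two-digit uppercase hex strings."""
    return [f"{b:02X}" for b in text.encode("utf-8")]


def is_valid_char(data):
    """True if data decodes as exactly one UTF-8 encoded character."""
    try:
        return len(data.decode("utf-8")) == 1
    except UnicodeDecodeError:
        return False


def max_valid_chars(data):
    """Maximum number of valid characters recoverable from a byte sequence.

    Assumed form of equation (1): f(k) = max_t f(k - t) + g(k - t + 1, k),
    with t in 1..4 because a UTF-8 character is at most 4 bytes long.
    """
    f = [0] * (len(data) + 1)
    for k in range(1, len(data) + 1):
        f[k] = max(
            f[k - t] + int(is_valid_char(data[k - t:k]))
            for t in range(1, min(4, k) + 1)
        )
    return f[-1]


# Hex digits: U+20AC is 0010 0000 1010 1100 in binary. The 3-byte UTF-8
# layout 1110xxxx 10xxxxxx 10xxxxxx gives 11100010 10000010 10101100.
euro = "€".encode("utf-8")
print(utf8_hex("€"))               # ['E2', '82', 'AC']

# Trailing bytes start with the bits 10, so they never begin a character.
print(is_valid_char(euro))         # True
print(is_valid_char(euro[1:]))     # False
print(is_valid_char(euro[2:]))     # False

# A stray lead byte at the end is skipped; two characters are recovered.
print(max_valid_chars(b"a" + euro + b"\xe2"))  # 2
```

If this reading is right, the hex digits are just the UTF-8 bytes of each character written in base 16, and the quoted sentence says that a suffix of a multi-byte character can never be mistaken for a character of its own, which is why the recovery is unambiguous.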