I have a Hugging Face tokenizer with three files: tokenizer.json, tokenizer_config.json, and vocab.txt. However, according to the documentation, the Marian tokenizer requires files in the SentencePiece format (.model and .vocab files). Is there a way to construct a Marian tokenizer without these specific file formats?
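For reference, this is roughly my current situation (the `my_tokenizer/` directory name is just a placeholder for wherever the three files live): the existing files load fine as a generic fast tokenizer, but that is not the format `MarianTokenizer` asks for.

```python
# Minimal sketch of the current situation; "my_tokenizer/" is a placeholder
# directory containing tokenizer.json, tokenizer_config.json and vocab.txt.
from transformers import PreTrainedTokenizerFast

# The existing files load without problems as a generic fast tokenizer.
tok = PreTrainedTokenizerFast(tokenizer_file="my_tokenizer/tokenizer.json")
print(tok("hello world"))

# MarianTokenizer, by contrast, expects SentencePiece files
# (e.g. source.spm / target.spm plus a vocab file), which I do not have.
```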
I have already tried converting the tokenizer into a .model file and then constructing the Marian tokenizer from it, but that approach raised new issues of its own. Are there any alternative approaches or workarounds for using the existing tokenizer files with the Marian tokenizer?
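In case it helps to see where things go wrong, this is roughly the shape of my conversion attempt; the corpus path, model prefix, and vocab file below are illustrative placeholders rather than my exact setup.

```python
# Rough sketch of the conversion attempt (paths are placeholders): produce a
# SentencePiece .model/.vocab pair, then point MarianTokenizer at it.
import sentencepiece as spm
from transformers import MarianTokenizer

spm.SentencePieceTrainer.train(
    input="corpus.txt",        # placeholder training text
    model_prefix="converted",  # writes converted.model and converted.vocab
    vocab_size=32000,
)

tok = MarianTokenizer(
    source_spm="converted.model",
    target_spm="converted.model",
    vocab="vocab.json",        # MarianTokenizer also expects a JSON vocab file
)
```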