Instantiating TransfoXLTokenizer using existing vocab dict

Hello everyone, I’ve been experimenting with several examples to try to grok how to train a Transformer-XL model from scratch for my own text-generation use case, and I'm looking for some guidance. I’m currently stuck on how to properly load my existing vocabulary, which is a Python dictionary saved in pickle format. Does anyone have an example of creating a TransfoXLTokenizer from a preexisting vocabulary?

I found that all you have to do is instantiate a TransfoXLTokenizer and pass it a vocab file (the vocab_file argument) in which each “word” of your vocabulary is on its own line.
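If the vocabulary lives in a pickled dict, it first has to be written out in that one-word-per-line form. Below is a minimal sketch, assuming the pickle holds a plain {word: integer_index} dict with indices 0..n-1. The helper name pickle_vocab_to_txt is hypothetical. Words are written in index order on the assumption that the tokenizer numbers symbols by line position when it reads the file. Reading the transformers source, it also appears to take only the first whitespace-separated token on each line and to require an <unk> (or <UNK>) entry, so the sketch checks for the first and appends <unk> if it is missing. Treat both as assumptions to verify against your installed version.

```python
import pickle
from pathlib import Path


def pickle_vocab_to_txt(pickle_path, txt_path, unk_token="<unk>"):
    """Hypothetical helper: write a pickled {word: index} dict as a vocab file
    with one symbol per line, ordered by index."""
    # Only unpickle files you trust; pickle.load can execute arbitrary code.
    with open(pickle_path, "rb") as f:
        vocab = pickle.load(f)

    # Sort by stored index so that line position matches the original id
    # (assumes the tokenizer assigns ids in file order).
    words = [word for word, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

    # Assumed: only the first whitespace-separated token per line is read,
    # so empty words or words containing whitespace cannot be stored.
    for word in words:
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"cannot store {word!r} on a single vocab line")

    # Assumed requirement: an unknown token must be present. Appending it at
    # the end leaves every existing id unchanged.
    if unk_token not in vocab and "<UNK>" not in vocab:
        words.append(unk_token)

    Path(txt_path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return words
```

The resulting file can then be passed as TransfoXLTokenizer(vocab_file="vocab.txt"). The sketch itself does not import transformers, so it only covers the conversion step.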
