Instantiating TransfoXLTokenizer using existing vocab dict

jodiak · December 4, 2020, 5:44am

Hello everyone, I’ve been experimenting with several examples to try and grok how to train a TransformerXL model from scratch for my own text generation use case, and I’m looking for some guidance. I’m currently stuck on how to properly load my existing vocabulary, which is a Python dictionary saved in pickle format. Does someone have an example of creating a TransfoXLTokenizer using a preexisting vocabulary?

jodiak · January 8, 2021, 4:21am

Found that all you have to do is instantiate a TransfoXLTokenizer and pass it a vocab file (the vocab_file argument): a plain-text file in which each “word” in your vocabulary is on its own line.
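For anyone starting from the same pickled dict, here is a minimal sketch of the conversion step. It assumes the dictionary maps each token (str) to an integer index; that layout, the helper names, and the file names are hypothetical, so adjust them to your own data. It also assumes the vocabulary already contains an unknown token such as <unk>, which some versions of the library appear to require when reading a vocab file.

```python
import pickle
from pathlib import Path


def load_pickled_vocab(pickle_path):
    """Load the vocabulary dict from a pickle file."""
    with open(pickle_path, "rb") as f:
        return pickle.load(f)


def write_vocab_file(vocab, out_path):
    """Write one token per line, ordered by the index stored in the dict.

    Assumption: vocab maps token (str) -> integer index.
    """
    tokens = sorted(vocab, key=vocab.get)
    Path(out_path).write_text("\n".join(tokens) + "\n", encoding="utf-8")
    return tokens


def convert_and_load(pickle_path, vocab_path):
    """Convert the pickled dict to a text vocab file and build the tokenizer."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import TransfoXLTokenizer

    write_vocab_file(load_pickled_vocab(pickle_path), vocab_path)
    return TransfoXLTokenizer(vocab_file=str(vocab_path))
```

Calling convert_and_load("vocab.pkl", "vocab.txt") would then give you the tokenizer. Sorting by the stored index keeps the line order aligned with your original ids, which matters if you want the ids to survive the round trip.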
