I’m working on a Reformer model for story generation, and I’d like to build a GPT-2-sized model with a longer attention span. I’m wondering what parameters to use in the model configuration, since the defaults seem to come from the character-based reformer-crime-and-punishment model.
Could someone give recommendations on the config parameters for a model that size?
I’m using a SentencePiece tokenizer with a 16k vocab size.
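For reference, here’s roughly what I’ve been sketching so far, using `transformers.ReformerConfig`. The sizes are just my guesses at GPT-2-small-ish dimensions (hidden size 768, 12 heads, 12 alternating local/LSH layers, FFN 3072) with a 64k-token context; I’m not sure these are sensible, which is why I’m asking:

```python
from transformers import ReformerConfig

# Guessed GPT-2-small-scale Reformer config -- values below are my
# assumptions, not recommendations I've validated.
config = ReformerConfig(
    vocab_size=16_000,                 # matches my SentencePiece tokenizer
    hidden_size=768,                   # GPT-2 small embedding dim
    num_attention_heads=12,
    attention_head_size=64,            # 12 * 64 = 768
    feed_forward_size=3072,            # GPT-2 small FFN dim
    attn_layers=["local", "lsh"] * 6,  # 12 layers, alternating local/LSH
    max_position_embeddings=65_536,    # much longer context than GPT-2
    axial_pos_embds=True,
    axial_pos_shape=[256, 256],        # product must equal max positions
    axial_pos_embds_dim=[256, 512],    # must sum to hidden_size
    is_decoder=True,                   # causal LM for generation
)
```

In particular I’m unsure about `num_buckets`, `num_hashes`, and the chunk lengths (`lsh_attn_chunk_length` / `local_attn_chunk_length`), which I’ve left at their defaults here.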