TransfoXLLMHeadModel - Trying to create tensor with negative dimension -199500

Hi all,

I am trying to create a TransfoXLLMHeadModel using a custom vocabulary, but I keep running into the same error:

```
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
----> 1 model = TransfoXLModel(config=cfg)

~/.local/lib/python3.6/site-packages/transformers/modeling_transfo_xl.py in __init__(self, config)
    736
    737         self.word_emb = AdaptiveEmbedding(
--> 738             config.vocab_size, config.d_embed, config.d_model, config.cutoffs, div_val=config.div_val
    739         )
    740

~/.local/lib/python3.6/site-packages/transformers/modeling_transfo_xl.py in __init__(self, n_token, d_embed, d_proj, cutoffs, div_val, sample_softmax)
    421             l_idx, r_idx = self.cutoff_ends[i], self.cutoff_ends[i + 1]
    422             d_emb_i = d_embed // (div_val ** i)
--> 423             self.emb_layers.append(nn.Embedding(r_idx - l_idx, d_emb_i))
    424             self.emb_projs.append(nn.Parameter(torch.FloatTensor(d_proj, d_emb_i)))
    425

~/.local/lib/python3.6/site-packages/torch/nn/modules/sparse.py in __init__(self, num_embeddings, embedding_dim, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse, _weight)
    107         self.scale_grad_by_freq = scale_grad_by_freq
    108         if _weight is None:
--> 109             self.weight = Parameter(torch.Tensor(num_embeddings, embedding_dim))
    110             self.reset_parameters()
    111         else:

RuntimeError: Trying to create tensor with negative dimension -199500: [-199500, 8]
```

The code I am running is the following:

```python
tokenizer = TransfoXLTokenizer(vocab_file='/path/to/vocab.txt')
```

Note: `tokenizer.vocab_size == 500`

```python
cfg = TransfoXLConfig(
    vocab_size=tokenizer.vocab_size,
    d_model=512,
    d_embed=512,
    n_head=8,
    d_head=64,
    n_layer=12,
    d_inner=2048,
)

model = TransfoXLLMHeadModel(config=cfg)
```

Does anyone have any insight as to what may be going wrong? Any help is greatly appreciated!

Thank you,

Victor

The solution is to change the cutoffs for the adaptive embeddings, as mentioned by user TevenLeScao in this GitHub issue: RuntimeError: Trying to create tensor with negative dimension · Issue #8098 · huggingface/transformers · GitHub

To summarize, you need to set:

```python
cfg = TransfoXLConfig(cutoffs=[0, x])
```

where `0 < x < vocab_size`.
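With your original hyperparameters, that means something like the following should instantiate cleanly. The cutoff value 250 below is just an arbitrary example; any value strictly between 0 and your vocab size works:

```python
from transformers import TransfoXLConfig, TransfoXLLMHeadModel, TransfoXLTokenizer

tokenizer = TransfoXLTokenizer(vocab_file='/path/to/vocab.txt')  # vocab_size == 500

cfg = TransfoXLConfig(
    vocab_size=tokenizer.vocab_size,
    cutoffs=[0, 250],  # arbitrary example; every cutoff must stay below vocab_size
    d_model=512,
    d_embed=512,
    n_head=8,
    d_head=64,
    n_layer=12,
    d_inner=2048,
)

model = TransfoXLLMHeadModel(config=cfg)
```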

If you don’t do this, the model will try to create an embedding bucket of size vocab_size - cutoffs[-1]. The default cutoffs end at 200000, so any vocab_size smaller than that makes this dimension negative and throws the error above.
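Concretely, you can replay the bucket arithmetic from the AdaptiveEmbedding loop shown in the traceback to see exactly where the -199500 comes from. This sketch assumes the default TransfoXLConfig values cutoffs=[20000, 40000, 200000] and div_val=4, plus the d_embed=512 from the config above:

```python
vocab_size = 500                    # the custom vocab
cutoffs = [20000, 40000, 200000]    # TransfoXLConfig default
d_embed, div_val = 512, 4           # d_embed from the config above; div_val is the default

# AdaptiveEmbedding appends vocab_size as the final cutoff end
cutoff_ends = [0] + cutoffs + [vocab_size]  # [0, 20000, 40000, 200000, 500]

for i in range(len(cutoff_ends) - 1):
    l_idx, r_idx = cutoff_ends[i], cutoff_ends[i + 1]
    d_emb_i = d_embed // (div_val ** i)
    print(f"bucket {i}: nn.Embedding({r_idx - l_idx}, {d_emb_i})")

# The last line printed is "bucket 3: nn.Embedding(-199500, 8)" --
# exactly the [-199500, 8] shape in the RuntimeError.
```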