Using transformers (BERT, RoBERTa) without embedding layer

You can pass the `top_k` parameter to the `fill-mask` pipeline to return more (or all) candidate tokens.
Check here:

If it still doesn’t fit your use case, then you have to implement it yourself.
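As a minimal sketch of the `top_k` usage mentioned above (the model name and input sentence here are just placeholders; any masked-language-model checkpoint should work):

```python
from transformers import pipeline

# Load a fill-mask pipeline; "bert-base-uncased" is used here only as an example.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# top_k controls how many candidate tokens are returned per mask
# (the default is 5); a larger value returns more of the vocabulary.
results = fill_mask("The capital of France is [MASK].", top_k=10)

# Each result has the predicted token and its score.
for r in results:
    print(r["token_str"], r["score"])
```

Setting `top_k` to the tokenizer's vocabulary size would return a score for every token, at the cost of a much larger result list.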