This might help: "Generate raw word embeddings using transformer models like BERT for downstream process" (reply #2, by BramVanroy)