Are BERT word embeddings generated from long sequences better than ones generated from short sequences?

I would like to work with token embeddings of words. Do longer sentential contexts (e.g. paragraphs around 500 tokens long) improve word embeddings compared to those generated from shorter sequences?
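
To make the comparison concrete, here is a minimal sketch of what I mean, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint (the example texts and the helper `token_embedding` are just illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def token_embedding(text: str, word: str) -> torch.Tensor:
    """Last-layer embedding of the first wordpiece of `word` in `text` (naive lookup)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index(tokenizer.tokenize(word)[0])
    return hidden[idx]

short_ctx = "The bank approved the loan."
# Imagine this being a full paragraph of ~500 tokens instead of one extra clause.
long_ctx = ("After months of negotiation over interest rates and collateral, "
            "the bank approved the loan.")

emb_short = token_embedding(short_ctx, "bank")
emb_long = token_embedding(long_ctx, "bank")
print(torch.cosine_similarity(emb_short, emb_long, dim=0))
```

The question is essentially whether the embedding obtained from the longer context is in some measurable sense "better" (e.g. more useful for downstream tasks) than the one obtained from the short context.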