What is the input vector size for BERT and Transformer-XL?

Does anyone know what size vectors the BERT and Transformer-XL models take and output?

For example, I know that bert-large is 24-layer, 1024-hidden, 16 heads per block, 340M parameters (bert-base is 12 heads per block). Does that mean it takes a vector of size [24, 1024, 16]? Or am I misunderstanding?

Any help is much appreciated

No, the inputs are usually a tensor of ints of shape batch_size x sequence_length, with each integer between 0 and the model's vocabulary size minus 1. You can find all expected input and output shapes in the documentation, for instance here for BERT.
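To make the shapes concrete, here is a minimal sketch using plain Python lists (no model download needed). The vocabulary and hidden sizes are the published bert-base-uncased values; the token ids are just illustrative examples of valid inputs:

```python
# Shape conventions for BERT inputs/outputs (bert-base-uncased numbers).
vocab_size = 30522   # bert-base-uncased vocabulary size
hidden_size = 768    # 1024 for bert-large
batch_size, seq_len = 2, 8

# Input: a batch of token-id sequences, shape (batch_size, seq_len),
# each id an integer in [0, vocab_size). 101/102/0 are the usual
# [CLS]/[SEP]/padding ids; the rest are illustrative word-piece ids.
input_ids = [
    [101, 7592, 2088, 102, 0, 0, 0, 0],
    [101, 2129, 2024, 2017, 1029, 102, 0, 0],
]

assert len(input_ids) == batch_size
assert all(len(seq) == seq_len for seq in input_ids)
assert all(0 <= tok < vocab_size for seq in input_ids for tok in seq)

# Output: the model's final hidden states for this batch would have
# shape (batch_size, seq_len, hidden_size).
output_shape = (batch_size, seq_len, hidden_size)
print(output_shape)  # (2, 8, 768)
```

So the 24 / 1024 / 16 figures describe the model's architecture (layers, hidden size, attention heads), not the input shape: the hidden size only shows up in the output tensor's last dimension.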