Llama 2 Tokenized Inputs Use Too Much Data

Hi all. I was wondering if anyone has used the Llama 2 tokenizer yet. I tokenized a list of around 500,000 strings, and the tokenized output took up over 200 GB of data. That seems like way too much, so I was wondering whether anyone else has encountered this.
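
For context, here is a minimal sketch of roughly what I mean, assuming the Hugging Face tokenizer for Llama 2; the model name, the input list, and the output path are just placeholders, not my exact setup:

```python
import pickle
from transformers import AutoTokenizer

# Placeholder: swap in whichever Llama 2 checkpoint you have access to.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Placeholder list standing in for ~500,000 strings.
texts = ["example sentence one", "example sentence two"]

# Tokenize each string into a list of integer token IDs (no padding/truncation).
token_ids = [tokenizer(t)["input_ids"] for t in texts]

# Dump the raw Python lists of ints to disk.
with open("tokenized.pkl", "wb") as f:
    pickle.dump(token_ids, f)
```

One guess on my end: storing the IDs as plain Python ints (or full tokenizer output dicts with attention masks) is much bulkier than packing them into a compact integer array, since Llama 2's 32,000-token vocabulary fits in 16 bits per token. But I'd still like to hear whether others have seen output sizes in this range.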