Memory leak when training a new Gemma 2 or Phi 3 / 3.5 tokenizer

I have a problem when training a new tokenizer for the Gemma 2 2B or Phi 3 / 3.5 models using the following code:

from datasets import load_dataset
from transformers import AutoTokenizer


def corpus_gen(dataset, batch_size=300, n=300_000):
    # Yield batches of raw text from the streaming dataset,
    # stopping after n examples in total.
    current = []
    tot = 0
    for ex in dataset:
        current.append(ex['txt'])
        tot += 1
        if tot == n:
            break
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


def train_tokenizer():
    # Stream the JSONL corpora so they are not loaded into RAM all at once.
    dataset = load_dataset(
        "json",
        split="train",
        streaming=True,
        data_files=[
            "../serlama/tokenizer/paragraphs_tokenizer.jsonl",
            "../serlama/tokenizer/pdrs_tokenizer.jsonl",
            "../serlama/tokenizer/macocu_tokenizer.jsonl",
        ],
    )

    existing_tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

    # Train a new tokenizer with the same algorithm and special tokens
    # as the base one, but on my own corpus.
    new_tokenizer = existing_tokenizer.train_new_from_iterator(
        corpus_gen(dataset),
        vocab_size=30000,
        min_frequency=3,
    )
    new_tokenizer.save_pretrained("sr_tokenizer")


train_tokenizer()

After roughly n = 100,000 examples, RAM usage starts to climb steadily in jumps of a few gigabytes and I cannot finish training the tokenizer.
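This is roughly how I watch the memory growth per batch (a minimal sketch; it assumes psutil is installed and reuses corpus_gen and dataset from the code above):

import os
import psutil

def log_memory(batches, every=100):
    # Wrap the batch generator and print resident memory (RSS)
    # every `every` batches so the growth is visible over time.
    process = psutil.Process(os.getpid())
    for i, batch in enumerate(batches):
        if i % every == 0:
            rss_gb = process.memory_info().rss / 1024**3
            print(f"batch {i}: RSS = {rss_gb:.2f} GiB")
        yield batch

Passing log_memory(corpus_gen(dataset)) instead of corpus_gen(dataset) to train_new_from_iterator shows the RSS increasing batch after batch.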

When I run the same code with the Llama 3.1 tokenizer, everything works fine and RAM usage does not grow. My transformers version is 4.44.0.
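For comparison, the only change in the Llama 3.1 run is the base checkpoint (assuming the meta-llama/Meta-Llama-3.1-8B tokenizer; everything else is identical):

# Same training code; only the base tokenizer differs.
existing_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
new_tokenizer = existing_tokenizer.train_new_from_iterator(
    corpus_gen(dataset),
    vocab_size=30000,
    min_frequency=3,
)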
Why is that?
What is the problem with the Gemma 2 2B and Phi 3 tokenizers? Do they have a memory leak?