Hi,

I am using the transformers pipeline for token classification:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("./modelfiles")
model = AutoModelForTokenClassification.from_pretrained("./modelfiles")
nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
ner_results = nlp(text)  # text is the input string passed into the function
```
The problem is that on the first call to the function that implements the lines above, the memory is released on return, but from the second call onwards it is not, as can be seen in this screenshot:

The first peak is my first call to the function; on return it frees the memory, but from the second call onwards it does not, and this eventually leads to a crash.
A memory profiler points to the line `model = AutoModelForTokenClassification.from_pretrained("./modelfiles")` as the problem, as can be seen in the screenshot below:
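For reference, here is a minimal sketch of how the measurement can be reproduced. The `memory_profiler` package and the wrapper name `run_ner` are illustrative placeholders, not necessarily my exact setup:

```python
# Sketch of the profiling setup; memory_profiler and run_ner are
# illustrative placeholders, not necessarily the exact code used.
from memory_profiler import profile
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

@profile
def run_ner(text):
    tokenizer = AutoTokenizer.from_pretrained("./modelfiles")
    # the profiler attributes the growing allocation to this line
    model = AutoModelForTokenClassification.from_pretrained("./modelfiles")
    nlp = pipeline("token-classification", model=model, tokenizer=tokenizer,
                   aggregation_strategy="simple")
    return nlp(text)

if __name__ == "__main__":
    # memory is released after the first call but keeps growing afterwards
    for _ in range(5):
        run_ner("Some example sentence to tag.")
```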
I have tried setting `model = None` before the `return` statement and also calling `gc.collect()`, but the problem persists.
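Concretely, the cleanup attempt looks roughly like this (again with `run_ner` as a placeholder for my actual function):

```python
import gc

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

def run_ner(text):
    tokenizer = AutoTokenizer.from_pretrained("./modelfiles")
    model = AutoModelForTokenClassification.from_pretrained("./modelfiles")
    nlp = pipeline("token-classification", model=model, tokenizer=tokenizer,
                   aggregation_strategy="simple")
    ner_results = nlp(text)
    # attempted cleanup: drop the model reference and force a collection
    model = None
    gc.collect()
    return ner_results
```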
Can someone please help me with this? It always ends in a crash of the application.
Thank you