M2 Max GPU utilization steadily dropping while running inference with huggingface distilbert-base-cased

I logged an issue for TensorFlow: "M2 GPU utilization decays from 50% to 10% in non batched inference for huggingface distilbert-base-cased" (Issue #60271 in tensorflow/tensorflow on GitHub).

I also posted a “workaround” there.