CPU usage during batched training

I noticed that during training the CPU only uses one thread. Since my batches are quite small and I'm swapping them often, I'm wondering: is parallelization possible here, and would it actually increase training speed?
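
For concreteness, here is a minimal sketch of the kind of change I have in mind, assuming a PyTorch-style setup (my actual framework and dataset may differ; the data and model below are toy stand-ins): raising the intra-op thread count and using multiple `DataLoader` workers so batch preparation overlaps with the training computation.

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    # Allow intra-op parallelism (matrix ops, etc.) to use all cores
    # instead of a single thread.
    torch.set_num_threads(os.cpu_count() or 1)

    # Toy stand-in for my real dataset.
    dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))

    # Small batches, loaded by several worker processes so the next
    # batch is prepared while the current one is being trained on.
    loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=4)

    model = torch.nn.Linear(16, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```

Would something along these lines help with small batches, or is the single busy thread a sign that the bottleneck is elsewhere?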