If anyone else runs into this issue: it turned out that a BIOS-level change was needed to fix the communication overhead. Specifically, changing the link speed on the PCIe ports from Gen 1 to Gen 4 gave us speedups when fine-tuning on multiple GPUs!
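
In case it helps with diagnosing this before rebooting into the BIOS, here is a rough sketch of how the negotiated link speed could be checked from Linux. It assumes a kernel that exposes `current_link_speed` and `max_link_speed` under `/sys/bus/pci/devices`, and it filters on PCI class `0x03` (display/3D controllers) to pick out GPUs. The helper names `pcie_gen` and `report_links` are made up for this example. Note that some GPUs drop to a lower link speed while idle, so it is worth checking under load.

```python
import re
from pathlib import Path

# Per-lane transfer rate (GT/s) for each PCIe generation.
GTS_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5, 64.0: 6}


def pcie_gen(speed_text):
    """Map a sysfs link-speed string such as '16.0 GT/s PCIe' to a generation number.

    Returns None if the string cannot be parsed or the rate is not in the table.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", speed_text)
    if not match:
        return None
    return GTS_TO_GEN.get(float(match.group(1)))


def report_links(root="/sys/bus/pci/devices"):
    """Yield (address, current_gen, max_gen) for display/3D controller devices."""
    for dev in sorted(Path(root).glob("*")):
        try:
            # PCI class codes starting with 0x03 are display controllers (GPUs).
            if not (dev / "class").read_text().strip().startswith("0x03"):
                continue
            current = pcie_gen((dev / "current_link_speed").read_text())
            maximum = pcie_gen((dev / "max_link_speed").read_text())
        except OSError:
            # Attribute missing or unreadable on this device; skip it.
            continue
        yield dev.name, current, maximum


if __name__ == "__main__":
    for address, current, maximum in report_links():
        print(f"{address}: running at Gen {current}, device supports up to Gen {maximum}")
```

If a GPU under load shows it is running at Gen 1 while the device supports Gen 4, the slot speed setting in the BIOS is a likely place to look.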