Was this problem ever solved? I tried the different settings mentioned in this thread and none of them changed anything.
dataset.map starts to stall at around 80% completion, then gets steadily slower until it pretty much stops at 97%.
I tried running it on a smaller slice of the dataset, but the behavior persists there too.
At the end it seems only one process is running and the rest are idle.
After 27 hours I just hit Ctrl-C to kill it.
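
One possible explanation for the "only one process is running" symptom, offered as an assumption and not something confirmed in this thread: if the work is split into one contiguous shard per process and the expensive examples are clustered together, a single worker ends up with almost all of the real work. The sketch below is a pure-Python simulation of that effect. It does not call the datasets library, and the function name `contiguous_shard_costs` and the cost numbers are hypothetical, chosen only for illustration.

```python
def contiguous_shard_costs(costs, num_proc):
    """Split per-example costs into num_proc contiguous shards and
    return the total cost of each shard (hypothetical helper)."""
    base, extra = divmod(len(costs), num_proc)
    totals = []
    start = 0
    for rank in range(num_proc):
        # The first `extra` shards get one additional example each.
        size = base + (1 if rank < extra else 0)
        totals.append(sum(costs[start:start + size]))
        start += size
    return totals


# Made-up workload: 1000 examples, the last 50 are 100x more expensive.
costs = [1] * 950 + [100] * 50
totals = contiguous_shard_costs(costs, num_proc=4)
print(totals)  # [250, 250, 250, 5200]
```

In this made-up workload three workers finish after 250 units of work while the fourth needs 5200, so the progress bar would race ahead and then crawl with a single busy process. If something like this is the cause, shuffling the dataset before mapping, or timing the mapped function on individual examples to find the slow ones, might be worth trying.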