Loading models sometimes maxes DISK%, then crashes

I’ve got an issue where, quite often, somewhere in the chain my disk goes to 100% read (500 MB/s for 10-20 minutes) and then it crashes. I’ve put loggers everywhere to see what’s causing it, and the last log line is usually right after loading the summarization model (code here, model used). It could be any model, though; that’s just the most commonly used one on the service. All I know is that it’s transformers, since it’s always that file/module that triggers it.

My models are all persisted, so it’s not re-downloading. I’m developing in Docker on Windows (WSL2 with nvidia-docker, dev channel). I know that looks like the smoking gun, but it happens on my Ubuntu server too. Versions: pytorch=1.6.0, cuda=10.1, cudnn=7, transformers=3.3.1, python=3.8 (github/lefnire/dockerfiles; the forum isn’t letting me post more than 2 links). I saw github/huggingface/transformers/issues/5001, which had me wondering whether it’s a pytorch<->cuda<->transformers version mismatch (the ticket is very old: cuda 9.2, etc.). Is there a recommended or common version combo of PyTorch, CUDA, and Python for transformers?

Could it be something with the .lock files?

I looked into the Hugging Face Dockerfiles, and we’re using the same setup except for the Python version (theirs is the Ubuntu 18.04 default, Python 3.6). They also install mkl, an Intel optimizer, but I’m not sure it’s used anywhere; I installed it too, to check, and no cigar. Blast, it happens nearly every time I load facebook/bart-large-cnn.

It was RAM. I had too much going on at once (I was loading too many models concurrently, with RAM already taxed), and it was hitting swap.
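For anyone hitting the same thing: one way to stop several models from loading at once is to put every load behind a single lock and cache the result. This is only a minimal sketch, not anything from transformers itself; `get_model`, `_load_lock`, and `_cache` are hypothetical names, and `loader` is whatever callable you already use to load a model.

```python
import threading
from typing import Any, Callable, Dict

# Hypothetical helper (not part of transformers): only one load runs at a
# time, and each model name is loaded at most once per process.
_load_lock = threading.Lock()
_cache: Dict[str, Any] = {}


def get_model(name: str, loader: Callable[[str], Any]) -> Any:
    """Return the cached model for `name`, loading it under the lock if needed."""
    with _load_lock:
        if name not in _cache:
            # Other threads wait here instead of loading in parallel,
            # so at most one model is being read into RAM at any moment.
            _cache[name] = loader(name)
        return _cache[name]
```

With transformers you would pass a `from_pretrained` method as `loader`, e.g. `get_model(model_name, BartForConditionalGeneration.from_pretrained)`. Note the lock only helps within one process; separate worker processes would each still load their own copy.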

BTW, it turns out WSL2 auto-allocates 8 GB of RAM. Bump it up here.
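To check how much RAM and swap the Linux side actually sees (inside WSL2 or the container), reading `/proc/meminfo` is usually enough. A rough sketch, assuming a Linux environment; `parse_meminfo` and `memory_report` are hypothetical helper names. As far as I know, the WSL2 limit itself is set with the `memory` key under `[wsl2]` in the `.wslconfig` file in your Windows user folder.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, float]:
    """Parse /proc/meminfo-style text into {field: value in GiB}.

    Assumes the values are reported in kB, which holds for the
    memory and swap fields used below.
    """
    out: Dict[str, float] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            out[key.strip()] = int(parts[0]) / (1024 * 1024)  # kB -> GiB
    return out


def memory_report(path: str = "/proc/meminfo") -> Dict[str, float]:
    """Return total/available RAM and total/free swap, in GiB."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    wanted = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")
    return {k: round(info[k], 2) for k in wanted if k in info}
```

Calling `print(memory_report())` before and after loading a model shows whether `MemAvailable` is close to zero and `SwapFree` is dropping, which is the pattern I was seeing.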