Performance with new NVIDIA RTX 30 series

Hi there,

I just got my new RTX 3090 and spent the whole weekend compiling PyTorch 1.8 (latest master) with the latest CUDA 11.1 (which should support the new 30 series properly).

I did some performance comparisons against a 2080 TI for token classification and question answering and want to share the results :hugs:

For token classification I just measured the iterations per second for fine-tuning multi-lingual BERT on the GermEval 2014 dataset. With the 2080 TI, I achieved 4 iterations per second, and with the new RTX 3090 6.8 iterations per second. Using fp16, the 2080 TI achieved 7.7 iterations per second and the RTX 3090 8.4 iterations per second (with “native” support of half precision).
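To put those numbers in relative terms, here is a quick sketch that turns the iterations-per-second figures above into speedup factors. The dictionary layout and the `speedup` helper are just names I made up for illustration; only the four measurements come from my runs.

```python
# Iterations per second reported above (token classification, GermEval 2014).
throughput = {
    "fp32": {"2080 TI": 4.0, "RTX 3090": 6.8},
    "fp16": {"2080 TI": 7.7, "RTX 3090": 8.4},
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    return candidate / baseline


for precision, cards in throughput.items():
    ratio = speedup(cards["2080 TI"], cards["RTX 3090"])
    print(f"{precision}: RTX 3090 is {ratio:.2f}x the 2080 TI throughput")
```

So the gap is large in full precision and much smaller once fp16 is enabled on both cards.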

I also measured the time that is needed to fine-tune a model on SQuAD. With the 2080 TI it took 38:57 minutes for one epoch and 22:25 minutes on the RTX 3090.
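For comparing the two epoch times, a small sketch that converts the `mm:ss` values above to seconds and takes the ratio (the helper name `to_seconds` is hypothetical, not part of any script I ran):

```python
def to_seconds(mm_ss):
    """Convert a 'mm:ss' string such as '38:57' to a number of seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# One SQuAD epoch, as reported above.
epoch_2080ti = to_seconds("38:57")
epoch_3090 = to_seconds("22:25")

ratio = epoch_2080ti / epoch_3090
print(f"2080 TI: {epoch_2080ti} s, RTX 3090: {epoch_3090} s, ratio: {ratio:.2f}x")
```

That works out to roughly a 1.7x shorter epoch on the RTX 3090, in line with the fp32 token classification numbers.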

I’m planning to do more benchmarks with other examples. I don’t have a TITAN RTX card, but if someone from the community could provide some benchmarks (also for other cards!!), it would be awesome :hugs:


I have a Titan RTX card with some spare runtime. I’d be interested in running some SQuAD 2.0 fine-tuning comparisons to your RTX 3090.

My modelcard for `ahotrod/electra_large_discriminator_squad2_512` includes the fine-tuning script file, Tensorboard loss rate image, nvidia-smi image for memory usage, and other specs for the 4-1/2 hr fine-tuning. See complete details in “List all files in model” of the modelcard.

deepset-ai/haystack ( https://github.com/deepset-ai/haystack ) has convenient reader speed and accuracy benchmarks that could be used to compare the RTX 3090 against the Titan RTX, see https://github.com/deepset-ai/haystack/issues/441#issuecomment-701748635 using reader.eval_on_file("dev-v2.0.json")


I am not too impressed with the results, in the sense that I won’t be upgrading my 2080 TI. For those who are looking for a nice DL card and don’t have any card yet, I bet the 3090 is a great choice though. (You can find a second-hand 2080 TI for around $500, which is great value too, but it has less VRAM.)

@stefan-it If you can provide me with a repo and all the commands needed to run it, I can run some benchmarks on DDP V100s and soon a couple of A100s, too.


Also pinging @madlag and @mfuntowicz on this


Hi @ahotrod, thanks for the hint! I will try to fine-tune the model and report the results back here in a few days :slight_smile:

@BramVanroy: for token classification I just use the shell script from here: https://github.com/huggingface/transformers/blob/master/examples/token-classification/run.sh

For SQuAD 1 I used the command specified here:

A100 results would be super interesting :hugs: