Performance with new NVIDIA RTX 30 series

stefan-it · October 4, 2020, 10:27pm

Hi there,

I just got my new RTX 3090 and spent the whole weekend compiling PyTorch 1.8 (latest master) with the latest CUDA 11.1 (which should support the new 30 series properly).

I did some performance comparisons against a 2080 TI for token classification and question answering and want to share the results.

For token classification I just measured the iterations per second for fine-tuning multi-lingual BERT on the GermEval 2014 dataset. With the 2080 TI, I achieved 4 iterations per second, with the new RTX 3090 6.8 iterations per second. Using fp16, the 2080 TI achieved 7.7 iterations per second and the RTX 3090 8.4 iterations per second (with “native” support of half precision).

I also measured the time that is needed to fine-tune a model on SQuAD. With the 2080 TI it took 38:57 minutes for one epoch and 22:25 minutes on the RTX 3090.
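For anyone who wants to compare their own numbers against these, here is a minimal sketch that turns the figures above into speedup ratios. The helper names are made up for illustration, and it assumes the SQuAD times are in minutes:seconds format.

```python
# Hypothetical helpers for comparing the numbers reported in this post.

def speedup_from_throughput(baseline_its: float, new_its: float) -> float:
    """Speedup when both values are iterations per second (higher is better)."""
    return new_its / baseline_its


def mmss_to_seconds(value: str) -> int:
    """Convert a 'MM:SS' string to seconds (assumes minutes:seconds)."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup_from_time(baseline: str, new: str) -> float:
    """Speedup when both values are wall-clock times (lower is better)."""
    return mmss_to_seconds(baseline) / mmss_to_seconds(new)


# Token classification, fp32: 2080 TI 4 it/s vs. RTX 3090 6.8 it/s
print(f"fp32:  {speedup_from_throughput(4.0, 6.8):.2f}x")   # 1.70x
# Token classification, fp16: 2080 TI 7.7 it/s vs. RTX 3090 8.4 it/s
print(f"fp16:  {speedup_from_throughput(7.7, 8.4):.2f}x")   # 1.09x
# SQuAD, one epoch: 2080 TI 38:57 vs. RTX 3090 22:25
print(f"SQuAD: {speedup_from_time('38:57', '22:25'):.2f}x")  # 1.74x
```

So the RTX 3090 comes out roughly 1.7x faster in the fp32 token classification run and the SQuAD run, but only about 1.09x faster with fp16.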

I’m planning to do more benchmarks with other examples. I don’t have a TITAN RTX card, but if someone from the community could provide some benchmarks (also for other cards!!), it would be awesome.

ahotrod · October 5, 2020, 6:21pm

I have a Titan RTX card with some spare runtime. I’d be interested in running some SQuAD 2.0 fine-tuning comparisons to your RTX 3090.

My modelcard for `ahotrod/electra_large_discriminator_squad2_512` includes the fine-tuning script file, TensorBoard loss rate image, nvidia-smi image for memory usage, and other specs for the 4-1/2 hr fine-tuning. See complete details in “List all files in model” of the modelcard.

deepset-ai/haystack ( https://github.com/deepset-ai/haystack ) has convenient reader speed and accuracy benchmarks that could be used for comparing the RTX 3090 and the Titan RTX: https://github.com/deepset-ai/haystack/issues/441#issuecomment-701748635 using reader.eval_on_file("dev-v2.0.json")

BramVanroy · October 5, 2020, 7:21pm

I am not too impressed with the results, in the sense that I won’t be upgrading my 2080TI. For those who are looking for a nice DL card and don’t have any card yet, I bet the 3090 is a great choice though. (Although you can find a second-hand 2080TI for around $500, which is great value too, but it lacks the larger VRAM.)

@stefan-it If you can provide me with a repo and all the commands needed to run, I can run some benchmarks on DDP V100’s and soon a couple of A100’s, too.

julien-c · October 7, 2020, 9:44am

Also pinging @madlag and @mfuntowicz on this

stefan-it · October 7, 2020, 9:48am

Hi @ahotrod, thanks for that hint! I will try to fine-tune the model and report the results back here in a few days.

@BramVanroy: for token classification I just use the shell script from here: https://github.com/huggingface/transformers/blob/master/examples/token-classification/run.sh

For SQuAD 1 I used the command specified here:

https://github.com/huggingface/transformers/tree/master/examples/question-answering#squad

A100 would be super interesting
