Differences between transformers GPT2 and megatron-lm?

https://github.com/huggingface/transformers/issues/16016

This question comes from the issue above. @valhalla, sorry for @-ing you here again.

My question is:

  1. They are both GPT-2, but are there any differences between them, including architecture, ops, etc.?
  2. Megatron-LM uses FusedLayerNorm, but I don’t see such an op inside the transformers GPT2. Are they equal in terms of final predictions?
  3. What are the strengths and weaknesses of Megatron-LM compared with transformers?
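For question 2, my understanding is that FusedLayerNorm (from NVIDIA Apex, which Megatron-LM uses) fuses the normalization steps into a single GPU kernel for speed, while computing the same LayerNorm math as `torch.nn.LayerNorm` used in the transformers GPT2 — so predictions should match up to floating-point tolerance. Here is a minimal NumPy sketch (my own illustration, not code from either repo) showing that a two-pass LayerNorm and a single-pass rearrangement of the kind a fused kernel might use agree numerically:

```python
import numpy as np

def layer_norm_two_pass(x, gamma, beta, eps=1e-5):
    """Plain two-pass LayerNorm: compute mean, then variance, then normalize.
    This mirrors the straightforward math of torch.nn.LayerNorm."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def layer_norm_single_pass(x, gamma, beta, eps=1e-5):
    """Single-pass variant using var = E[x^2] - E[x]^2, the kind of
    rearrangement a fused kernel may apply; same LayerNorm definition."""
    mean = x.mean(axis=-1, keepdims=True)
    mean_sq = (x * x).mean(axis=-1, keepdims=True)
    var = mean_sq - mean * mean
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8)).astype(np.float32)
gamma = np.ones(8, dtype=np.float32)
beta = np.zeros(8, dtype=np.float32)

out_a = layer_norm_two_pass(x, gamma, beta)
out_b = layer_norm_single_pass(x, gamma, beta)
print(np.allclose(out_a, out_b, atol=1e-5))  # the two variants agree to float tolerance
```

Any tiny differences between fused and unfused kernels come from floating-point reduction order, not from different math, so they shouldn't change final predictions in practice.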

Thanks to anyone who can give me a hand.