Passing gradient_checkpointing to a config initialization is deprecated

Not sure why this pretrained model has gradient_checkpointing enabled in its config, @patrickvonplaten? It forces gradient checkpointing on by default for everyone who fine-tunes it, which is not something we want.
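
For reference, here is a minimal sketch of the cleanup on the config side, assuming the model's config.json simply carries a top-level gradient_checkpointing key. The helper name `strip_gradient_checkpointing` and the sample config contents are hypothetical and only there for illustration; the real config has many more fields.

```python
import json


def strip_gradient_checkpointing(config_text: str) -> str:
    """Return the config JSON with the deprecated `gradient_checkpointing` key removed."""
    config = json.loads(config_text)
    # pop() with a default keeps this a no-op if the key is already absent.
    config.pop("gradient_checkpointing", None)
    return json.dumps(config, indent=2, sort_keys=True)


# Hypothetical, shortened config.json contents (not the actual model's config).
example = '{"model_type": "example-model", "gradient_checkpointing": true}'
cleaned = strip_gradient_checkpointing(example)
print(cleaned)
```

Anyone who does want gradient checkpointing can still opt in after loading, either with `model.gradient_checkpointing_enable()` or by passing `gradient_checkpointing=True` to `TrainingArguments`.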