How to run trainer.py with megatron_lm_plugin

As I understand it, src/transformers/trainer.py supports DeepSpeed but not Megatron-LM. Is that right?
When I tried to make it support Megatron-LM, I ran into several problems:

  1. When and where should the Megatron-LM checkpoint be loaded? (See the sketch after this list.)
    A. function _inner_training_loop
    B. prepare_model in accelerate/utils/megatron_lm.py
    C. other good ways
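
  For option A, a minimal sketch of what I have in mind (not existing Trainer behavior): a Trainer subclass that loads the checkpoint lazily on the first training step, i.e. only after accelerator.prepare has already wrapped the model. `load_megatron_checkpoint` and `megatron_ckpt_dir` are hypothetical placeholders for whatever loading routine your Megatron-LM version provides (e.g. megatron.checkpointing.load_checkpoint):

```python
from transformers import Trainer


def load_megatron_checkpoint(model, ckpt_dir):
    # Hypothetical stand-in for the real Megatron-LM loading routine
    # (e.g. megatron.checkpointing.load_checkpoint); fill in for your version.
    raise NotImplementedError


class MegatronTrainer(Trainer):
    def __init__(self, *args, megatron_ckpt_dir=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.megatron_ckpt_dir = megatron_ckpt_dir
        self._megatron_ckpt_loaded = False

    def training_step(self, model, inputs, *args, **kwargs):
        # Load lazily on the first step, so the checkpoint is applied only
        # after accelerator.prepare has built the Megatron-LM engine.
        if self.megatron_ckpt_dir and not self._megatron_ckpt_loaded:
            load_megatron_checkpoint(model, self.megatron_ckpt_dir)
            self._megatron_ckpt_loaded = True
        return super().training_step(model, inputs, *args, **kwargs)
```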

  2. In Megatron-LM, only the loss on the last pipeline-parallel rank (last_pp_rank) is valid, while the Trainer averages the loss across all ranks. When and where should the Megatron loss be handled? (See the sketch after this list.)
    A. function _nested_gather in trainer.py
    B. inside Megatron-LM itself (https://github.com/NVIDIA/Megatron-LM)
    C. other good ways
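
  For option A, one idea is to broadcast the last pipeline stage's loss to the other stages just before _nested_gather averages across ranks, so every rank contributes the same value. A minimal sketch, assuming each rank already holds a loss tensor of identical shape (e.g. a zero scalar on non-last stages) and a Megatron-LM version whose parallel-state helpers live under megatron.core.parallel_state (older releases expose the same functions as megatron.mpu):

```python
import torch
import torch.distributed as dist

# Assumption: recent Megatron-LM layout; older versions use `from megatron import mpu`.
from megatron.core import parallel_state


def broadcast_loss_from_last_pp_rank(loss: torch.Tensor) -> torch.Tensor:
    """Overwrite the placeholder loss on earlier pipeline stages with the real
    loss from the last stage, so the Trainer's mean over ranks is taken over
    identical values."""
    if parallel_state.get_pipeline_model_parallel_world_size() == 1:
        return loss
    loss = loss.detach().clone()
    dist.broadcast(
        loss,
        src=parallel_state.get_pipeline_model_parallel_last_rank(),
        group=parallel_state.get_pipeline_model_parallel_group(),
    )
    return loss
```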

  3. The Trainer calls accelerator.prepare separately for the model and for the dataloader, which initializes Megatron-LM more than once. How can this be avoided? (See the sketch after this list.)
    A. Modify initialization in Megatron-LM/megatron/global_vars.py
    B. Modify the initialize function in accelerate/utils/megatron_lm.py
    C. other good ways
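
  A variant of option B that avoids editing the library file: wrap accelerate's initialize so that repeated calls become no-ops. This is only a sketch; it assumes accelerate/utils/megatron_lm.py exposes a module-level initialize function (as the question suggests), and call sites that imported the function by name elsewhere would need the same patch applied to their own reference:

```python
import functools

import accelerate.utils.megatron_lm as megatron_lm_utils

_original_initialize = megatron_lm_utils.initialize
_already_initialized = False
_init_result = None


@functools.wraps(_original_initialize)
def _initialize_once(*args, **kwargs):
    # Run the real Megatron-LM initialization only once; later calls
    # (e.g. from a second accelerator.prepare) reuse the first result.
    global _already_initialized, _init_result
    if not _already_initialized:
        _init_result = _original_initialize(*args, **kwargs)
        _already_initialized = True
    return _init_result


megatron_lm_utils.initialize = _initialize_once
```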

  4. The dataset yields batches with different sequence lengths, but Megatron-LM requires all batches to have the same sequence length. When and where should the batches be padded to meet this requirement? (See the sketch after this list.)
    A. Accelerate
    B. Megatron-LM
    C. other good ways
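
  One "other good way" would be to handle it at the data-collator level, before either Accelerate or Megatron-LM sees the batch: pad every example to one fixed length with transformers' DataCollatorWithPadding. A minimal sketch; the tokenizer name and seq_length are placeholders, and causal-LM labels would still need their own padding, which this omits:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

seq_length = 2048  # placeholder: the fixed sequence length Megatron-LM expects
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Pad every batch to the same fixed length instead of the longest example,
# so all batches share one sequence length.
data_collator = DataCollatorWithPadding(
    tokenizer=tokenizer,
    padding="max_length",
    max_length=seq_length,
)

# Pass it to the Trainer so batches are already uniform before
# accelerator.prepare / Megatron-LM:
# trainer = Trainer(..., data_collator=data_collator)
```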