Thanks for your reply! It is super helpful.
It is great to know that just running python {myscript.py} is enough for the Trainer class to use model parallelism.
A follow-up question: how does the Trainer's model parallelism differ from DeepSpeed and FSDP? Is there any documentation I can read to learn more about what is happening in the backend?
Thanks a lot!