I’m running on a g5.24xlarge instance on EC2.
I’m using the transformers library and trying to fine-tune mpt-7b-instruct.
The training script starts up, downloads the model, loads my dataset, and then errors out in “building trainer” with:
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-ef4f1a35606f162a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Building trainer...
ERROR:composer.cli.launcher:Rank 1 crashed with exit code -7.
There’s no stack trace or anything; everything looks kosher until this error code. My google-fu hasn’t managed to turn up any indication of what exit code -7 might mean.
The stderr for that rank shows nothing that I would think indicates the error:
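The closest lead I have is the subprocess convention that a negative return code means the child was killed by that signal number, so -7 would be signal 7, which is SIGBUS on Linux/x86-64. A quick sanity check of that convention (this only decodes the code, it doesn’t explain why the rank got the signal):

```python
import os
import signal
import subprocess
import sys

# Spawn a child that kills itself with signal 7 (SIGBUS on Linux) and
# observe that subprocess reports the negative signal number, matching
# the -7 the composer launcher printed for the crashed rank.
proc = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGBUS)"]
)
print(proc.returncode)                         # -7 on Linux
print(signal.Signals(-proc.returncode).name)   # SIGBUS
```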
/root/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-instruct/a858cfabdc6bf69c03ce63236a5e877517bb957c/attention.py:153: UserWarning: While `attn_impl: triton` can be faster than `attn_impl: flash` it uses more memory. When training larger models this can trigger alloc retries which hurts performance. If encountered, we recommend using `attn_impl: flash` if your model does not use `alibi` or `prefix_lm`.
warnings.warn('While `attn_impl: triton` can be faster than `attn_impl: flash` ' + 'it uses more memory. When training larger models this can trigger ' + 'alloc retries which hurts performance. If encountered, we recommend ' + 'using `attn_impl: flash` if your model does not use `alibi` or `prefix_lm`.')
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:06<00:06, 6.96s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.35s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.74s/it]
Using pad_token, but it is not set yet.
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-ef4f1a35606f162a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Map: 0%| | 0/2610 [00:00<?, ? examples/s]
Map: 4%|▍ | 112/2610 [00:00<00:02, 1101.55 examples/s]
Map: 9%|▊ | 227/2610 [00:00<00:02, 1126.25 examples/s]
Map: 15%|█▍ | 382/2610 [00:00<00:02, 1065.24 examples/s]
Map: 20%|██ | 533/2610 [00:00<00:02, 1035.03 examples/s]
Map: 25%|██▍ | 641/2610 [00:00<00:01, 1045.58 examples/s]
Map: 29%|██▉ | 751/2610 [00:00<00:01, 1057.60 examples/s]
Map: 33%|███▎ | 871/2610 [00:00<00:01, 1098.15 examples/s]
Map: 38%|███▊ | 1000/2610 [00:01<00:01, 888.27 examples/s]
Map: 44%|████▍ | 1144/2610 [00:01<00:01, 909.71 examples/s]
Map: 48%|████▊ | 1244/2610 [00:01<00:01, 928.00 examples/s]
Map: 53%|█████▎ | 1379/2610 [00:01<00:01, 910.39 examples/s]
Map: 57%|█████▋ | 1479/2610 [00:01<00:01, 928.15 examples/s]
Map: 60%|██████ | 1576/2610 [00:01<00:01, 938.17 examples/s]
Map: 64%|██████▍ | 1676/2610 [00:01<00:00, 953.67 examples/s]
Map: 68%|██████▊ | 1773/2610 [00:01<00:00, 954.62 examples/s]
Map: 73%|███████▎ | 1915/2610 [00:01<00:00, 950.31 examples/s]
Map: 78%|███████▊ | 2045/2610 [00:02<00:00, 827.57 examples/s]
Map: 82%|████████▏ | 2135/2610 [00:02<00:00, 839.42 examples/s]
Map: 87%|████████▋ | 2264/2610 [00:02<00:00, 843.83 examples/s]
Map: 90%|█████████ | 2352/2610 [00:02<00:00, 849.49 examples/s]
Map: 93%|█████████▎| 2440/2610 [00:02<00:00, 855.17 examples/s]
Map: 97%|█████████▋| 2544/2610 [00:02<00:00, 899.43 examples/s]
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-ef4f1a35606f162a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Map: 0%| | 0/315 [00:00<?, ? examples/s]
Map: 35%|███▌ | 111/315 [00:00<00:00, 1088.88 examples/s]
Map: 86%|████████▋ | 272/315 [00:00<00:00, 1069.73 examples/s]
----------End global rank 1 STDERR----------
I’m running the training script through the MosaicML llm-foundry composer wrapper, in case that matters.
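For what it’s worth, since signal 7 would be SIGBUS, the usual suspect I’ve seen mentioned for multi-worker training is an exhausted /dev/shm (Docker caps it at 64 MB by default). This is the check I’m planning next; the docker flag and size below are an assumption on my part, not a confirmed fix:

```shell
# SIGBUS in containerized multi-GPU training is often a too-small
# shared-memory mount. See how large /dev/shm actually is:
df -h /dev/shm

# If it is tiny and the training runs inside Docker, relaunching the
# container with a bigger shared-memory segment may help (the 16g value
# is a guess, not a verified fix):
#   docker run --shm-size=16g ...
```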