I'm fine-tuning a MythoMax 13B model on a dataset of 250 question-and-answer pairs, running on an L40S. It ran for 3.5 hours before I killed it. Is this normal?
ChatGPT estimated about 10 minutes.
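For context, here is the back-of-envelope math behind that estimate (a rough sketch using the batch size, epoch count, and train-split size that appear in the log further down, plus a guessed per-step time, so not a measured number):

```python
import math

# Values taken from the AutoTrain launch config and dataset info in the log below.
num_examples = 183   # "num_rows: 183" for the train split
batch_size = 16      # 'batch_size': 16
grad_accum = 1       # 'gradient_accumulation': 1
epochs = 1           # 'epochs': 1

steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
total_steps = steps_per_epoch * epochs
print(total_steps)   # 12 optimizer steps for the whole run

# Assumed (not benchmarked) per-step time for a 4-bit LoRA pass over a 13B model:
sec_per_step_guess = 30
print(total_steps * sec_per_step_guess / 60, "minutes")  # ~6 minutes of actual training
```

Even allowing for model download and loading on top of that, I expected minutes, not hours. The full container log from the Space follows: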
(from autotrain-advanced) (4.67.1)
Requirement already satisfied: werkzeug==3.1.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.1.3)
Requirement already satisfied: xgboost==2.1.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.1.3)
Requirement already satisfied: huggingface-hub==0.27.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.27.0)
Requirement already satisfied: requests==2.32.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.32.3)
Requirement already satisfied: einops==0.8.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.8.0)
Requirement already satisfied: packaging==24.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (24.2)
Requirement already satisfied: cryptography==44.0.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (44.0.0)
Requirement already satisfied: nvitop==1.3.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.3.2)
Requirement already satisfied: tensorboard==2.18.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.18.0)
Requirement already satisfied: peft==0.14.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.14.0)
Requirement already satisfied: trl==0.13.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.13.0)
Requirement already satisfied: tiktoken==0.8.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.8.0)
Requirement already satisfied: transformers==4.48.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (4.48.0)
Requirement already satisfied: accelerate==1.2.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.2.1)
Requirement already satisfied: rouge-score==0.1.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.1.2)
Requirement already satisfied: py7zr==0.22.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.22.0)
Requirement already satisfied: fastapi==0.115.6 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.115.6)
Requirement already satisfied: uvicorn==0.34.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.34.0)
Requirement already satisfied: python-multipart==0.0.20 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.0.20)
Requirement already satisfied: pydantic==2.10.4 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.10.4)
Requirement already satisfied: matplotlib>=2.1.0 in ./env/lib/python3.10/site-packages (from pycocotools==2.0.8->autotrain-advanced) (3.10.0)
Requirement already satisfied: annotated-types>=0.6.0 in ./env/lib/python3.10/site-packages (from pydantic==2.10.4->autotrain-advanced) (0.7.0)
Requirement already satisfied: pydantic-core==2.27.2 in ./env/lib/python3.10/site-packages (from pydantic==2.10.4->autotrain-advanced) (2.27.2)
Requirement already satisfied: charset-normalizer<4,>=2 in ./env/lib/python3.10/site-packages (from requests==2.32.3->autotrain-advanced) (3.3.2)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./env/lib/python3.10/site-packages (from requests==2.32.3->autotrain-advanced) (2.2.3)
Requirement already satisfied: absl-py in ./env/lib/python3.10/site-packages (from rouge-score==0.1.2->autotrain-advanced) (2.1.0)
Requirement already satisfied: six>=1.14.0 in ./env/lib/python3.10/site-packages (from rouge-score==0.1.2->autotrain-advanced) (1.17.0)
Requirement already satisfied: threadpoolctl>=3.1.0 in ./env/lib/python3.10/site-packages (from scikit-learn==1.6.0->autotrain-advanced) (3.5.0)
Requirement already satisfied: grpcio>=1.48.2 in ./env/lib/python3.10/site-packages (from tensorboard==2.18.0->autotrain-advanced) (1.69.0)
Requirement already satisfied: markdown>=2.6.8 in ./env/lib/python3.10/site-packages (from tensorboard==2.18.0->autotrain-advanced) (3.7)
Requirement already satisfied: protobuf!=4.24.0,>=3.19.6 in ./env/lib/python3.10/site-packages (from tensorboard==2.18.0->autotrain-advanced) (5.29.3)
Requirement already satisfied: setuptools>=41.0.0 in ./env/lib/python3.10/site-packages (from tensorboard==2.18.0->autotrain-advanced) (75.1.0)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in ./env/lib/python3.10/site-packages (from tensorboard==2.18.0->autotrain-advanced) (0.7.2)
Requirement already satisfied: torchvision in ./env/lib/python3.10/site-packages (from timm==1.0.12->autotrain-advanced) (0.19.0)
Requirement already satisfied: lightning-utilities>=0.8.0 in ./env/lib/python3.10/site-packages (from torchmetrics==1.6.0->autotrain-advanced) (0.11.9)
Requirement already satisfied: tokenizers<0.22,>=0.21 in ./env/lib/python3.10/site-packages (from transformers==4.48.0->autotrain-advanced) (0.21.0)
Requirement already satisfied: rich in ./env/lib/python3.10/site-packages (from trl==0.13.0->autotrain-advanced) (13.9.4)
Requirement already satisfied: h11>=0.8 in ./env/lib/python3.10/site-packages (from uvicorn==0.34.0->autotrain-advanced) (0.14.0)
Requirement already satisfied: MarkupSafe>=2.1.1 in ./env/lib/python3.10/site-packages (from werkzeug==3.1.3->autotrain-advanced) (2.1.3)
Requirement already satisfied: nvidia-nccl-cu12 in ./env/lib/python3.10/site-packages (from xgboost==2.1.3->autotrain-advanced) (2.24.3)
Requirement already satisfied: stringzilla>=3.10.4 in ./env/lib/python3.10/site-packages (from albucore==0.0.21->albumentations==1.4.23->autotrain-advanced) (3.11.3)
Requirement already satisfied: simsimd>=5.9.2 in ./env/lib/python3.10/site-packages (from albucore==0.0.21->albumentations==1.4.23->autotrain-advanced) (6.2.1)
Requirement already satisfied: pyarrow>=15.0.0 in ./env/lib/python3.10/site-packages (from datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (18.1.0)
Requirement already satisfied: aiohttp in ./env/lib/python3.10/site-packages (from datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (3.11.11)
Requirement already satisfied: Mako in ./env/lib/python3.10/site-packages (from alembic>=1.5.0->optuna==4.1.0->autotrain-advanced) (1.3.8)
Requirement already satisfied: pycparser in ./env/lib/python3.10/site-packages (from cffi>=1.12->cryptography==44.0.0->autotrain-advanced) (2.22)
Requirement already satisfied: aiohappyeyeballs>=2.3.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (2.4.4)
Requirement already satisfied: aiosignal>=1.1.2 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (1.3.2)
Requirement already satisfied: async-timeout<6.0,>=4.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (5.0.1)
Requirement already satisfied: attrs>=17.3.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (24.3.0)
Requirement already satisfied: frozenlist>=1.1.1 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (1.5.0)
Requirement already satisfied: multidict<7.0,>=4.5 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (6.1.0)
Requirement already satisfied: propcache>=0.2.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (0.2.1)
Requirement already satisfied: yarl<2.0,>=1.17.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=3.2.0->datasets[vision]~=3.2.0->autotrain-advanced) (1.18.3)
Requirement already satisfied: contourpy>=1.0.1 in ./env/lib/python3.10/site-packages (from matplotlib>=2.1.0->pycocotools==2.0.8->autotrain-advanced) (1.3.1)
Requirement already satisfied: cycler>=0.10 in ./env/lib/python3.10/site-packages (from matplotlib>=2.1.0->pycocotools==2.0.8->autotrain-advanced) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in ./env/lib/python3.10/site-packages (from matplotlib>=2.1.0->pycocotools==2.0.8->autotrain-advanced) (4.55.3)
Requirement already satisfied: kiwisolver>=1.3.1 in ./env/lib/python3.10/site-packages (from matplotlib>=2.1.0->pycocotools==2.0.8->autotrain-advanced) (1.4.8)
Requirement already satisfied: pyparsing>=2.3.1 in ./env/lib/python3.10/site-packages (from matplotlib>=2.1.0->pycocotools==2.0.8->autotrain-advanced) (3.2.1)
Requirement already satisfied: greenlet!=0.4.17 in ./env/lib/python3.10/site-packages (from sqlalchemy>=1.4.2->optuna==4.1.0->autotrain-advanced) (3.1.1)
Requirement already satisfied: exceptiongroup>=1.0.2 in ./env/lib/python3.10/site-packages (from anyio->httpx==0.28.1->autotrain-advanced) (1.2.2)
Requirement already satisfied: sniffio>=1.1 in ./env/lib/python3.10/site-packages (from anyio->httpx==0.28.1->autotrain-advanced) (1.3.1)
Requirement already satisfied: sympy in ./env/lib/python3.10/site-packages (from torch>=1.10.0->accelerate==1.2.1->autotrain-advanced) (1.13.3)
Requirement already satisfied: networkx in ./env/lib/python3.10/site-packages (from torch>=1.10.0->accelerate==1.2.1->autotrain-advanced) (3.2.1)
Requirement already satisfied: jinja2 in ./env/lib/python3.10/site-packages (from torch>=1.10.0->accelerate==1.2.1->autotrain-advanced) (3.1.4)
Requirement already satisfied: markdown-it-py>=2.2.0 in ./env/lib/python3.10/site-packages (from rich->trl==0.13.0->autotrain-advanced) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./env/lib/python3.10/site-packages (from rich->trl==0.13.0->autotrain-advanced) (2.19.1)
Requirement already satisfied: mdurl~=0.1 in ./env/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->trl==0.13.0->autotrain-advanced) (0.1.2)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in ./env/lib/python3.10/site-packages (from sympy->torch>=1.10.0->accelerate==1.2.1->autotrain-advanced) (1.3.0)
Downloading autotrain_advanced-0.8.36-py3-none-any.whl (341 kB)
Installing collected packages: autotrain-advanced
Successfully installed autotrain-advanced-0.8.36
INFO | 2025-01-25 03:35:42 | autotrain.app.ui_routes:<module>:31 - Starting AutoTrain...
INFO | 2025-01-25 03:35:44 | autotrain.app.ui_routes:<module>:315 - AutoTrain started successfully
INFO | 2025-01-25 03:35:44 | autotrain.app.app:<module>:13 - Starting AutoTrain...
INFO | 2025-01-25 03:35:44 | autotrain.app.app:<module>:23 - AutoTrain version: 0.8.36
INFO | 2025-01-25 03:35:44 | autotrain.app.app:<module>:24 - AutoTrain started successfully
INFO: Started server process [53]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
INFO: 10.16.46.168:22511 - "GET /?logs=container&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NmM5MTJiOTc0NmNkZGQ5NTllM2RkNjciLCJ1c2VyIjoiZ29rc3RhZCJ9LCJpYXQiOjE3Mzc3NzYxNDUsInN1YiI6Ii9zcGFjZXMvZ29rc3RhZC9hdXRvdHJhaW4tbXl0aG9tYXgiLCJleHAiOjE3Mzc4NjI1NDUsImlzcyI6Imh0dHBzOi8vaHVnZ2luZ2ZhY2UuY28ifQ.d5y_fbriLBfpQr0l4lyFOUX1hQEZwvOuFRLavEJH2Bo0MMxZd6SB62vIyhipLbV7WkHeSDrGyp3eo2XZGx_VDQ HTTP/1.1" 307 Temporary Redirect
INFO: 10.16.46.168:22511 - "GET /?logs=container&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NmM5MTJiOTc0NmNkZGQ5NTllM2RkNjciLCJ1c2VyIjoiZ29rc3RhZCJ9LCJpYXQiOjE3Mzc3NzYxNDUsInN1YiI6Ii9zcGFjZXMvZ29rc3RhZC9hdXRvdHJhaW4tbXl0aG9tYXgiLCJleHAiOjE3Mzc4NjI1NDUsImlzcyI6Imh0dHBzOi8vaHVnZ2luZ2ZhY2UuY28ifQ.d5y_fbriLBfpQr0l4lyFOUX1hQEZwvOuFRLavEJH2Bo0MMxZd6SB62vIyhipLbV7WkHeSDrGyp3eo2XZGx_VDQ HTTP/1.1" 307 Temporary Redirect
INFO: 10.16.21.179:2795 - "GET /ui/?logs=container&__sign=eyJhbGciOiJFZERTQSJ9.eyJyZWFkIjp0cnVlLCJwZXJtaXNzaW9ucyI6eyJyZXBvLmNvbnRlbnQucmVhZCI6dHJ1ZX0sIm9uQmVoYWxmT2YiOnsia2luZCI6InVzZXIiLCJfaWQiOiI2NmM5MTJiOTc0NmNkZGQ5NTllM2RkNjciLCJ1c2VyIjoiZ29rc3RhZCJ9LCJpYXQiOjE3Mzc3NzYxNDUsInN1YiI6Ii9zcGFjZXMvZ29rc3RhZC9hdXRvdHJhaW4tbXl0aG9tYXgiLCJleHAiOjE3Mzc4NjI1NDUsImlzcyI6Imh0dHBzOi8vaHVnZ2luZ2ZhY2UuY28ifQ.d5y_fbriLBfpQr0l4lyFOUX1hQEZwvOuFRLavEJH2Bo0MMxZd6SB62vIyhipLbV7WkHeSDrGyp3eo2XZGx_VDQ HTTP/1.1" 200 OK
INFO: 10.16.39.228:31650 - "GET /static/scripts/fetch_data_and_update_models.js?cb=2025-01-25%2003:35:46 HTTP/1.1" 200 OK
INFO: 10.16.46.168:22511 - "GET /static/scripts/listeners.js?cb=2025-01-25%2003:35:46 HTTP/1.1" 200 OK
INFO: 10.16.39.228:31650 - "GET /static/scripts/utils.js?cb=2025-01-25%2003:35:46 HTTP/1.1" 200 OK
INFO: 10.16.17.134:6478 - "GET /static/scripts/poll.js?cb=2025-01-25%2003:35:46 HTTP/1.1" 200 OK
INFO: 10.16.21.179:2795 - "GET /static/scripts/logs.js?cb=2025-01-25%2003:35:46 HTTP/1.1" 200 OK
INFO: 10.16.17.134:6478 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.46.168:22511 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO | 2025-01-25 03:35:46 | autotrain.app.ui_routes:fetch_params:415 - Task: llm:sft
INFO: 10.16.39.228:28699 - "GET /ui/params/llm%3Asft/basic HTTP/1.1" 200 OK
INFO: 10.16.39.228:31650 - "GET /ui/model_choices/llm%3Asft HTTP/1.1" 200 OK
INFO: 10.16.5.242:15988 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.39.228:10337 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.21.179:43884 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:44337 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:25359 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.39.228:32270 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:45088 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:18944 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.39.228:16908 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.39.228:6524 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:1769 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.46.168:52154 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:32684 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:3077 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.21.179:49111 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:63522 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:1175 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.17.134:44922 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:40721 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:34127 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.39.228:28308 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:16135 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:6174 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.17.134:28174 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.39.228:36156 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:57585 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:45654 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.21.179:26262 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:56677 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.5.242:6957 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:28645 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.39.228:33925 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.39.228:26393 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.39.228:53711 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:54032 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:39391 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.17.134:3412 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:30513 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:15297 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.5.242:30280 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.39.228:24538 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:38569 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.17.134:19283 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.39.228:33626 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:37982 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.39.228:28487 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:44322 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:18182 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.5.242:2627 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO | 2025-01-25 03:38:35 | autotrain.app.ui_routes:handle_form:540 - hardware: local-ui
INFO | 2025-01-25 03:38:35 | autotrain.backends.local:create:20 - Starting local training...
INFO | 2025-01-25 03:38:35 | autotrain.commands:launch_command:514 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-t7kn1-a6v62/training_params.json']
INFO | 2025-01-25 03:38:35 | autotrain.commands:launch_command:515 - {'model': 'Gryphe/MythoMax-L2-13b', 'project_name': 'autotrain-t7kn1-a6v62', 'data_path': 'gokstad/13winters', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 32, 'model_max_length': 128, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 0.0001, 'epochs': 1, 'batch_size': 16, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': '{ "input": "input", "output": "response" }', 'rejected_text_column': 'rejected_text', 'push_to_hub': True, 'username': 'gokstad', 'token': '*****', 'unsloth': False, 'distributed_backend': 'ddp'}
INFO | 2025-01-25 03:38:35 | autotrain.backends.local:create:25 - Training PID: 69
INFO: 10.16.21.179:5168 - "POST /ui/create_project HTTP/1.1" 200 OK
INFO: 10.16.5.242:21957 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:5168 - "GET /ui/accelerators HTTP/1.1" 200 OK
The following values were not passed to `accelerate launch` and had defaults used instead:
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
INFO: 10.16.5.242:9349 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO | 2025-01-25 03:38:42 | autotrain.trainers.clm.train_clm_sft:train:11 - Starting SFT training...
Generating train split: 0%| | 0/183 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 183/183 [00:00<00:00, 97703.36 examples/s]
INFO | 2025-01-25 03:38:43 | autotrain.trainers.clm.utils:process_input_data:550 - Train data: Dataset({
features: ['input', 'response'],
num_rows: 183
})
INFO | 2025-01-25 03:38:43 | autotrain.trainers.clm.utils:process_input_data:551 - Valid data: None
INFO | 2025-01-25 03:38:44 | autotrain.trainers.clm.utils:configure_logging_steps:671 - configuring logging steps
INFO | 2025-01-25 03:38:44 | autotrain.trainers.clm.utils:configure_logging_steps:684 - Logging steps: 2
INFO | 2025-01-25 03:38:44 | autotrain.trainers.clm.utils:configure_training_args:723 - configuring training args
INFO | 2025-01-25 03:38:44 | autotrain.trainers.clm.utils:configure_block_size:801 - Using block size 32
INFO | 2025-01-25 03:38:44 | autotrain.trainers.clm.utils:get_model:877 - Can use unsloth: False
WARNING | 2025-01-25 03:38:44 | autotrain.trainers.clm.utils:get_model:919 - Unsloth not available, continuing without it...
INFO | 2025-01-25 03:38:44 | autotrain.trainers.clm.utils:get_model:921 - loading model config...
INFO | 2025-01-25 03:38:44 | autotrain.trainers.clm.utils:get_model:929 - loading model...
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 0%| | 0/13 [00:00<?, ?it/s]INFO: 10.16.5.242:24479 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.39.228:43974 - "GET /ui/is_model_training HTTP/1.1" 200 OK
Downloading shards: 8%|▊ | 1/13 [00:03<00:39, 3.32s/it]
Downloading shards: 15%|█▌ | 2/13 [00:06<00:33, 3.04s/it]INFO: 10.16.17.134:58345 - "GET /ui/is_model_training HTTP/1.1" 200 OK
Downloading shards: 23%|██▎ | 3/13 [00:09<00:30, 3.05s/it]INFO: 10.16.5.242:22132 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:37273 - "GET /ui/accelerators HTTP/1.1" 200 OK
Downloading shards: 31%|███ | 4/13 [00:12<00:27, 3.00s/it]
Downloading shards: 38%|███▊ | 5/13 [00:14<00:23, 2.94s/it]INFO: 10.16.5.242:1092 - "GET /ui/is_model_training HTTP/1.1" 200 OK
Downloading shards: 46%|████▌ | 6/13 [00:18<00:21, 3.11s/it]INFO: 10.16.21.179:65248 - "GET /ui/accelerators HTTP/1.1" 200 OK
Downloading shards: 54%|█████▍ | 7/13 [00:21<00:19, 3.25s/it]INFO: 10.16.21.179:61531 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:57944 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:36413 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.39.228:52244 - "GET /ui/is_model_training HTTP/1.1" 200 OK
Downloading shards: 62%|██████▏ | 8/13 [00:35<00:32, 6.43s/it]INFO: 10.16.39.228:23435 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:62725 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.17.134:64798 - "GET /ui/is_model_training HTTP/1.1" 200 OK
Downloading shards: 69%|██████▉ | 9/13 [00:44<00:29, 7.31s/it]INFO: 10.16.39.228:51682 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:23793 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:27364 - "GET /ui/accelerators HTTP/1.1" 200 OK
Downloading shards: 77%|███████▋ | 10/13 [00:54<00:24, 8.23s/it]INFO: 10.16.5.242:24407 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:28586 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:54373 - "GET /ui/accelerators HTTP/1.1" 200 OK
Downloading shards: 85%|████████▍ | 11/13 [01:05<00:18, 9.02s/it]INFO: 10.16.39.228:17869 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:22587 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.17.134:9249 - "GET /ui/is_model_training HTTP/1.1" 200 OK
Downloading shards: 92%|█████████▏| 12/13 [01:16<00:09, 9.50s/it]INFO: 10.16.39.228:53699 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:7889 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.39.228:26434 - "GET /ui/accelerators HTTP/1.1" 200 OK
Downloading shards: 100%|██████████| 13/13 [01:23<00:00, 8.84s/it]
Downloading shards: 100%|██████████| 13/13 [01:23<00:00, 6.42s/it]
Loading checkpoint shards: 0%| | 0/13 [00:00<?, ?it/s]INFO: 10.16.5.242:61819 - "GET /ui/is_model_training HTTP/1.1" 200 OK
Loading checkpoint shards: 8%|▊ | 1/13 [00:04<00:53, 4.50s/it]
Loading checkpoint shards: 15%|█▌ | 2/13 [00:05<00:28, 2.59s/it]
Loading checkpoint shards: 23%|██▎ | 3/13 [00:06<00:16, 1.61s/it]
Loading checkpoint shards: 31%|███ | 4/13 [00:06<00:10, 1.14s/it]
Loading checkpoint shards: 38%|███▊ | 5/13 [00:07<00:07, 1.13it/s]
Loading checkpoint shards: 46%|████▌ | 6/13 [00:07<00:05, 1.37it/s]
Loading checkpoint shards: 54%|█████▍ | 7/13 [00:07<00:03, 1.59it/s]INFO: 10.16.5.242:64762 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:1873 - "GET /ui/accelerators HTTP/1.1" 200 OK
Loading checkpoint shards: 62%|██████▏ | 8/13 [00:08<00:02, 1.76it/s]
Loading checkpoint shards: 69%|██████▉ | 9/13 [00:08<00:02, 1.93it/s]
Loading checkpoint shards: 77%|███████▋ | 10/13 [00:09<00:01, 2.04it/s]
Loading checkpoint shards: 85%|████████▍ | 11/13 [00:09<00:00, 2.12it/s]
Loading checkpoint shards: 92%|█████████▏| 12/13 [00:10<00:00, 2.17it/s]
Loading checkpoint shards: 100%|██████████| 13/13 [00:10<00:00, 2.29it/s]
Loading checkpoint shards: 100%|██████████| 13/13 [00:10<00:00, 1.25it/s]
INFO | 2025-01-25 03:40:19 | autotrain.trainers.clm.utils:get_model:960 - model dtype: torch.float16
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
INFO: 10.16.21.179:1658 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.39.228:63570 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.17.134:26017 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.17.134:50441 - "GET /ui/is_model_training HTTP/1.1" 200 OK
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
INFO: 10.16.39.228:16279 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.39.228:64992 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.5.242:33437 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO: 10.16.21.179:28870 - "GET /ui/accelerators HTTP/1.1" 200 OK
INFO: 10.16.39.228:53596 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO | 2025-01-25 03:40:50 | autotrain.trainers.clm.train_clm_sft:train:39 - creating trainer
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 0 examples [00:00, ? examples/s]
ERROR | 2025-01-25 03:40:51 | autotrain.trainers.common:wrapper:215 - train has failed due to an exception: Traceback (most recent call last):
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1607, in _prepare_split_single
for key, record in generator:
File "/app/env/lib/python3.10/site-packages/datasets/packaged_modules/generator/generator.py", line 33, in _generate_examples
for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 492, in data_generator
yield from constant_length_iterator
File "/app/env/lib/python3.10/site-packages/trl/trainer/utils.py", line 648, in __iter__
buffer.append(self.formatting_func(next(iterator)))
File "/app/env/lib/python3.10/site-packages/trl/trainer/utils.py", line 623, in <lambda>
self.formatting_func = lambda x: x[dataset_text_field]
KeyError: '{ "input": "input", "output": "response" }'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 495, in _prepare_packed_dataloader
packed_dataset = Dataset.from_generator(
File "/app/env/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 1117, in from_generator
).read()
File "/app/env/lib/python3.10/site-packages/datasets/io/generator.py", line 49, in read
self.builder.download_and_prepare(
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 924, in download_and_prepare
self._download_and_prepare(
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1648, in _download_and_prepare
super()._download_and_prepare(
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1000, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1486, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1643, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 212, in wrapper
return func(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/__main__.py", line 28, in train
train_sft(config)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 46, in train
trainer = SFTTrainer(
File "/app/env/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func
return func(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 265, in __init__
train_dataset = self._prepare_dataset(
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 391, in _prepare_dataset
return self._prepare_packed_dataloader(
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 499, in _prepare_packed_dataloader
raise ValueError(
ValueError: Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
ERROR | 2025-01-25 03:40:51 | autotrain.trainers.common:wrapper:216 - Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
INFO | 2025-01-25 03:40:51 | autotrain.trainers.common:pause_space:156 - Pausing space...
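In case it is relevant: the KeyError near the end shows that the literal column-mapping string '{ "input": "input", "output": "response" }' ended up being used as the text column name, which then left the SFT packer with nothing to pack. Below is a sketch of how I could flatten the two columns into a single `text` field before training (this assumes the `input`/`response` features reported in the log; the prompt template and the new repo name are placeholders, not anything AutoTrain prescribes):

```python
from datasets import load_dataset

# Assumption: the source dataset exposes the 'input' and 'response' columns
# reported in the log ("features: ['input', 'response']").
ds = load_dataset("gokstad/13winters", split="train")

def to_text(example):
    # Placeholder template; the real formatting depends on the chat style I want.
    return {
        "text": f"### Instruction:\n{example['input']}\n\n### Response:\n{example['response']}"
    }

ds = ds.map(to_text, remove_columns=["input", "response"])
ds.push_to_hub("gokstad/13winters-sft")  # hypothetical new dataset repo
```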