I've tried to fine-tune multiple models using many different datasets, but once I click the Start Training button it turns red for a couple of seconds and then turns blue again. I've tried this with multiple models and different datasets, and nothing works. I've included the log file below.
Device 0: Tesla T4 - 7072MiB/15360MiB
You are not running the flash-attention implementation, expect numerical differences.
INFO | 2024-06-27 10:32:12 | autotrain.trainers.common:on_train_begin:231 - Starting to train…
Generating train split: 4262 examples [00:09, 442.90 examples/s]
Generating train split: 4000 examples [00:09, 494.18 examples/s]
Generating train split: 3723 examples [00:09, 407.22 examples/s]
Generating train split: 3262 examples [00:07, 629.96 examples/s]
Generating train split: 2843 examples [00:07, 475.21 examples/s]
Generating train split: 2245 examples [00:07, 281.14 examples/s]
Generating train split: 1415 examples [00:04, 315.25 examples/s]
Generating train split: 1000 examples [00:02, 654.30 examples/s]
Generating train split: 601 examples [00:02, 382.11 examples/s]
Generating train split: 1 examples [00:02, 2.10s/ examples]
Token indices sequence length is longer than the specified maximum sequence length for this model (2901 > 2048). Running this sequence through the model will result in indexing errors
warnings.warn(
/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:307: UserWarning: You passed a dataset_text_field argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
warnings.warn(
/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:269: UserWarning: You passed a max_seq_length argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
warnings.warn(
/app/env/lib/python3.10/site-packages/transformers/training_args.py:1965: FutureWarning: --push_to_hub_token is deprecated and will be removed in version 5 of Transformers. Use --hub_token instead.
warnings.warn(
/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:181: UserWarning: You passed a packing argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
warnings.warn(
/app/env/lib/python3.10/site-packages/transformers/training_args.py:1965: FutureWarning: --push_to_hub_token is deprecated and will be removed in version 5 of Transformers. Use --hub_token instead.
warnings.warn(message, FutureWarning)
Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length, packing. Will not be supported from version '1.0.0'.
INFO | 2024-06-27 10:32:01 | autotrain.trainers.clm.train_clm_sft:train:37 - creating trainer
INFO | 2024-06-27 10:32:01 | autotrain.trainers.clm.utils:get_model:666 - model dtype: torch.float16
Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 6.23s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 5.95s/it]
low_cpu_mem_usage was None, now set to True since model is quantized.
INFO | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:get_model:635 - loading model…
INFO | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:get_model:627 - loading model config…
WARNING | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:get_model:625 - Unsloth not available, continuing without it…
INFO | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:get_model:583 - Can use unsloth: False
warnings.warn(
/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
INFO | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:configure_block_size:548 - Using block size 1024
INFO | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:configure_training_args:485 - configuring training args
INFO | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:configure_logging_steps:480 - Logging steps: 25
INFO | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:configure_logging_steps:467 - configuring logging steps
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:process_input_data:395 - Valid data: None
})
num_rows: 9846
features: ['text'],
INFO | 2024-06-27 10:31:48 | autotrain.trainers.clm.utils:process_input_data:394 - Train data: Dataset({
Repo card metadata block was not found. Setting CardData to empty.
INFO | 2024-06-27 10:31:47 | autotrain.trainers.clm.train_clm_sft:train:12 - Starting SFT training…
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[2024-06-27 10:31:47,544] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
--dynamo_backend was set to a value of 'no'
The following values were not passed to accelerate launch and had defaults used instead:
INFO | 2024-06-27 10:31:40 | autotrain.backends.local:create:13 - Training PID: 620
INFO | 2024-06-27 10:31:40 | autotrain.commands:launch_command:401 - {'model': 'microsoft/Phi-3-mini-4k-instruct', 'project_name': 'autotrain-uh8dc-qv9tt', 'data_path': 'timdettmers/openassistant-guanaco', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 1024, 'model_max_length': 2048, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 3e-05, 'epochs': 3, 'batch_size': 2, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'text', 'rejected_text_column': 'rejected_text', 'push_to_hub': True, 'username': 'huggingfacepremium', 'token': '*****', 'unsloth': False}
INFO | 2024-06-27 10:31:40 | autotrain.commands:launch_command:400 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-uh8dc-qv9tt/training_params.json']
INFO | 2024-06-27 10:31:40 | autotrain.backends.local:create:8 - Starting local training…
INFO | 2024-06-27 10:31:40 | autotrain.app.ui_routes:handle_form:491 - hardware: local-ui
INFO | 2024-06-27 10:31:22 | autotrain.app.utils:kill_process_by_pid:52 - Sent SIGTERM to process with PID 439
INFO | 2024-06-27 10:31:22 | autotrain.app.utils:get_running_jobs:26 - Killing PID: 439
ERROR | 2024-06-27 10:31:20 | autotrain.trainers.common:wrapper:121 - Blockwise quantization only supports 16/32-bit floats, but got torch.uint8
ValueError: Blockwise quantization only supports 16/32-bit floats, but got torch.uint8
raise ValueError(f"Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}")
File "/app/env/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1234, in quantize_4bit
w_4bit, quant_state = bnb.functional.quantize_4bit(
File "/app/env/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 289, in _quantize
return self._quantize(device)
File "/app/env/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 324, in to
new_value = bnb.nn.Params4bit(new_value, requires_grad=False, **kwargs).to(target_device)
File "/app/env/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 216, in create_quantized_param
hf_quantizer.create_quantized_param(model, param, param_name, param_device, state_dict, unexpected_keys)
File "/app/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 889, in _load_state_dict_into_meta_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/app/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4214, in _load_pretrained_model
) = cls._load_pretrained_model(
File "/app/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3754, in from_pretrained
return model_class.from_pretrained(
File "/app/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
model = AutoModelForCausalLM.from_pretrained(
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/utils.py", line 649, in get_model
model = utils.get_model(config, tokenizer)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 25, in train
train_sft(config)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 28, in train
return func(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
ERROR | 2024-06-27 10:31:20 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
low_cpu_mem_usage was None, now set to True since model is quantized.
INFO | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:get_model:635 - loading model…
INFO | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:get_model:627 - loading model config…
WARNING | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:get_model:625 - Unsloth not available, continuing without it…
INFO | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:get_model:583 - Can use unsloth: False
warnings.warn(
/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
INFO | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:configure_block_size:548 - Using block size 1024
INFO | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:configure_training_args:485 - configuring training args
INFO | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:configure_logging_steps:480 - Logging steps: 25
INFO | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:configure_logging_steps:467 - configuring logging steps
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:process_input_data:395 - Valid data: None
})
num_rows: 9846
features: ['text'],
INFO | 2024-06-27 10:31:13 | autotrain.trainers.clm.utils:process_input_data:394 - Train data: Dataset({
Generating test split: 100%|██████████| 518/518 [00:00<00:00, 78384.06 examples/s]
Generating test split: 0%| | 0/518 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 9846/9846 [00:00<00:00, 75561.98 examples/s]
Generating train split: 100%|██████████| 9846/9846 [00:00<00:00, 76628.26 examples/s]
Generating train split: 0%| | 0/9846 [00:00<?, ? examples/s]
Downloading data: 100%|██████████| 1.11M/1.11M [00:00<00:00, 26.2MB/s]
Downloading data: 0%| | 0.00/1.11M [00:00<?, ?B/s]
Downloading data: 100%|██████████| 20.9M/20.9M [00:01<00:00, 19.9MB/s]
Downloading data: 100%|██████████| 20.9M/20.9M [00:01<00:00, 18.9MB/s]
Downloading data: 0%| | 0.00/20.9M [00:00<?, ?B/s]
Repo card metadata block was not found. Setting CardData to empty.
Downloading readme: 100%|██████████| 395/395 [00:00<00:00, 3.05MB/s]
Downloading readme: 0%| | 0.00/395 [00:00<?, ?B/s]
INFO | 2024-06-27 10:31:10 | autotrain.trainers.clm.train_clm_sft:train:12 - Starting SFT training…
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[2024-06-27 10:31:10,548] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
--dynamo_backend was set to a value of 'no'
The following values were not passed to accelerate launch and had defaults used instead:
INFO | 2024-06-27 10:31:03 | autotrain.backends.local:create:13 - Training PID: 439
INFO | 2024-06-27 10:31:03 | autotrain.commands:launch_command:401 - {'model': 'VatsalPatel18/phi3-mini-WeatherBot', 'project_name': 'autotrain-uh8dc-qv9mm', 'data_path': 'timdettmers/openassistant-guanaco', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 1024, 'model_max_length': 2048, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 3e-05, 'epochs': 3, 'batch_size': 2, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'text', 'rejected_text_column': 'rejected_text', 'push_to_hub': True, 'username': 'huggingfacepremium', 'token': '*****', 'unsloth': False}
INFO | 2024-06-27 10:31:03 | autotrain.commands:launch_command:400 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-uh8dc-qv9mm/training_params.json']
INFO | 2024-06-27 10:31:03 | autotrain.backends.local:create:8 - Starting local training…
INFO | 2024-06-27 10:31:03 | autotrain.app.ui_routes:handle_form:491 - hardware: local-ui
INFO | 2024-06-27 10:30:54 | autotrain.app.ui_routes:handle_form:491 - hardware: local-ui
INFO | 2024-06-27 10:26:12 | autotrain.app.utils:kill_process_by_pid:52 - Sent SIGTERM to process with PID 69
INFO | 2024-06-27 10:26:12 | autotrain.app.utils:get_running_jobs:26 - Killing PID: 69
ERROR | 2024-06-27 10:26:08 | autotrain.trainers.common:wrapper:121 - Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
ValueError: Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
raise ValueError(
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 627, in _prepare_packed_dataloader
return self._prepare_packed_dataloader(
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 519, in _prepare_dataset
train_dataset = self._prepare_dataset(
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 362, in __init__
return f(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
trainer = SFTTrainer(
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 44, in train
train_sft(config)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 28, in train
return func(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
Traceback (most recent call last):
The above exception was the direct cause of the following exception:
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
raise DatasetGenerationError("An error occurred while generating the dataset") from e
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1784, in _prepare_split_single
for job_id, done, content in self._prepare_split_single(
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1627, in _prepare_split
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1122, in _download_and_prepare
super()._download_and_prepare(
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1789, in _download_and_prepare
self._download_and_prepare(
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1027, in download_and_prepare
self.builder.download_and_prepare(
File "/app/env/lib/python3.10/site-packages/datasets/io/generator.py", line 47, in read
).read()
File "/app/env/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 1125, in from_generator
packed_dataset = Dataset.from_generator(
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 623, in _prepare_packed_dataloader
Traceback (most recent call last):
The above exception was the direct cause of the following exception:
KeyError: 'text'
self.formatting_func = lambda x: x[dataset_text_field]
File "/app/env/lib/python3.10/site-packages/trl/trainer/utils.py", line 480, in <lambda>
buffer.append(self.formatting_func(next(iterator)))
File "/app/env/lib/python3.10/site-packages/trl/trainer/utils.py", line 503, in __iter__
yield from constant_length_iterator
File "/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 620, in data_generator
for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
File "/app/env/lib/python3.10/site-packages/datasets/packaged_modules/generator/generator.py", line 30, in _generate_examples
for key, record in generator:
File "/app/env/lib/python3.10/site-packages/datasets/builder.py", line 1748, in _prepare_split_single
ERROR | 2024-06-27 10:26:08 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 0 examples [00:00, ? examples/s]
warnings.warn(
/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:307: UserWarning: You passed a dataset_text_field argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
warnings.warn(
/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:269: UserWarning: You passed a max_seq_length argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
warnings.warn(
/app/env/lib/python3.10/site-packages/transformers/training_args.py:1965: FutureWarning: --push_to_hub_token is deprecated and will be removed in version 5 of Transformers. Use --hub_token instead.
warnings.warn(
/app/env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:181: UserWarning: You passed a packing argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
warnings.warn(
/app/env/lib/python3.10/site-packages/transformers/training_args.py:1965: FutureWarning: --push_to_hub_token is deprecated and will be removed in version 5 of Transformers. Use --hub_token instead.
warnings.warn(message, FutureWarning)
Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length, packing. Will not be supported from version '1.0.0'.
INFO | 2024-06-27 10:26:07 | autotrain.trainers.clm.train_clm_sft:train:37 - creating trainer
INFO | 2024-06-27 10:26:07 | autotrain.trainers.clm.utils:get_model:666 - model dtype: torch.float16
Loading checkpoint shards: 100%|██████████| 2/2 [00:13<00:00, 6.74s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:13<00:00, 6.25s/it]
Downloading shards: 100%|██████████| 2/2 [00:36<00:00, 18.12s/it]
Downloading shards: 100%|██████████| 2/2 [00:36<00:00, 17.92s/it]
low_cpu_mem_usage was None, now set to True since model is quantized.
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
- modeling_phi3.py
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
INFO | 2024-06-27 10:25:17 | autotrain.trainers.clm.utils:get_model:635 - loading model…
INFO | 2024-06-27 10:25:17 | autotrain.trainers.clm.utils:get_model:627 - loading model config…
WARNING | 2024-06-27 10:25:17 | autotrain.trainers.clm.utils:get_model:625 - Unsloth not available, continuing without it…
INFO | 2024-06-27 10:25:17 | autotrain.trainers.clm.utils:get_model:583 - Can use unsloth: False
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
- configuration_phi3.py
A new version of the following files was downloaded from huggingface.co/microsoft/Phi-3-mini-4k-instruct:
warnings.warn(
/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
INFO | 2024-06-27 10:25:16 | autotrain.trainers.clm.utils:configure_block_size:548 - Using block size 1024
INFO | 2024-06-27 10:25:16 | autotrain.trainers.clm.utils:configure_training_args:485 - configuring training args
INFO | 2024-06-27 10:25:16 | autotrain.trainers.clm.utils:configure_logging_steps:480 - Logging steps: 25
INFO | 2024-06-27 10:25:16 | autotrain.trainers.clm.utils:configure_logging_steps:467 - configuring logging steps
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO | 2024-06-27 10:25:16 | autotrain.trainers.clm.utils:process_input_data:395 - Valid data: None
})
num_rows: 16636
features: ['question', 'answer'],
INFO | 2024-06-27 10:25:16 | autotrain.trainers.clm.utils:process_input_data:394 - Train data: Dataset({
Generating train split: 100%|██████████| 16636/16636 [00:00<00:00, 112909.06 examples/s]
Generating train split: 100%|██████████| 16636/16636 [00:00<00:00, 115699.95 examples/s]
Generating train split: 0%| | 0/16636 [00:00<?, ? examples/s]
Downloading data: 100%|██████████| 9.00M/9.00M [00:00<00:00, 19.1MB/s]
Downloading data: 100%|██████████| 9.00M/9.00M [00:00<00:00, 19.2MB/s]
Downloading data: 0%| | 0.00/9.00M [00:00<?, ?B/s]
Downloading readme: 100%|██████████| 145/145 [00:00<00:00, 1.12MB/s]
Downloading readme: 0%| | 0.00/145 [00:00<?, ?B/s]
INFO | 2024-06-27 10:25:15 | autotrain.trainers.clm.train_clm_sft:train:12 - Starting SFT training…
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[2024-06-27 10:25:14,784] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
--dynamo_backend was set to a value of 'no'
The following values were not passed to accelerate launch and had defaults used instead:
INFO | 2024-06-27 10:25:07 | autotrain.backends.local:create:13 - Training PID: 69
INFO | 2024-06-27 10:25:07 | autotrain.commands:launch_command:401 - {'model': 'microsoft/Phi-3-mini-4k-instruct', 'project_name': 'autotrain-uh8dc-qv9ua', 'data_path': 'huggingfacepremium/train', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 1024, 'model_max_length': 2048, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 3e-05, 'epochs': 3, 'batch_size': 2, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'text', 'rejected_text_column': 'rejected_text', 'push_to_hub': True, 'username': 'huggingfacepremium', 'token': '*****', 'unsloth': False}
INFO | 2024-06-27 10:25:07 | autotrain.commands:launch_command:400 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-uh8dc-qv9ua/training_params.json']
INFO | 2024-06-27 10:25:07 | autotrain.backends.local:create:8 - Starting local training…
INFO | 2024-06-27 10:25:07 | autotrain.app.ui_routes:handle_form:491 - hardware: local-ui