A10G not using VRAM after generating training split in AutoTrain

Hi everyone,

I recently purchased a Hugging Face AutoTrain Space with an NVIDIA A10G (24 GB, of which about 22.49 GiB is reported as usable) for fine-tuning nothingiisreal/MN-12B-Celeste-V1.9 on josecannete/large_spanish_corpus.

When I start training, AutoTrain first generates the training split. During that step, VRAM usage is basically zero (around 2.88 MiB out of 22.49 GiB). After the split finishes, the process just stops: no training actually begins, and GPU usage never increases.

I expected VRAM usage to spike when training started, but it seems the job never reaches that stage.

Has anyone else experienced this with AutoTrain + A10G?
Could this be an issue with:

  • Dataset size or format?

  • The LoRA/PEFT + quantization setup I’m using?

  • Some AutoTrain pipeline bug for large models?

Any help would be appreciated. I just want to confirm if this is normal behavior for the split step, and why the actual training might not be starting.

Thanks in advance!

I want to train Celeste V1.9 to learn Spanish, then Spanish books with PleIAs/Spanish-PD-Books, and then Argentine Spanish with ylacombe/google-argentinian-spanish. But I'm not sure whether my current JSON for the Spanish corpus is OK, or how to configure the JSON for the next steps.

This is my JSON (I switched from bf16 to fp16 and disabled fp4 because I plan to run the result on an RTX 3060 as GGUF later, and set 4 dataloader workers assuming a Linux/WSL environment):

{
  "model": "nothingiisreal/MN-12B-Celeste-V1.9",
  "data": "josecannete/large_spanish_corpus",
  "task": "text-generation",

  "hub_model_id": "SlayerL99/mn12b-celeste-espanol-stage1",

  "training_parameters": {
    "learning_rate": 0.00005,
    "num_train_epochs": 1,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "warmup_steps": 100,
    "max_seq_length": 2048,
    "weight_decay": 0.01,
    "lr_scheduler_type": "cosine",
    "seed": 42,
    "fp16": true,                // switch bf16 to fp16 for better compatibility with 3060 downstream
    "bf16": false,
    "gradient_checkpointing": true,
    "dataloader_num_workers": 4,  // boost data loading speed (assuming Linux/WSL)
    "push_to_hub": true,
    "save_total_limit": 2,
    "logging_steps": 25,
    "save_steps": 200,
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "load_best_model_at_end": true,
    "metric_for_best_model": "eval_loss",
    "greater_is_better": false,
    "report_to": "tensorboard"
  },

  "peft_parameters": {
    "use_peft": true,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": "all-linear"
  },

  "quantization_parameters": {
    "use_int4": true,
    "use_int8": false,
    "use_fp4": false,           // disable fp4 for compatibility/stability on RTX 3060 GGUF
    "use_double_quant": true,
    "bnb_4bit_quant_type": "nf4"
  }
}

I just want to confirm if this is normal behavior for the split step, and why the actual training might not be starting.

Yeah. Maybe. I think AutoTrain is designed to stop as quickly as possible when any error occurs.

BTW, those JSON settings look like parameters for an older version of the Trainer. How about something like this (in YAML)?

task: llm
base_model: nothingiisreal/MN-12B-Celeste-V1.9
project_name: mn12b-celeste-espanol-stage1
log: tensorboard

data:
  path: josecannete/large_spanish_corpus
  train_split: train
  valid_split: null
  chat_template: null
  column_mapping:
    text_column: text

params:
  trainer: sft
  block_size: -1
  model_max_length: 4096
  epochs: 1
  batch_size: 1
  gradient_accumulation: 16
  lr: 5e-5
  warmup_ratio: 0.1
  optimizer: adamw_torch
  scheduler: linear
  weight_decay: 0.01
  logging_steps: 25
  eval_strategy: epoch
  save_total_limit: 2
  mixed_precision: fp16

  # QLoRA
  peft: true
  quantization: int4
  target_modules: all-linear
  lora_r: 16
  lora_alpha: 32
  lora_dropout: 0.10

  padding: right
  seed: 42

hub:
  username: SlayerL99
  push_to_hub: true
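
For the later stages, mostly just the data section, project_name, and base_model would change. Here is a rough sketch for the PleIAs/Spanish-PD-Books stage, assuming you continue from your stage-1 output and that the dataset exposes a single plain-text column (check the dataset viewer for the real column name; "text" below is only a placeholder):

task: llm
base_model: SlayerL99/mn12b-celeste-espanol-stage1  # stage-1 output; if it's an adapter-only repo you may need to merge it first, or keep the original base model
project_name: mn12b-celeste-espanol-stage2
log: tensorboard

data:
  path: PleIAs/Spanish-PD-Books
  train_split: train
  valid_split: null
  chat_template: null
  column_mapping:
    text_column: text  # placeholder; replace with the dataset's actual text column

# params and hub sections same as above

The same pattern should work for the Argentine Spanish stage, as long as the dataset you point it at has a plain text column.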

Ah, that may be it.

So I just transform that into JSON, paste it into the parameters, and try?

So I just transform that into JSON, paste it into the parameters, and try?

Maybe okay. I think they were treated the same internally anyway…
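
For reference, the params block of that YAML as flat JSON would look roughly like this. It's just a sketch: the base model, dataset, column mapping, and hub settings come from the rest of the form, so I'd keep only the training parameters in the JSON box, and I'm not 100% sure it accepts every one of these keys.

{
  "trainer": "sft",
  "block_size": -1,
  "model_max_length": 4096,
  "epochs": 1,
  "batch_size": 1,
  "gradient_accumulation": 16,
  "lr": 5e-5,
  "warmup_ratio": 0.1,
  "optimizer": "adamw_torch",
  "scheduler": "linear",
  "weight_decay": 0.01,
  "logging_steps": 25,
  "eval_strategy": "epoch",
  "save_total_limit": 2,
  "mixed_precision": "fp16",
  "peft": true,
  "quantization": "int4",
  "target_modules": "all-linear",
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "padding": "right",
  "seed": 42
}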

Heyo! I converted that YAML into JSON and tried it, but the same thing happens. AutoTrain gives "Error fetching trainer status" and just stops :confused:

"Error fetching trainer status"

I think this error is caused by something going wrong in the Accelerate library. Could it be that the Accelerate version on the newly created Space is outdated? :thinking: