A10G not using VRAM after generating training split in AutoTrain

I just want to confirm whether this is normal behavior for the split step, and to understand why the actual training might not be starting.

Yeah, maybe. I think AutoTrain is designed to stop as quickly as possible when any error occurs, so if training never kicks off, it's worth checking the logs for an early failure.
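
If you have shell access to the instance, a quick sanity check is to watch GPU memory while the job runs: generating the train split is a CPU-side step from the datasets library, so the A10G sitting near zero VRAM there is expected, and memory should only climb once the quantized model is actually loaded for SFT. A minimal sketch, assuming nvidia-smi is available on the machine:

# Poll GPU memory/utilization once per second; near-zero usage during
# "generating train split" is normal, it should jump when training starts.
watch -n 1 nvidia-smi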

BTW, those JSON settings may be parameters for an older version of the Trainer. How about something like this (in YAML)?

task: llm
base_model: nothingiisreal/MN-12B-Celeste-V1.9
project_name: mn12b-celeste-espanol-stage1
log: tensorboard

data:
  path: josecannete/large_spanish_corpus
  train_split: train
  valid_split: null
  chat_template: null
  column_mapping:
    text_column: text

params:
  trainer: sft
  block_size: -1
  model_max_length: 4096
  epochs: 1
  batch_size: 1
  gradient_accumulation: 16
  lr: 5e-5
  warmup_ratio: 0.1
  optimizer: adamw_torch
  scheduler: linear
  weight_decay: 0.01
  logging_steps: 25
  eval_strategy: epoch
  save_total_limit: 2
  mixed_precision: fp16

  # QLoRA
  peft: true
  quantization: int4
  target_modules: all-linear
  lora_r: 16
  lora_alpha: 32
  lora_dropout: 0.10

  padding: right
  seed: 42

hub:
  username: SlayerL99
  push_to_hub: true
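
If you save that as, say, config.yml (the filename here is just an example), you should be able to launch it with the AutoTrain CLI:

# Run AutoTrain with the YAML config above
autotrain --config config.yml

Also, since push_to_hub is true, make sure a write token is available to the process (for example via the HF_TOKEN environment variable, or a token entry under hub as in the official example configs), otherwise the run can fail at the final push.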