Language-modeling script "killed" when fine-tuning gpt2-medium

Hello! I’m just getting started with the Hugging Face libraries for text generation.

I cloned the transformers repo from GitHub and was able to successfully fine-tune the GPT-2 small model on my MacBook with this command:

python3 examples/language-modeling/run_language_modeling.py \
    --output_dir=/my/data/gpt2-small-finetune \
    --model_type=gpt2 \
    --model_name_or_path=gpt2 \
    --do_train \
    --train_data_file=/path/to/my_corpus_text.txt

For now, I’m using a very small corpus, around 150 KB of text in total. The training process took about 30 minutes, which I assume means it was running on the CPU rather than a GPU (a quick way to double-check that is sketched right after the next command). Once it finished, I was able to successfully use the fine-tuned model to generate text, like this:

python3 examples/text-generation/run_generation.py \
    --model_type=gpt2 \
    --model_name_or_path=/my/data/gpt2-small-finetune \
    --length=100 \
    --prompt="Once upon a time, there was a "

Next, I tried running the exact same process, with the same training corpus, but with a new output directory and with model_name_or_path set to gpt2-medium:

python3 examples/language-modeling/run_language_modeling.py \
    --output_dir=/my/data/gpt2-medium-finetune \
    --model_type=gpt2 \
    --model_name_or_path=gpt2-medium \
    --do_train \
    --train_data_file=/path/to/my_corpus_text.txt

The process starts up and emits some logs, which look normal…

09/13/2020 17:31:51 - WARNING - __main__ -   Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
09/13/2020 17:31:51 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='/my/data/gpt2-small-finetune', overwrite_output_dir=False, do_train=True, do_eval=False, do_predict=False, evaluate_during_training=False, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, warmup_steps=0, logging_dir='runs/Sep13_17-31-51_APTI5214-MBP', logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=1000, past_index=-1, run_name=None, disable_tqdm=False, remove_unused_columns=True)
/usr/local/lib/python3.8/site-packages/transformers/modeling_auto.py:777: FutureWarning: The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.
  warnings.warn(
/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:1319: FutureWarning: The `max_len` attribute has been deprecated and will be removed in a future version, use `model_max_length` instead.
  warnings.warn(
09/13/2020 17:32:03 - INFO - filelock -   Lock 5925535648 acquired on /path/to/cached_lm_GPT2Tokenizer_1024_my_corpus_text.txt.lock
09/13/2020 17:32:03 - INFO - filelock -   Lock 5925535648 released on /path/to/cached_lm_GPT2Tokenizer_1024_my_corpus_text.txt.lock
/usr/local/lib/python3.8/site-packages/transformers/trainer.py:249: FutureWarning: Passing `prediction_loss_only` as a keyword argument is deprecated and won't be possible in a future version. Use `args.prediction_loss_only` instead.

Then, after about two minutes, the process exits with this message:

Killed: 9

So now I’m stuck… Does anybody know what might be wrong? This is a 2019 MacBook Pro with 16 GB of RAM and plenty of free space on the SSD. I’m an experienced Java developer, but I’m a Python novice, so I might be missing something critical about the environment.

@lysandre might know this

Killed: 9 is very probably an out-of-memory error. That wouldn’t be surprising, since you’re loading a much bigger model into memory. Could you monitor your RAM usage while the script is running?
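For a rough sense of scale: gpt2-medium has roughly 350M parameters, so the fp32 weights alone are about 1.4 GB, and Adam’s optimizer state plus gradients bring that to roughly 5–6 GB before any activations; with the default batch size of 8 and 1024-token blocks, that can easily exhaust what’s free on a 16 GB machine. Activity Monitor or top is enough to watch this, but a minimal sketch of a watcher (assuming psutil is installed, and taking the PID of the training process as an argument) could look like this:

# Minimal RAM watcher (assumes `pip install psutil`; pass the PID of the
# training process, e.g. from `ps aux | grep run_language_modeling`)
import sys
import time

import psutil

proc = psutil.Process(int(sys.argv[1]))
while proc.is_running():
    rss_gb = proc.memory_info().rss / 1024 ** 3
    avail_gb = psutil.virtual_memory().available / 1024 ** 3
    print(f"training RSS: {rss_gb:.1f} GB / system available: {avail_gb:.1f} GB")
    time.sleep(5)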

Late reply, but I just ran into this problem and found a solution.
You can try reducing the batch size, which should lower the memory usage. Keep in mind that training will take longer. Here is the relevant part of my setup (a Seq2Seq fine-tuning script, but the same idea applies):

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

batch_size = 12  # reduce this number to lower memory usage
model_name = model_checkpoint.split("/")[-1]
args = Seq2SeqTrainingArguments(
    f"{model_name}-finetuned-xsum",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=2,
    predict_with_generate=True,
)

# model, tokenized_datasets, data_collator, tokenizer and compute_metrics
# are defined earlier in the script
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
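For the run_language_modeling.py command from the original question, the same idea is just a matter of command-line flags: per_device_train_batch_size and gradient_accumulation_steps are standard TrainingArguments (both visible in the logged arguments above), so something along these lines should cut the peak memory while keeping the effective batch size at 8:

python3 examples/language-modeling/run_language_modeling.py \
    --output_dir=/my/data/gpt2-medium-finetune \
    --model_type=gpt2 \
    --model_name_or_path=gpt2-medium \
    --do_train \
    --train_data_file=/path/to/my_corpus_text.txt \
    --per_device_train_batch_size=1 \
    --gradient_accumulation_steps=8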