I fine-tuned DistilGPT2 on the ArXiv papers dataset CShorten/ML-ArXiv-Papers (Hugging Face Datasets), but got output like:
Regression is a popular technique in deep neural networks (DNNs).
However, it is not well understood why DNNs are vulnerable. In this paper, we
propose a deep neural network (DNN) based on deep convolutional neural
networks (CNNs) based on deep convolutional neural networks (DNNs). We
propose a deep neural network (DNN) based on deep convolutional neural
networks (DNNs) based on deep convolutional neural networks (DNNs). We
propose a DNN based on deep convolutional neural networks (DNNs). We
propose a DNN based on deep convolutional neural networks (DNNs). We
propose a DNN based on deep convolutional neural networks (DNNs). We
propose a DNN based on deep convolutional neural networks (DNNs). We
propose a DNN based on deep convolutional neural networks (DNNs). We
propose a DNN based on deep convolutional neural networks (DNNs). We
propose a DNN based on deep convolutional neural networks (DNNs). We
pro
Here are my training arguments:
from transformers import TrainingArguments

batch_size = 32

args = TrainingArguments(
    output_dir="arxiv-papers-ds",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    evaluation_strategy="epoch",
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    weight_decay=0.1,
    warmup_steps=200,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    learning_rate=3e-5,
    save_steps=500,
    fp16=True,
    push_to_hub=False,
    report_to="none",
)
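
For context, these args go into the standard Trainer roughly like this (a minimal sketch; the tokenized dataset variables are placeholders standing in for what's in the notebook, not the exact code):

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# mlm=False gives the causal-LM objective (labels = inputs shifted by one)
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,  # placeholder: tokenized abstracts
    eval_dataset=tokenized_eval,    # placeholder: held-out split
    data_collator=data_collator,
)
trainer.train()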
Full code: generate-arxiv-papers-with-distilgpt-2.ipynb in the jrreda/AI-projects repo on GitHub.
Strictly speaking this looks like repetitive, degenerate text rather than hallucination. How can I stop the model from looping like this and get better results?
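
Is part of the fix on the decoding side rather than in training? I've read that greedy decoding often falls into exactly these loops, and that sampling plus repetition constraints help. Something like the sketch below is what I had in mind (the checkpoint path, prompt, and parameter values are my guesses, untested):

from transformers import AutoModelForCausalLM, AutoTokenizer

# assuming the fine-tuned weights were saved to the output_dir above
model = AutoModelForCausalLM.from_pretrained("arxiv-papers-ds")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

inputs = tokenizer("Regression is", return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=True,            # sample instead of greedy decoding
    top_p=0.92,                # nucleus sampling
    temperature=0.8,
    no_repeat_ngram_size=3,    # forbid verbatim 3-gram repeats
    repetition_penalty=1.2,    # down-weight recently used tokens
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Or is the repetition mostly a training problem (too few epochs, learning rate, data preprocessing) that decoding tweaks would only paper over?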