Below is the error traceback that was generated:
ValueError Traceback (most recent call last)
<ipython-input-38-29d47e6260b2> in <module>()
----> 1 trainer.train( )
/usr/local/lib/python3.7/dist-packages/transformers/models/gpt2/modeling_gpt2.py in forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions, output_hidden_states, return_dict)
674 batch_size = inputs_embeds.shape[0]
675 else:
--> 676 raise ValueError("You have to specify either input_ids or inputs_embeds")
677
678 device = input_ids.device if input_ids is not None else inputs_embeds.device
And here is my code:
import os

from transformers import GPT2Tokenizer, GPT2Model, Trainer, trainer_utils, TrainingArguments
from datasets import Dataset

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')

dataset = Dataset.from_text('/content/chatbot.txt')

trainArgs = TrainingArguments(
    output_dir=os.path.join(os.getcwd(), 'customGPT2'),
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    evaluation_strategy='steps',
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    eval_accumulation_steps=1,
    weight_decay=0,
    adam_epsilon=1e-08,
    max_grad_norm=1.0,
    num_train_epochs=3.0,
    max_steps=-1,
    lr_scheduler_type=trainer_utils.SchedulerType('linear'),
    logging_dir=os.path.join(os.getcwd(), 'log'),
    logging_steps=2000,
    logging_strategy='steps',
    save_steps=2000,
    save_strategy='steps',
    seed=66,
    fp16=False,
    fp16_opt_level='O1',
)

trainer = Trainer(model, args=trainArgs, train_dataset=dataset)
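For reference, this is how I inspect the loaded dataset. As far as I understand, Dataset.from_text reads one example per line and produces a single 'text' column; the expected output in the comments is my assumption, not verified output:

# Inspect the structure of the dataset loaded above. My understanding is
# that Dataset.from_text yields one row per line of chatbot.txt, with a
# single "text" column, so there are no input_ids at this point.
print(dataset.column_names)   # expected: ['text']
print(dataset[0])             # expected: {'text': '<first line of chatbot.txt>'}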
As I am quite new to the Trainer API, I tried to follow the docs as closely as possible and only changed a few things. The main change is the dataset: I need to use a local file, so I loaded it into a datasets.Dataset to match the format the docs require, and I also adjusted some of the training arguments. Thank you.
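In case it is relevant, this is roughly how I would tokenize the dataset if the raw text turns out to be the problem. It is only a sketch based on my reading of the docs; the column name 'text', the max_length of 128, and reusing eos_token as the pad token are my assumptions:

# Sketch: convert the raw "text" column into input_ids before training.
# GPT-2's tokenizer has no pad token by default, so I reuse eos_token here
# (my assumption, following what seems to be a common convention).
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    # Truncate/pad every line to a fixed length so examples can be batched.
    return tokenizer(batch['text'], truncation=True,
                     padding='max_length', max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=['text'])
trainer = Trainer(model, args=trainArgs, train_dataset=tokenized)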