Fine-tuning the facebook/bart-large-mnli zero-shot classifier

Is there any tutorial or example on how to do this?

I have prepared the data according to the guidelines given here.

Here is my basic code for SageMaker:

from sagemaker.huggingface import HuggingFace
distribution = {'smdistributed':{'dataparallel':{ 'enabled': True }}}

model_name='facebook/bart-large-mnli'
# hyperparameters, which are passed into the training job
hyperparameters={#'epochs': 1,
                #'train_batch_size': 8,
                'do_train': True,
                'do_eval': True,
                # run_glue.py expects 'model_name_or_path', not 'model_name'
                'model_name_or_path': model_name,
                'task_name': 'mnli',
                #'output_data_dir': '/opt/ml/output/data/',
                'output_dir': '/opt/ml/model',
                #'ignore_mismatched_sizes': True,
                'overwrite_output_dir': True,
                # TrainingArguments are passed to the script as hyperparameters,
                # not as keyword arguments on the estimator
                'save_strategy': 'no',
                'save_total_limit': 1,
                # 'load_best_model_at_end': True,  # requires save_strategy != 'no'
                }

git_config = {'repo': 'https://github.com/huggingface/transformers.git', 'branch': 'v4.26.0'}


# creates the Hugging Face estimator
huggingface_estimator = HuggingFace(
    entry_point='run_glue.py',
    source_dir='./examples/pytorch/text-classification',
    instance_type='ml.p3dn.24xlarge',
    instance_count=1,
    role=role,
    git_config=git_config,
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    hyperparameters=hyperparameters,
    distribution=distribution,
)
huggingface_estimator.fit({'train': training_input_path, 'test': testing_input_path})

Training data shape is as follows:

{'label': 2, 'input_ids': [0, 44758, 3457, 13, 5, 1263, 829, 31, 5, 1263, 8401, 4001, 438, 34, 5, 511, 7390, 2, 2, 713, 1246, 16, 4287, 92, 3457, 4, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'input_sentence': 'Create tests for the response received from the response whihc has the following formatThis example is Add new tests.'}

The problem is that after running this, it takes 5 hours to upload the output to S3, and the resulting model.tar.gz is 158 GB.
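For scale, the bart-large-mnli weights alone are only about 1.6 GB, so a 158 GB archive suggests intermediate checkpoints are being bundled into /opt/ml/model along with the final model. A quick way to confirm is to list the archive's contents (a minimal sketch; the local path to a downloaded copy of the artifact is hypothetical):

```python
import tarfile

def member_sizes(archive_path):
    """List every file in a model.tar.gz with its size in bytes,
    so oversized checkpoint directories are easy to spot."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return {m.name: m.size for m in tar.getmembers() if m.isfile()}

# Hypothetical usage on a downloaded copy of the training artifact:
# for name, size in sorted(member_sizes("model.tar.gz").items()):
#     print(f"{size / 1e9:8.2f} GB  {name}")
```

If the listing shows many `checkpoint-NNN/` directories, the Trainer's default periodic checkpointing is the likely culprit.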

Sorry, the training data shape is as follows:

{'label': 2, 'input_ids': [0, 44337, 3457, 7, 33, 809, 26567, 9773, 26567, 2, 2, 713, 1246, 16, 39391, 2210, 3457, 4, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'input_sentence': '<s>Edit tests to have body validationbody validation</s></s>This example is Edit existing tests.</s>'}
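As a minimal sanity check, assuming BART's pair-encoding convention (`<s> premise </s></s> hypothesis </s>`, with BOS id 0 and EOS id 2), each record can be validated without loading the tokenizer:

```python
def validate_record(record):
    """Check one tokenized MNLI record against BART's expected layout."""
    ids, mask = record["input_ids"], record["attention_mask"]
    assert len(ids) == len(mask), "attention_mask must align with input_ids"
    assert ids[0] == 0 and ids[-1] == 2, "expected <s> ... </s> (BOS=0, EOS=2)"
    # BART encodes sentence pairs as <s> a </s></s> b </s>, so the separator
    # appears as two consecutive EOS (2, 2) tokens inside the sequence
    assert any(ids[i] == 2 and ids[i + 1] == 2 for i in range(len(ids) - 1)), \
        "missing </s></s> separator between premise and hypothesis"
    assert record["label"] in (0, 1, 2), "MNLI labels are 0/1/2"
    return True

# The record quoted above passes:
sample = {"label": 2,
          "input_ids": [0, 44337, 3457, 7, 33, 809, 26567, 9773, 26567,
                        2, 2, 713, 1246, 16, 39391, 2210, 3457, 4, 2],
          "attention_mask": [1] * 19}
validate_record(sample)
```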

@joeddav, could you please take a look? Thanks!