Train end-to-end text classification on SageMaker

jackieliu930 · October 8, 2021, 7:47am

hi,

I’m following the guidance on training text classification with my own dataset;
see notebooks/sagemaker-notebook.ipynb at master · huggingface/notebooks · GitHub

I have two questions:

1. Does the label column of the dataset only support int values? In other words, do I need to preprocess my data and convert the categories to 1, 2, 3, …?
2. Do I need to specify the number of classes? If so, where?

thanks!

jackie

philschmid · October 8, 2021, 11:10am

Hey @jackieliu930,

1. Yes, the labels need to be int values.
2. Yes, you need to modify the .from_pretrained call (for example, by passing num_labels) here: notebooks/train.py at 3fdb8bd61ed2f2b499dcd55034b1ee58be5cfabb · huggingface/notebooks · GitHub
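
A minimal sketch of both points, not the exact code from the linked train.py. The category names and the helper functions `build_label_maps` and `load_model` are hypothetical. Note that the ids start at 0 rather than 1, since the classification loss expects class indices in the range 0 to num_labels - 1.

```python
def build_label_maps(categories):
    """Map each distinct category string to an int id, starting at 0."""
    names = sorted(set(categories))
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


# Hypothetical raw labels as they might appear in a "label" column.
raw_labels = ["sports", "politics", "tech", "sports"]

label2id, id2label = build_label_maps(raw_labels)
int_labels = [label2id[c] for c in raw_labels]  # values to store in the dataset
num_labels = len(label2id)                      # value to pass to from_pretrained


def load_model(model_name, label2id, id2label):
    """Load a sequence classification model with a head sized for our classes."""
    # Imported lazily so the label mapping above runs without transformers installed.
    from transformers import AutoModelForSequenceClassification

    return AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
    )
```

In the training script, `load_model("distilbert-base-uncased", label2id, id2label)` would then replace the plain `.from_pretrained(model_name)` call; the model name here is only an example.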

You could also use run_glue.py from the examples via git_config; then you don’t need to provide your own training script.
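
A rough sketch of what the git_config route could look like. The version numbers, model name, file names, channel names, and instance type are all assumptions and should be adjusted to your setup; `build_estimator` is a hypothetical helper.

```python
# Assumed release; the git branch should match the transformers_version below.
TRANSFORMERS_VERSION = "4.6.1"

# Tells the SageMaker SDK to pull the training script from the transformers repo.
git_config = {
    "repo": "https://github.com/huggingface/transformers.git",
    "branch": f"v{TRANSFORMERS_VERSION}",
}

# Passed to run_glue.py as command line arguments.
# SageMaker mounts each input channel under /opt/ml/input/data/<channel>.
hyperparameters = {
    "model_name_or_path": "distilbert-base-uncased",       # assumed model
    "train_file": "/opt/ml/input/data/train/train.csv",    # hypothetical file name
    "validation_file": "/opt/ml/input/data/test/test.csv",  # hypothetical file name
    "do_train": True,
    "do_eval": True,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 32,
    "output_dir": "/opt/ml/model",
}


def build_estimator(role, instance_type="ml.p3.2xlarge"):
    """Create a HuggingFace estimator that runs run_glue.py from the git repo."""
    # Imported lazily so the config above can be inspected without the SDK installed.
    from sagemaker.huggingface import HuggingFace

    return HuggingFace(
        entry_point="run_glue.py",
        source_dir="./examples/pytorch/text-classification",
        git_config=git_config,
        instance_type=instance_type,
        instance_count=1,
        role=role,
        transformers_version=TRANSFORMERS_VERSION,
        pytorch_version="1.7.1",  # assumed; must be a supported combination
        py_version="py36",        # assumed; must be a supported combination
        hyperparameters=hyperparameters,
    )
```

Training would then be started with something like `build_estimator(role).fit({"train": train_s3_uri, "test": test_s3_uri})`, where both S3 URIs are placeholders for your own data locations.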

jackieliu930 · October 11, 2021, 3:04am

Got it, thanks a lot! By the way, with the same parameter settings (epochs, batch size) and the same dataset, I get an OOM error when I use the Hugging Face SageMaker SDK, while it works fine with the original PyTorch SDK.
Any clue on this one?

philschmid · October 11, 2021, 6:34am

Nice!
Did you use the same model, same dataset, same epoch & batch_size for train and eval, same instance type?

jackieliu930 · October 11, 2021, 6:44am

yes. totally the same.

philschmid · October 11, 2021, 6:58am

Also the same PyTorch and Transformers versions?
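
One way to compare the two environments is to log the installed package versions at the start of each training script. This is only a sketch; `installed_versions` is a hypothetical helper and the package list is an assumption.

```python
from importlib import metadata


def installed_versions(packages=("torch", "transformers", "sagemaker")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package so the two training logs can be compared directly.
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```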
