NER on SageMaker Run run_ner.py

Hello @philschmid, I hope you are doing well. Question for you: do you have an example of the expected data format needed to use this script (run_ner.py) in SageMaker training?

Thanks,

Jorge

Hey,

you can always find the expected data format for any of the examples/ scripts inside the script itself. For run_ner.py it is here: transformers/run_ner.py at b518aaf193938247f698a7c4522afe42b025225a · huggingface/transformers · GitHub

    if data_args.text_column_name is not None:
        text_column_name = data_args.text_column_name
    elif "tokens" in column_names:
        text_column_name = "tokens"
    else:
        text_column_name = column_names[0]

    if data_args.label_column_name is not None:
        label_column_name = data_args.label_column_name
    elif f"{data_args.task_name}_tags" in column_names:
        label_column_name = f"{data_args.task_name}_tags"
    else:
        label_column_name = column_names[1]

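In detail, you can define text_column_name and label_column_name as hyperparameters to specify which column/key holds your text/tokens and which holds your labels. If you do not define them, the script first looks for a column named tokens for the text and a column named {task_name}_tags for the labels. If those are not present either, it falls back to the column at index 0 for the text/tokens and index 1 for the labels.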
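To make the fallback order easier to follow, here is a minimal standalone sketch of the same logic. The function name resolve_columns and its parameters are hypothetical and not part of run_ner.py, and the default task_name of "ner" is an assumption you should check against the script version you use.

```python
from typing import List, Optional, Tuple


def resolve_columns(
    column_names: List[str],
    text_column_name: Optional[str] = None,
    label_column_name: Optional[str] = None,
    task_name: str = "ner",  # assumed default, verify in your script version
) -> Tuple[str, str]:
    """Hypothetical helper mirroring the column selection shown in the snippet above."""
    # 1) An explicit hyperparameter wins, 2) then the conventional name,
    # 3) otherwise fall back to a positional column.
    if text_column_name is not None:
        text_col = text_column_name
    elif "tokens" in column_names:
        text_col = "tokens"
    else:
        text_col = column_names[0]

    if label_column_name is not None:
        label_col = label_column_name
    elif f"{task_name}_tags" in column_names:
        label_col = f"{task_name}_tags"
    else:
        label_col = column_names[1]

    return text_col, label_col


# Conventional names are picked up without any hyperparameters.
print(resolve_columns(["id", "tokens", "ner_tags"]))
# Custom names fall back to positions 0 and 1.
print(resolve_columns(["words", "labels"]))
# Explicit hyperparameters override everything else.
print(resolve_columns(["id", "words", "labels"], "words", "labels"))
```

Note that with a leading id column and non-standard names, the positional fallback would pick id as the text column, so in that case it is safer to pass both names explicitly.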

You can provide your dataset in data file formats that are compatible with the datasets library, e.g. csv or json. More on this here: Loading a Dataset — datasets 1.11.0 documentation
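As a rough illustration, a JSON lines training file could look like the one produced below: one record per line, with a list of tokens and a list of labels of the same length. The sentences, the label strings and the file name train.json are made up for this sketch. The column names tokens and ner_tags are chosen to match the script's fallbacks, assuming task_name is "ner". Whether your script version expects string labels or integer ids is worth checking in the script itself.

```python
import json
import os
import tempfile

# Hypothetical toy data: one record per sentence, one label per token.
examples = [
    {"tokens": ["Jorge", "lives", "in", "Madrid"],
     "ner_tags": ["B-PER", "O", "O", "B-LOC"]},
    {"tokens": ["Amazon", "runs", "SageMaker"],
     "ner_tags": ["B-ORG", "O", "B-MISC"]},
]


def write_json_lines(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_json_lines(path):
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


train_path = os.path.join(tempfile.mkdtemp(), "train.json")
write_json_lines(examples, train_path)

# Sanity check before uploading: every token needs exactly one label.
loaded = read_json_lines(train_path)
for record in loaded:
    assert len(record["tokens"]) == len(record["ner_tags"])
```

A file like this should also be readable with the datasets library via load_dataset("json", data_files={"train": train_path}), which is a quick way to inspect the column names before starting a training job.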

Thank you again Phillip.
