@lhoestq, here is how I am loading the data, with split='train':
from datasets import load_dataset

dataFiles = {
    "train": "./ADPConll/ADPConll_train.json",
    "validation": "./ADPConll/ADPConll_valid.json",
    "test": "./ADPConll/ADPConll_test.json"
}
dataset = load_dataset('json', data_files=dataFiles, split='train')
Then I ran the following:
len(dataset), which is 2 (an int)
dataset[0], which is a dict with 5 keys:
{'id': '0',
'chunk_tags': ['B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'O'],
'ner_tags': ['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O'],
'pos_tags': ['NNP', 'VBZ', 'JJ', 'NN', 'TO', 'VB', 'JJ', 'NN', '.'],
'tokens': ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']}
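
To cross-check the len(dataset) value independently of the datasets library, the records in the training file can be counted directly. This is only a minimal sketch: count_records is a hypothetical helper (not part of datasets), and it assumes the file is either JSON Lines (one record per line) or a single JSON document (a top-level list of records, or one object).

```python
import json


def count_records(path):
    """Count top-level records in a JSON file.

    Hypothetical helper, not part of the datasets library.
    Assumes JSON Lines, a top-level JSON list, or a single JSON object.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    if not text:
        return 0
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not one JSON document: treat the file as JSON Lines.
        count = 0
        for line in text.splitlines():
            if line.strip():
                json.loads(line)  # raises if a line is not valid JSON
                count += 1
        return count
    # A top-level list holds one record per element; anything else is one record.
    return len(data) if isinstance(data, list) else 1
```

Calling count_records("./ADPConll/ADPConll_train.json") gives a number to compare with len(dataset). If the two differ, the file layout may not be one of the layouts assumed here.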