Using datasets to open JSONL

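Ensure the JSONL file is correctly formatted:
Each line must hold one complete, valid JSON object on its own, with no commas between lines and no [ ] brackets wrapping the whole file. Keys and strings must use straight double quotes ("), not typographic quotes (“ ”). For example, the file should look like this: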

{"src":"hello","term":{"a":"aa"}}
{"src":"hi","term":{"b":"bb"}}
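To check a file against these rules before loading it, a minimal sketch using only the standard library is shown here. The helper name find_bad_jsonl_lines is hypothetical, and it conservatively flags blank lines too, since the simplest valid file has none.

```python
import json


def find_bad_jsonl_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) for each line that is not a standalone JSON object."""
    lines = text.split("\n")  # split on "\n" only, as line-oriented JSONL readers do
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is normal, not an empty record
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            # Flagged conservatively: simplest to keep the file free of blank lines
            problems.append((lineno, "blank line"))
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((lineno, "not a JSON object"))
    return problems


good = '{"src":"hello","term":{"a":"aa"}}\n{"src":"hi","term":{"b":"bb"}}\n'
wrapped = '[{"src":"hello"},\n{"src":"hi"}]\n'  # a JSON array split across lines
smart = '{“src”:“hi”,“term”:{“b”:“bb”}}\n'      # typographic quotes

print(find_bad_jsonl_lines(good))  # []
print(find_bad_jsonl_lines(wrapped))
print(find_bad_jsonl_lines(smart))
```

On a real file, pass it the contents read with pathlib.Path("./testdata.jsonl").read_text(encoding="utf-8"); an empty list means every line parsed as a JSON object.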

After fixing the JSONL format, use the following code to load the dataset properly:

from datasets import load_dataset

path = "./testdata.jsonl"
# The "json" builder reads JSON Lines files; split="train" returns a Dataset, not a DatasetDict
dataset = load_dataset("json", data_files=path, split="train")

print(dataset[1])  # second record in the file

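Note that the second entry will not come back exactly as written in the file. datasets stores each column in Apache Arrow, so the nested "term" objects are merged into a single struct type with fields a and b, and a key missing from a row is filled with None:

{'src': 'hi', 'term': {'a': None, 'b': 'bb'}}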
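Because datasets stores nested objects as Arrow structs whose fields are the union of keys seen across rows, a row that lacks a key can come back with that key set to None. If you need each record's original sparse dict, one option is to strip the None entries from rows after reading them. A minimal sketch; drop_none is a hypothetical helper, not part of datasets:

```python
def drop_none(value):
    """Recursively drop dict entries whose value is None (list items are kept, only recursed into)."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


# A row as datasets may return it after unifying the "term" struct
row = {"src": "hi", "term": {"a": None, "b": "bb"}}
print(drop_none(row))  # {'src': 'hi', 'term': {'b': 'bb'}}
```

Apply it to rows as you read them, e.g. drop_none(dataset[1]); writing the cleaned rows back into a Dataset (for instance via map) would likely re-create the unified struct. It also removes keys that were genuinely null in the file, so if that distinction matters, an alternative is to store "term" as a JSON-encoded string and json.loads it after loading.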

Also, keep each object on a single line: pretty-printed JSON that spreads one object across several lines is not valid JSONL, whatever the file size. Spaces inside a line are harmless.
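One-object-per-line is easiest to guarantee by generating the file with json.dumps rather than editing it by hand. A minimal standard-library sketch; write_jsonl is a hypothetical helper name, and the records mirror the example data above:

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write each record as one compact JSON object per line (JSON Lines)."""
    # newline="\n" keeps LF line endings on every platform
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for record in records:
            # Without indent=, json.dumps emits a single line; newlines inside
            # strings are escaped as \n rather than written as line breaks.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


records = [
    {"src": "hello", "term": {"a": "aa"}},
    {"src": "hi\nthere", "term": {"b": "bb"}},  # embedded newline stays escaped
]
write_jsonl(records, "testdata.jsonl")
print(Path("testdata.jsonl").read_text(encoding="utf-8"))
```

ensure_ascii=False keeps non-ASCII text readable in the file; the output is valid JSON either way.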

Response generated by Triskel Data Deterministic Ai
