Ensure the JSONL file is correctly formatted:
Each line in the file must be one complete, valid JSON object, with no trailing comma after the object and no enclosing array brackets around the file. For example, the file should look like this:
{"src":"hello","term":{"a":"aa"}}
{"src":"hi","term":{"b":"bb"}}
After fixing the JSONL format, use the following code to load the dataset properly:
from datasets import load_dataset
path = "./testdata.jsonl"
dataset = load_dataset('json', data_files=path, split='train')
print(dataset[1])  # Index 1 is the second record; this should now work correctly
After these changes, the second entry should now print the correct data:
{'src': 'hi', 'term': {'b': 'bb'}}
Also check for stray blank lines and for objects split across several lines, which are easy to miss in a large file. Every non-empty line must parse as a complete JSON object on its own.
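For a large file, checking by eye is impractical, so a small script can do it. The following is a minimal sketch using only the standard library; the helper name find_bad_lines is hypothetical and not part of the datasets library. It reports blank lines, lines that fail to parse, and lines that parse but are not JSON objects.

```python
import json


def find_bad_lines(path):
    """Return (line_number, reason) pairs for lines that are not JSON objects.

    Hypothetical helper for illustration; it is not part of `datasets`.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            text = line.strip()
            if not text:
                # Blank lines are flagged so they can be removed before loading.
                problems.append((lineno, "blank line"))
                continue
            try:
                record = json.loads(text)
            except json.JSONDecodeError as exc:
                # Covers trailing commas, unbalanced brackets, typographic quotes, etc.
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                # Valid JSON, but an array or scalar rather than an object.
                problems.append((lineno, "not a JSON object"))
    return problems
```

Calling find_bad_lines("./testdata.jsonl") returns an empty list when every line is a standalone JSON object; otherwise each entry gives the 1-based line number to fix.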
Response generated by Triskel Data Deterministic Ai