Chapter 5 questions

**TypeError: Couldn’t cast array of type timestamp[s] to null**

from datasets import load_dataset
issues_dataset = load_dataset("json", data_files="datasets-issues.jsonl", split="train")
issues_dataset
Downloading data files: 100% 1/1 [00:00<00:00, 37.80it/s]
Extracting data files: 100% 1/1 [00:00<00:00, 44.29it/s]
Generating train split: 2584/0 [00:01<00:00, 3104.08 examples/s]

TypeError: Couldn’t cast array of type timestamp[s] to null

The above exception was the direct cause of the following exception:

DatasetGenerationError: An error occurred while generating the dataset
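
One possible reading of this error (an assumption, not something the traceback confirms) is that some field contained only null values in the first rows that were read, so its type was inferred as null, and a later row then held a timestamp-like value in that same field. The sketch below uses only the standard library to list fields that are null in some records and non-null in others, which may help narrow down the culprit. The helper names `iter_paths` and `find_mixed_null_paths` and the sample records are made up for illustration.

```python
import json
from collections import defaultdict


def iter_paths(value, prefix=""):
    """Yield (dotted_path, value) pairs for a decoded JSON record.

    Nested objects are reported both as a whole (so that "field is null in
    one record, an object in another" is visible) and key by key.
    """
    if isinstance(value, dict):
        if prefix:
            yield prefix, value
        for key, child in value.items():
            yield from iter_paths(child, f"{prefix}.{key}" if prefix else key)
    else:
        # Lists and scalars are treated as leaves; None is a JSON null.
        yield prefix, value


def find_mixed_null_paths(lines):
    """Return {path: (null_count, first_non_null_line)} for every path that
    is null in at least one record and non-null in at least one other."""
    null_counts = defaultdict(int)
    first_non_null = {}
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        for path, value in iter_paths(json.loads(line)):
            if value is None:
                null_counts[path] += 1
            elif path not in first_non_null:
                first_non_null[path] = line_number
    return {
        path: (null_counts[path], first_non_null[path])
        for path in null_counts
        if path in first_non_null
    }


if __name__ == "__main__":
    # Hypothetical records, only to show the output format.
    sample = [
        '{"id": 1, "closed_at": null, "milestone": null}',
        '{"id": 2, "closed_at": "2021-01-01T00:00:00Z", "milestone": {"due_on": null}}',
        '{"id": 3, "closed_at": null, "milestone": {"due_on": "2021-02-01T00:00:00Z"}}',
    ]
    for path, (nulls, first) in sorted(find_mixed_null_paths(sample).items()):
        print(f"{path}: {nulls} null(s), first non-null value on line {first}")
```

To scan the real file, pass an open file handle instead of the sample list, for example `find_mixed_null_paths(open("datasets-issues.jsonl", encoding="utf-8"))`. A field whose first non-null value appears only far into the file would be a plausible candidate for the failing cast.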

I split datasets-issues.jsonl into two files and found that each file loads correctly on its own:

from datasets import load_dataset

issues_dataset_1 = load_dataset("json", data_files="datasets-issues-1.jsonl", split="train")

issues_dataset_1

Dataset({
    features: ['url', 'repository_url', 'labels_url', 'comments_url', 'events_url', 'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels', 'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments', 'created_at', 'updated_at', 'closed_at', 'author_association', 'active_lock_reason', 'draft', 'pull_request', 'body', 'reactions', 'timeline_url', 'performed_via_github_app', 'state_reason'],
    num_rows: 2884
})

from datasets import load_dataset

issues_dataset_2 = load_dataset("json", data_files="datasets-issues-2.jsonl", split="train")

issues_dataset_2

Dataset({
    features: ['url', 'repository_url', 'labels_url', 'comments_url', 'events_url', 'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels', 'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments', 'created_at', 'updated_at', 'closed_at', 'author_association', 'active_lock_reason', 'draft', 'pull_request', 'body', 'reactions', 'timeline_url', 'performed_via_github_app', 'state_reason'],
    num_rows: 3624
})

I tried to combine issues_dataset_1 and issues_dataset_2 into a single issues_dataset, but did not succeed.
I decided to use issues_dataset_2 as issues_dataset, since I had already wasted too much time on this trivial matter.