According to the `datasets` documentation:
> A few interesting features are provided out-of-the-box by the Apache Arrow backend:
>
> - **multi-threaded** or single-threaded reading
> - automatic decompression of input files (based on the filename extension, such as `my_data.csv.gz`)
> - fetching column names from the first row in the CSV file
> - column-wise type inference and conversion to one of `null`, `int64`, `float64`, `timestamp[s]`, `string` or `binary` data
> - detecting various spellings of null values such as `NaN` or `#N/A`
How can I read, for example, `train.csv.gz` in multi-threaded mode? I tried `datasets.load_dataset("csv", data_files="./train.csv.gz")`, but `htop` shows only one CPU core in use.