Problem loading datasets library from Kaggle

After installing datasets and importing it in Kaggle (with TPU enabled; the environment without an accelerator works fine), I received this error:

ValueError                                Traceback (most recent call last)
<ipython-input-8-1127ec19bd45> in <module>
     15 from operator import itemgetter
     16 from typing import Iterator, List, Optional
---> 17 import datasets
     18 from datasets import load_dataset, load_metric

/opt/conda/lib/python3.7/site-packages/datasets/ in <module>
     31     )
---> 33 from .arrow_dataset import Dataset, concatenate_datasets
     34 from .arrow_reader import ArrowReader, ReadInstruction
     35 from .arrow_writer import ArrowWriter

/opt/conda/lib/python3.7/site-packages/datasets/ in <module>
     36 import pandas as pd
     37 import pyarrow as pa
---> 38 import pyarrow.compute as pc
     39 from multiprocess import Pool, RLock
     40 from import tqdm

/opt/conda/lib/python3.7/site-packages/pyarrow/ in <module>
     16 # under the License.
---> 18 from pyarrow._compute import (  # noqa
     19     Function,
     20     FunctionOptions,

/opt/conda/lib/python3.7/site-packages/pyarrow/_compute.pyx in init pyarrow._compute()

ValueError: pyarrow.lib.Codec size changed, may indicate binary incompatibility. Expected 48 from C header, got 40 from PyObject

It seems that installing datasets upgrades pyarrow from 4.0.0 to 5.0.0?
I tried uninstalling and reinstalling, but it didn't help. Any ideas? Thanks


Make sure to restart the kernel after installing datasets; then it should work.
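The error appears when the old pyarrow (4.0.0) is already loaded in the running kernel while the new one (5.0.0) sits on disk, so the compiled pieces don't match. A minimal sketch of how you might detect that state before importing datasets, assuming Python 3.8+ (`importlib.metadata`) or the `importlib-metadata` backport on 3.7; `restart_needed` is a hypothetical helper, not part of datasets or pyarrow:

```python
import sys

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: requires `pip install importlib-metadata`
    from importlib_metadata import PackageNotFoundError, version


def restart_needed(dist_name, module_name=None):
    """Check whether an already-imported module differs from the installed copy.

    Returns None if the distribution isn't installed, False if the module
    hasn't been imported yet or the versions match, True if they differ.
    """
    module_name = module_name or dist_name
    try:
        on_disk = version(dist_name)  # version recorded in the installed metadata
    except PackageNotFoundError:
        return None
    loaded = sys.modules.get(module_name)
    if loaded is None:
        return False  # nothing in memory yet; a fresh import loads the on-disk copy
    # If the module exposes no __version__, assume it is not stale.
    return getattr(loaded, "__version__", on_disk) != on_disk


if restart_needed("pyarrow"):
    print("pyarrow changed on disk after it was imported; restart the kernel.")
```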


Thanks, restarting the kernel works!

I ran into this issue too, and I’m wondering if it is possible to run the notebook using “Save Version.” Since the kernel can’t be restarted after hitting “Save Version”, I don’t think it is possible, right?

It's possible to fix the issue on Kaggle by installing datasets with --no-deps. You need to install xxhash and huggingface-hub first, though. This way pyarrow is not reinstalled.
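A minimal sketch of that two-step install from Python, so it also runs under "Save Version" without a restart. In a notebook cell the equivalent is `!pip install xxhash huggingface-hub` followed by `!pip install --no-deps datasets`. The helper names are hypothetical, and the dependency list is only the two packages named above; other datasets versions may need extra packages that the Kaggle image doesn't already provide.

```python
import subprocess
import sys


def pip_commands(extra_deps=("xxhash", "huggingface-hub")):
    """Build the pip commands: dependencies first, then datasets without its deps."""
    base = [sys.executable, "-m", "pip", "install"]  # use this kernel's interpreter
    return [
        base + list(extra_deps),            # the deps the image lacks
        base + ["--no-deps", "datasets"],   # skip dependency resolution, so pyarrow stays put
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)  # raises CalledProcessError if pip fails


run(pip_commands())  # pass dry_run=False to actually install
```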

I don't think this is an issue anymore, because it seems Kaggle includes datasets by default. I even made a notebook about it:
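Since some VMs ship datasets and older ones may not, one option is to install only when the import can't be found. A minimal sketch; `ensure_installed` is a hypothetical helper, and it uses `importlib.util.find_spec`, which locates a top-level module without importing it:

```python
import importlib.util
import subprocess
import sys


def ensure_installed(module_name, pip_name=None, dry_run=True):
    """Return the pip command needed for module_name, or None if it is importable.

    With dry_run=False the command is also executed.
    """
    if importlib.util.find_spec(module_name) is not None:
        return None  # already in the image; don't reinstall (and don't pull in a new pyarrow)
    cmd = [sys.executable, "-m", "pip", "install", pip_name or module_name]
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd


print(ensure_installed("datasets"))  # None on images that already include datasets
```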


Thanks. I guess my notebooks were running on an older VM.