Problem loading the datasets library on Kaggle

After installing datasets and importing it on Kaggle (with TPU support; the environment with no accelerator works fine), I received this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-1127ec19bd45> in <module>
     15 from operator import itemgetter
     16 from typing import Iterator, List, Optional
---> 17 import datasets
     18 from datasets import load_dataset, load_metric
     19 

/opt/conda/lib/python3.7/site-packages/datasets/__init__.py in <module>
     31     )
     32 
---> 33 from .arrow_dataset import Dataset, concatenate_datasets
     34 from .arrow_reader import ArrowReader, ReadInstruction
     35 from .arrow_writer import ArrowWriter

/opt/conda/lib/python3.7/site-packages/datasets/arrow_dataset.py in <module>
     36 import pandas as pd
     37 import pyarrow as pa
---> 38 import pyarrow.compute as pc
     39 from multiprocess import Pool, RLock
     40 from tqdm.auto import tqdm

/opt/conda/lib/python3.7/site-packages/pyarrow/compute.py in <module>
     16 # under the License.
     17 
---> 18 from pyarrow._compute import (  # noqa
     19     Function,
     20     FunctionOptions,

/opt/conda/lib/python3.7/site-packages/pyarrow/_compute.pyx in init pyarrow._compute()

ValueError: pyarrow.lib.Codec size changed, may indicate binary incompatibility. Expected 48 from C header, got 40 from PyObject

Is it because installing datasets upgrades pyarrow from 4.0.0 to 5.0.0?
I tried uninstalling and reinstalling, but it didn’t help. Any ideas? Thanks

Hi,

Make sure to restart the kernel after installing datasets; then it should work.
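The error most likely means the old pyarrow (4.0.0) was already imported in the running kernel when pip replaced it on disk, so the newly installed compiled modules get loaded against the stale in-memory pyarrow.lib; restarting the kernel discards the stale module. A minimal sketch to check for that situation, assuming Python 3.8+ for importlib.metadata (the traceback shows Python 3.7, which needs the importlib_metadata backport instead) and that the import name matches the pip distribution name, as it does for pyarrow and pandas. The helper stale_modules is hypothetical, not part of datasets or pyarrow.

```python
import sys

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: requires `pip install importlib_metadata`
    from importlib_metadata import PackageNotFoundError, version


def stale_modules(names):
    """Return {name: (loaded_version, installed_version)} for each module in
    `names` that is already imported but whose in-memory __version__ differs
    from the version currently installed on disk (hypothetical helper)."""
    stale = {}
    for name in names:
        module = sys.modules.get(name)
        if module is None:
            continue  # not imported yet: the next import loads the files on disk
        loaded = getattr(module, "__version__", None)
        try:
            installed = version(name)  # assumes import name == distribution name
        except PackageNotFoundError:
            continue  # no installed distribution with this name
        if loaded is not None and loaded != installed:
            stale[name] = (loaded, installed)
    return stale


if __name__ == "__main__":
    mismatches = stale_modules(["pyarrow", "pandas"])
    if mismatches:
        print("Restart the kernel; stale modules in memory:", mismatches)
    else:
        print("No version mismatch detected among imported modules.")
```

If pyarrow shows up as stale right after pip install, importing datasets in the same kernel is likely to hit the binary-incompatibility error above.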


Thanks, restarting the kernel works!

I ran into this issue too, and I’m wondering if it is possible to run the notebook using “Save Version.” Since the kernel can’t be restarted after hitting “Save Version”, I don’t think it is possible, right?


It’s possible to fix the issue on Kaggle by installing datasets with --no-deps. You need to install xxhash and huggingface-hub first, though. This way pyarrow is not reinstalled.
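A minimal sketch of that install order, run from a notebook cell by calling pip through the current interpreter. Only xxhash and huggingface-hub come from the reply above; any other datasets dependencies (the traceback shows it imports multiprocess and tqdm, for example) are assumed to be already present in the Kaggle image. The helper names install_commands and run_installs are hypothetical.

```python
import subprocess
import sys


def install_commands():
    """Build the two pip invocations: the extra dependencies first, then
    datasets itself with --no-deps so pip leaves the preinstalled pyarrow alone."""
    pip = [sys.executable, "-m", "pip", "install", "--quiet"]
    return [
        pip + ["xxhash", "huggingface-hub"],
        pip + ["--no-deps", "datasets"],
    ]


def run_installs(dry_run=False):
    """Run (or, with dry_run=True, just print) the pip commands in order."""
    for cmd in install_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)  # raises CalledProcessError if pip fails
```

Calling run_installs() before the first import datasets keeps pyarrow at the preinstalled version, so a “Save Version” run should not need a kernel restart. Because --no-deps also skips pip’s version checks, a datasets release that requires a newer pyarrow than the image ships may still fail to import.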

I don’t think this is an issue anymore, because it seems Kaggle now includes datasets by default. I even made a notebook about it:


Thanks. I guess my notebooks were running on an older VM.
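Because the preinstalled packages differ between Kaggle VM images, a notebook can check whether datasets is already importable and only run pip when it is missing, which also leaves pyarrow untouched on images that already ship datasets. A sketch using importlib.util.find_spec; note that find_spec takes the import name (e.g. huggingface_hub), not the pip name (huggingface-hub). The helper names needs_install and ensure_installed are hypothetical.

```python
import importlib.util
import subprocess
import sys


def needs_install(package):
    """True if `package` (an import name) cannot be found on the import path."""
    return importlib.util.find_spec(package) is None


def ensure_installed(package):
    """Install `package` with pip only when it is missing.
    Returns True if pip was run, False if the package was already present."""
    if not needs_install(package):
        return False
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return True
```

On an image that already includes datasets, ensure_installed("datasets") returns False without running pip, so there is nothing to restart.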