ImportError: cannot import name 'load_dataset' from 'datasets' (unknown location)

Hey,

I am new to NLP and working through the tutorial. I installed the transformers library, and after some trouble everything worked out. Then I installed the datasets library; the installation went fine (details at the end).

Now I'm trying to work with it in a Jupyter notebook. The line

import datasets

works out fine, but when I try

from datasets import load_dataset

I get the error above. I looked around in this forum and others and couldn't find a solution.
I am using Python 3.9.12, PyTorch 1.12.0, and Transformers version 4.22.0.dev0.

Any help is appreciated! :slight_smile:

Details regarding installation:
I installed it from source using pip3 inside a virtual environment, put it in the directory I also work on my project from, and ran the installation check that is suggested in the installation "tutorial", which also worked.
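For reference, the steps I followed were roughly these (the env name .env and the checkout location are just how I set it up locally; the last line is the quick check from the installation docs):

python3 -m venv .env
source .env/bin/activate
git clone https://github.com/huggingface/datasets.git
cd datasets
pip3 install -e .
python3 -c "from datasets import load_dataset; print(load_dataset('squad', split='train')[0])"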

Hi!

That's strange, your code is fine… Did you check whether the virtual environment is enabled in the notebook? Can you copy the error log?

Hey, thanks rwheel!

I think I enabled the environment, as I registered it with (.env is the name of the environment)

python -m ipykernel install --user --name=.env

and got the confirmation

Installed kernelspec .env in /home/username/.local/share/jupyter/kernels/.env

I’m sorry but what do you mean by error log? And where do I find it?

Sorry, I was referring to the error you get when you do from datasets import load_dataset

After adding the virtual environment to the Jupyter notebook, did you switch the kernel to the created env before running the script?
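One quick sanity check from inside the notebook (the exact paths will look different on your machine) is something like:

import sys
print(sys.executable)        # should point into the .env virtual environment

import datasets
print(datasets.__version__)  # installed version
print(datasets.__file__)     # should live in that same environment's site-packages

If sys.executable points to the system Python instead of the venv, the kernel isn't running in the environment where datasets is installed.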

Yes, I did :/ I really can't think of anything else to do, haha. I uninstalled datasets and reinstalled it, and double-checked by listing all the installed packages, but now even import datasets doesn't work and I get the error ModuleNotFoundError: No module named 'datasets'

Is there anything I can do? Could it be a problem that I tried with different environments and they are somehow affecting each other? Or could I try installing it outside of an environment?
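For the double check I mentioned, I listed the packages inside the activated environment with something like:

pip3 list | grep datasets
pip3 show datasets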

It is very weird :expressionless:

I've just replicated the example in a Google Colab and it works well. So I also think, as you say, that it could be a problem between the environments… I work with conda to create and manage my environments; did you try that tool?
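Just as a sketch, in case you want to try it (hf-env is only an example name, and you would still need to register the kernel like before):

conda create -n hf-env python=3.9
conda activate hf-env
pip install datasets ipykernel
python -m ipykernel install --user --name=hf-env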

PS: the code that I tried in Google Colab is:

! pip install datasets

from datasets import load_dataset
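
and then, just as a quick usage check (squad is only an example dataset), something like:

dataset = load_dataset("squad", split="train")
print(dataset[0])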