Difficulty running distil-whisper

I’m using Whisper to transcribe audio to text, and I decided to switch to distil-whisper to speed it up. I’ve been trying to follow the instructions on Hugging Face but keep getting an error. The following setup commands run sequentially without any issues:

!pip install virtualenv
!virtualenv myenv
!myenv/bin/pip install datasets
!source myenv/bin/activate
!myenv/bin/pip install --upgrade pip
!myenv/bin/pip install --upgrade transformers accelerate datasets[audio]

But when I then run this code, I get an error:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])

Traceback:

ModuleNotFoundError                       Traceback (most recent call last)
in <cell line: 3>()
      1 import torch
      2 from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
----> 3 from datasets import load_dataset
      4
      5

ModuleNotFoundError: No module named 'datasets'

I don’t understand why I’m getting an error now.
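
A possible diagnostic, assuming this is running in a notebook (Jupyter or Colab): each `!` line runs in its own subshell, so `!source myenv/bin/activate` does not carry over to later cells, and anything installed with `myenv/bin/pip` lands in `myenv` rather than in the interpreter the notebook kernel is using. The sketch below only reports which interpreter the kernel runs and whether it can find each package. The helper name `module_available` is made up for illustration.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the running interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


# The interpreter the notebook kernel is actually running.
# If this path is not inside myenv/, packages installed with
# myenv/bin/pip are not visible to `import` in this kernel.
print("kernel interpreter:", sys.executable)

# Check the three packages the transcription code imports.
for name in ("torch", "transformers", "datasets"):
    print(f"{name} importable: {module_available(name)}")
```

If `datasets` is reported as not importable, one option is to install into the kernel's own environment with the IPython magic `%pip install --upgrade transformers accelerate datasets[audio]` instead of going through the virtualenv.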