I am trying to create a `datasets` Dataset object for audio files stored in Google Cloud Storage. This is what I have in mind:
```python
from datasets import Dataset, Audio
import pandas as pd

my_audio_paths_df = pd.DataFrame({"audio_path": [
    "gs://<my_gs_bucket>/<audio_path>/<audio_name>.wav",
    "gs://<my_gs_bucket>/<audio_path>/<audio_name>.wav"]})
my_audio_dataset = Dataset.from_pandas(my_audio_paths_df)
my_audio_dataset = my_audio_dataset.cast_column("audio_path", Audio(sampling_rate=16_000))
```
This works when the audio paths are local, but is there a way to do the same with Google Cloud Storage (`gs://`) paths?