You can load and split the dataset as follows:
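from datasets import Dataset, Audio
# df is your pandas DataFrame with a "file" column (audio paths) and a "text" column (transcripts)
ds = Dataset.from_pandas(df)
# Cast the path column to the Audio feature so the files are decoded on access
ds = ds.cast_column("file", Audio())
# rename_column returns a new dataset, so assign the result; the audio column is the one to rename
ds = ds.rename_column("file", "audio")
# Returns a DatasetDict with "train" and "test" splits (30% of the rows go to "test")
ds_dict_with_splits = ds.train_test_split(test_size=0.3)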
I assume you want to train a speech recognition model. You can find a guide here and the transformers
example scripts here (replace their dataset initialization code with the code for your own dataset).