I want to train an audio classifier from scratch using one of the datasets provided by Hugging Face, for example librispeech_asr. How do I determine the number of distinct speaker IDs (the ‘speaker_id’ field) so I can set the size of the final dense layer in my classifier?
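
To make the goal concrete, here is a minimal sketch of the kind of thing I am after. It is not a confirmed recipe. It assumes the speaker IDs are plain integers that are not necessarily contiguous, so each one is mapped to a class index from 0 to N-1 and N becomes the dense layer size. The helper name build_speaker_label_map and the toy ID values are made up for illustration. With a loaded Hugging Face dataset split, the distinct values could presumably be obtained from the ‘speaker_id’ column, for example via the datasets library's Dataset.unique method, and passed to the same helper.

```python
from typing import Dict, Iterable


def build_speaker_label_map(speaker_ids: Iterable[int]) -> Dict[int, int]:
    """Map each distinct raw speaker ID to a contiguous class index 0..N-1."""
    unique_ids = sorted(set(speaker_ids))
    return {raw_id: index for index, raw_id in enumerate(unique_ids)}


# Toy stand-in for the 'speaker_id' column (values are made up for illustration).
# Assumption: with a loaded dataset split `ds`, the column could be read as
# ds["speaker_id"], or the distinct values fetched with ds.unique("speaker_id").
toy_speaker_ids = [19, 26, 19, 103, 26, 19]

label_map = build_speaker_label_map(toy_speaker_ids)
num_classes = len(label_map)  # candidate size for the final dense layer

print(num_classes)
print(label_map)
```

The mapping matters because the raw IDs cannot be used directly as class indices if they have gaps. The count should also be taken over every split used for training and evaluation, so that a speaker seen only at evaluation time does not fall outside the output layer.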