When I run the following:
from transformers import WhisperForConditionalGeneration, WhisperProcessor, WhisperTokenizer
processor = WhisperProcessor.from_pretrained("openai/whisper-base")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base")
input_features = processor(audio, return_tensors="pt", sampling_rate=16000).input_features
language_name = detect_language_tokens(model, tokenizer, input_features, {'en', 'zh'})
I get this error:
AttributeError: 'Tensor' object has no attribute 'additional_special_tokens'