How to match local server performance with Hugging Face Whisper V3

Hello Hugging Face Team,

I built a speech-to-text (STT) engine on a local server using the Whisper V3 model downloaded from your platform. However, the transcriptions it produces differ significantly from those produced by your Hugging Face-hosted service. The audio I used contains a mix of Korean and English, and the local transcription is noticeably less accurate, with errors and inconsistencies.

Could you please provide guidance on:

  1. Possible reasons for the differences in transcription performance.
  2. Best practices for ensuring the local engine matches the performance of the cloud-based service.
  3. Any dependencies, parameters, or configurations I should align between the two environments (e.g., beam search, model settings, preprocessing).
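
To make the third point concrete, here is a minimal sketch of the kind of settings that could be pinned on the local side so they can be compared one by one. It assumes the local engine uses the `transformers` automatic-speech-recognition pipeline with the `openai/whisper-large-v3` checkpoint; that is an assumption, and it is not confirmed that the hosted service uses these same values. The helper names (`build_generate_kwargs`, `load_local_pipeline`, `transcribe`) are hypothetical and only for illustration.

```python
# Sketch only: helper names are hypothetical; the transformers calls are real API.
MODEL_ID = "openai/whisper-large-v3"  # assumed checkpoint for "Whisper V3"


def build_generate_kwargs(language=None, num_beams=1):
    """Collect decoding settings in one place so they can be logged and compared."""
    kwargs = {"task": "transcribe", "num_beams": num_beams}
    if language is not None:
        # Forcing a language (e.g. "korean") disables automatic detection,
        # which may or may not help on mixed Korean/English audio.
        kwargs["language"] = language
    return kwargs


def load_local_pipeline(device="cpu"):
    """Build the local ASR pipeline; transformers is imported lazily."""
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # assumed chunking for audio longer than 30 s
        device=device,
    )


def transcribe(asr, audio_path, **generate_kwargs):
    """Run one file through the pipeline with explicit decoding settings."""
    return asr(audio_path, generate_kwargs=generate_kwargs, return_timestamps=True)
```

For example, `transcribe(load_local_pipeline(), "sample.wav", **build_generate_kwargs(num_beams=5))` would run one file (the file name is a placeholder) with beam search enabled and language detection left automatic.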

Thank you in advance for your help!

Best regards,
Jura
