How to match local server performance with Hugging Face V3

jurassic1207 · October 22, 2024, 3:55am

Hello Hugging Face Team,

I built an STT engine on a local server using the Whisper V3 model downloaded from your platform. However, I am seeing significant differences between the transcripts produced by my local server and those from your Hugging Face-hosted service. The audio source I used contains a mix of Korean and English, and the local transcription is noticeably less accurate, with errors and inconsistencies that the hosted service does not show.

Could you please provide guidance on:

1. Possible reasons for the differences in transcription performance.
2. Best practices for ensuring the local engine matches the performance of the cloud-based service.
3. Any dependencies, parameters, or configurations I should align between the two environments (e.g., beam search, model settings, preprocessing).
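
To make the settings in question concrete, here is a minimal sketch. It assumes the local engine runs the `openai/whisper-large-v3` checkpoint through the `transformers` automatic-speech-recognition pipeline; the actual setup may differ. The helper names `build_generate_kwargs` and `transcribe` are hypothetical, and the values shown are placeholders, not the hosted service's real configuration.

```python
def build_generate_kwargs(language=None, task="transcribe", num_beams=1):
    """Collect decoding settings explicitly so both environments can be compared.

    language=None leaves language detection to the model, which may matter
    for audio that mixes Korean and English.
    """
    kwargs = {"task": task, "num_beams": num_beams}
    if language is not None:
        kwargs["language"] = language
    return kwargs


def transcribe(audio_path, generate_kwargs, model_id="openai/whisper-large-v3"):
    """Run the local pipeline with the given decoding settings."""
    # Imported lazily so the settings helper works without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumed chunking for long audio
    )
    return asr(audio_path, generate_kwargs=generate_kwargs)["text"]


# Example (not executed here; "sample.wav" is a placeholder path):
# text = transcribe("sample.wav", build_generate_kwargs(num_beams=5))
```

Writing the decoding settings out like this makes it easier to see which ones differ from the library defaults, and to try other values one at a time.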

Thank you in advance for your help!

Best regards,
Jura
