Public word error rate (WER): 0.114294524
Private word error rate (WER): 0.112809661
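For reference, word error rate is the number of word substitutions, deletions and insertions needed to turn a transcript into the reference, divided by the number of reference words. The sketch below computes it with a word-level Levenshtein distance. Zindi's exact scorer (its text normalization, and whether errors are pooled over the whole test set or averaged per clip) is not described here, so the corpus-level pooling in `corpus_wer` is an assumption.

```python
from typing import Iterable, List, Tuple


def word_edits(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning hyp into ref (word level)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion (reference word missing from hyp)
                cur[j - 1] + 1,          # insertion (extra word in hyp)
                prev[j - 1] + (r != h),  # substitution, or free if the words match
            )
        prev = cur
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool edits and reference words over all (reference, hypothesis) pairs."""
    pairs = list(pairs)
    edits = sum(word_edits(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words
```

For example, `corpus_wer([("the cat sat", "the cat sit")])` is 1/3: one substitution over three reference words.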
How I used SeamlessM4T Large to get into the top 5 of the Mozilla Common Voice competition. I downloaded only the test.tar.gz archive, extracted it, and resampled all the audio to 16 kHz. I noticed that some of the audio was muffled and pretty bad as is, due to the sampling rates that were set. Anyway, the script I used for the conversion is prepare_files.sh.
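prepare_files.sh itself is not reproduced here. As a rough Python equivalent, the sketch below unpacks the archive with `tarfile` and resamples each clip to 16 kHz mono WAV by calling ffmpeg. Using ffmpeg, the `*.mp3` input pattern and the flat output layout are assumptions, not necessarily what the script does.

```python
import shutil
import subprocess
import tarfile
from pathlib import Path
from typing import List

TARGET_SR = 16_000  # target sample rate in Hz (16 kHz)


def extract_archive(archive: Path, dest: Path) -> None:
    """Unpack a .tar.gz archive (e.g. test.tar.gz) into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)


def ffmpeg_resample_cmd(src: Path, dst: Path, sr: int = TARGET_SR) -> List[str]:
    """Build an ffmpeg command that decodes src and writes a mono 16-bit WAV at sr Hz."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",  # overwrite output, print errors only
        "-i", str(src),
        "-ac", "1",            # downmix to mono
        "-ar", str(sr),        # resample to the target rate
        "-c:a", "pcm_s16le",   # 16-bit PCM
        str(dst),
    ]


def resample_dir(src_dir: Path, out_dir: Path, pattern: str = "*.mp3") -> int:
    """Resample every file under src_dir matching pattern; returns how many were converted."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = 0
    for src in sorted(src_dir.rglob(pattern)):
        dst = out_dir / (src.stem + ".wav")  # assumes clip names are unique across subfolders
        subprocess.run(ffmpeg_resample_cmd(src, dst), check=True)
        converted += 1
    return converted
```

A typical call would be `extract_archive(Path("test.tar.gz"), Path("data"))` followed by `resample_dir(Path("data"), Path("data_16k"))`; both directory names are hypothetical. Note that resampling only changes the rate the model sees; it cannot restore detail that is missing from muffled recordings.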
Follow the instructions to install SeamlessM4T Large. I then ran inference on each audio file with python asr.py.
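asr.py is not shown here either. A minimal sketch of the overall loop, assuming a hypothetical `transcribe(path) -> str` callable that wraps the SeamlessM4T model (its body depends on the installed package version, so it is left as a parameter) and hypothetical `path`/`sentence` column names for the results CSV:

```python
import csv
from pathlib import Path
from typing import Callable, Iterable


def run_asr(audio_files: Iterable[Path], transcribe: Callable[[Path], str], out_csv: Path) -> int:
    """Transcribe each clip in sorted order and write one CSV row per clip; returns the row count."""
    rows = 0
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "sentence"])  # hypothetical column names
        for audio in sorted(audio_files):
            writer.writerow([audio.name, transcribe(audio).strip()])
            f.flush()  # push each finished row out so a crash keeps earlier results
            rows += 1
    return rows
```

Passing the transcriber in as a function keeps the loop testable without loading the model, and writing row by row means a crash partway through a long run still leaves the completed rows in the file.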
The output was saved to asr_results.csv and then converted into the format Zindi requires with python clean_submission.py.
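The exact Zindi format is not given here. The sketch below assumes the ASR results CSV has `path` and `sentence` columns, a hypothetical two-column submission (an ID taken from the audio file name, plus the transcription), and a common cleanup for WER scoring: lowercase, strip punctuation except apostrophes, collapse whitespace. The competition's sample submission file is the authority on the real header and ID scheme.

```python
import csv
import string
from pathlib import Path

# Punctuation to delete; apostrophes are kept so "it's" stays one word.
_STRIP = str.maketrans("", "", string.punctuation.replace("'", ""))


def normalize(text: str) -> str:
    """Lowercase, drop punctuation (except apostrophes), collapse whitespace."""
    return " ".join(text.lower().translate(_STRIP).split())


def make_submission(asr_csv: Path, out_csv: Path) -> int:
    """Convert path/sentence rows into a submission file; returns the row count."""
    count = 0
    with asr_csv.open(newline="", encoding="utf-8") as src, \
            out_csv.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["ID", "Transcription"])  # hypothetical header
        for row in csv.DictReader(src):
            clip_id = Path(row["path"]).stem  # hypothetical ID scheme: file name without extension
            writer.writerow([clip_id, normalize(row["sentence"] or "")])
            count += 1
    return count
```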
The steps can also be run through the Makefile:
make run
Review the Hugging Face leaderboard for ASR models and look for a model that is both fast and accurate.
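A leaderboard rarely has one model that is best on both axes, so one way to shortlist is to keep only the models that no other model beats on both accuracy and speed (the Pareto front). The sketch assumes you have copied each model's average WER and a throughput figure such as RTFx (higher is faster) from the leaderboard into a list by hand; the `Entry` type is hypothetical.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    model: str
    wer: float   # average word error rate, lower is better
    rtfx: float  # inverse real-time factor, higher is faster


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.wer <= b.wer and a.rtfx >= b.rtfx
    better = a.wer < b.wer or a.rtfx > b.rtfx
    return no_worse and better


def pareto_front(entries: List[Entry]) -> List[Entry]:
    """Models not dominated by any other model, most accurate first."""
    front = [e for e in entries if not any(dominates(o, e) for o in entries)]
    return sorted(front, key=lambda e: e.wer)
```

From the shortlist, pick the point where the extra accuracy is still worth the slower inference for your time budget.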
Facebook/Meta has a lot of speech models, so look for one that can do speech-to-text. The models that primarily do one task seem to perform best.