2023-03-16 01:55:28
New model from AssemblyAI. Definitely an improvement over the previous one, but still not as good as Speechmatics.
On a toy test the new model gets a WER of 10.89; the previous AssemblyAI model (version 9) was at 11.04 and the one before that at 11.89. Speechmatics: 6.88. Whisper large: 8.94.
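For reference, WER numbers like these are normally computed as word-level edit distance divided by the number of reference words. Below is a minimal sketch of that calculation, assuming plain whitespace tokenization and no text normalization (real evaluations usually normalize case, punctuation, and numbers first). The function names are my own, not from any of the vendors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing in hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent when going from old_wer to new_wer."""
    return 100.0 * (old_wer - new_wer) / old_wer


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(relative_reduction(11.04, 10.89))
```

With the toy-test numbers above, going from 11.04 to 10.89 is a relative reduction of roughly 1.4%, so the gain on this particular test is small.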
https://twitter.com/AssemblyAI/status/1636050346240884744
Introducing Conformer-1: our latest state-of-the-art speech recognition model.
Built on top of the Conformer architecture and trained on 650K hours of audio data, it achieves near-human-level performance, making up to 43% fewer errors on noisy data than other ASR models.
We use a modified version of the conformer neural net published by Google Brain.
It's built on top of an Efficient Conformer (Orange Labs, 2021), that introduces the following technical modifications:
- Progressive Downsampling to reduce the length of the encoded sequence
- Grouped Attention: A modified version of the attention mechanism that makes it agnostic to sequence-length
These changes yield speedups of 29% at inference time and 36% at training time.
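A rough illustration of why these two modifications save compute, as I understand the Efficient Conformer idea. This is a shape-level sketch in plain Python, not AssemblyAI's implementation: the stride values, the group size, and all function names are hypothetical, and zero-padding the last group is my own simplification.

```python
def progressive_lengths(num_frames, strides):
    """Sequence length after each encoder stage, given one stride per stage."""
    lengths = []
    n = num_frames
    for stride in strides:
        n = -(-n // stride)  # ceiling division: assumes the tail is padded
        lengths.append(n)
    return lengths


def group_frames(frames, group_size):
    """Concatenate each run of `group_size` neighbouring frame vectors into one
    longer vector, so attention sees a shorter sequence of wider vectors."""
    dim = len(frames[0])
    pad = (-len(frames)) % group_size
    padded = frames + [[0.0] * dim for _ in range(pad)]  # zero-pad the last group
    return [
        sum(padded[i:i + group_size], [])
        for i in range(0, len(padded), group_size)
    ]


def attention_score_count(seq_len, group_size=1):
    """Number of entries in the attention score matrix after grouping."""
    grouped_len = -(-seq_len // group_size)
    return grouped_len ** 2


if __name__ == "__main__":
    print(progressive_lengths(1000, [2, 2, 2]))          # hypothetical strides
    print(group_frames([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], 2))
    print(attention_score_count(1000), attention_score_count(1000, group_size=2))
```

The point of the sketch: both tricks shorten the sequence that self-attention operates on, and since the score matrix grows with the square of that length, halving the length cuts the number of scores to a quarter.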
To further improve our model’s accuracy on noisy audio, we implemented a modified version of Sparse Attention, a pruning method that makes the model’s weights sparse as a form of regularization.
We took inspiration from the data scaling laws described in DeepMind's Chinchilla paper and adapted them to the ASR domain.
Our team curated a dataset of 650K hours of English audio - making our model the largest-trained supervised model for English available today.
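The announcement does not say how exactly the scaling laws were mapped to audio, so the following is only back-of-the-envelope arithmetic. It assumes the commonly quoted Chinchilla rule of thumb of roughly 20 training tokens per model parameter; the model size and the tokens-per-audio-hour rate in the example are hypothetical placeholders, not figures from AssemblyAI.

```python
def audio_hours_for_budget(num_params, tokens_per_param, tokens_per_audio_hour):
    """Hours of transcribed audio needed to reach a compute-optimal token budget.

    num_params            -- model size in parameters
    tokens_per_param      -- training tokens per parameter (about 20 is the usual
                             Chinchilla rule of thumb)
    tokens_per_audio_hour -- transcript tokens in one hour of speech (an assumption
                             that depends on speaking rate and tokenizer)
    """
    token_budget = num_params * tokens_per_param
    return token_budget / tokens_per_audio_hour


if __name__ == "__main__":
    # Illustrative values only: a 300M-parameter model, 10,000 tokens per hour.
    print(audio_hours_for_budget(300_000_000, 20, 10_000))
```

Under these made-up inputs the result is in the hundreds of thousands of hours, the same order of magnitude as the 650K hours quoted above, which is presumably the kind of reasoning behind the dataset size.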
Based on our results, Conformer-1 is more robust on real-world data than popular commercial and open-source ASR models, making up to 43% fewer errors on average on noisy data.
The biggest improvement with this new release is in our robustness to a wide variety of data domains and noisy audio.