
Speech Technology

Channel address: @speechtech
Categories: Technologies
Language: English
Subscribers: 652

Ratings & Reviews

2.67 (3 reviews: one 5-star, one 2-star, one 1-star)


The latest messages (9)

2023-03-13 19:25:48 BUT 3rd

System description

https://www.fit.vutbr.cz/research/groups/speech/publi/2022/NIST_LRE_2022_System_Description.pdf
2023-03-13 19:18:49
https://twitter.com/TanelAlumae/status/1635221485060227072

https://arxiv.org/abs/2302.14624

https://haldus.taltech.ee/sites/default/files/2023-03/LRE22__Vocapia_TalTech_System_Description.pdf
2023-03-13 01:41:13 Tried the popular https://github.com/Kyubyong/g2p. As usual, neural networks are very bad on unseen cases: missing letters, extra letters, etc. Watch the outputs carefully. Example:

bio-sand B AY1 OW0 S T AE2 N D
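Failures like the spurious `T` above (the model read "bio-sand" as if it were "stand") can often be caught mechanically. A minimal sketch of such a sanity check, in plain Python — the phoneme-to-letters table is a hand-made assumption for illustration, not part of the g2p package:

```python
# Flag ARPAbet phonemes whose typical spelling letters never occur in
# the input word. Crude, but it catches hallucinated consonants.
# PHONE_LETTERS is an illustrative assumption, not a complete mapping.
PHONE_LETTERS = {
    "B": "b", "AY": "iy", "OW": "oau", "S": "scz",
    "T": "t", "AE": "a", "N": "n", "D": "d",
}

def suspicious_phones(word, phones):
    """Return phonemes none of whose usual letters occur in the word."""
    letters = set(word.lower())
    flagged = []
    for p in phones:
        base = p.rstrip("012")               # drop ARPAbet stress marker
        typical = PHONE_LETTERS.get(base, "")
        if typical and not (set(typical) & letters):
            flagged.append(p)
    return flagged

# The channel's example output for "bio-sand": the T has no source letter.
print(suspicious_phones("bio-sand", ["B", "AY1", "OW0", "S", "T", "AE2", "N", "D"]))
```

A real deployment would use the CMU Pronouncing Dictionary as the fallback for in-vocabulary words and reserve the network for true out-of-vocabulary inputs.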
2023-03-11 01:05:35 https://t.me/speechtech/1449

Repeated this test with the new Speechmatics. Async WER improved to 6.88. Indeed, the new Ursa model improved significantly!

An interesting detail is that it is phoneme-based.
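The WER figure quoted above is the standard word-level edit distance divided by reference length. A self-contained sketch (the 6.88 number itself comes from the channel's private test set and is not reproducible here):

```python
def wer(ref, hyp):
    """Word error rate in percent: word-level Levenshtein distance
    (substitutions + insertions + deletions) over reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[len(r)][len(h)] / len(r)

# One deletion over six reference words:
print(round(wer("the cat sat on the mat", "the cat sat on mat"), 2))  # 16.67
```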
2023-03-10 01:14:20
https://github.com/alibabasglab/mossformer
2023-03-09 20:32:15 faster-whisper is much faster than whisper.cpp

https://github.com/ggerganov/whisper.cpp/discussions/589
2023-03-08 20:34:27 https://www.speechmatics.com/company/articles-and-news/introducing-ursa-the-worlds-most-accurate-speech-to-text
2023-03-06 23:57:52 https://github.com/tuanct1997/Federated-Learning-ASR-based-on-wav2vec-2.0
2023-03-06 01:24:16 https://github.com/haoheliu/AudioLDM
2023-03-04 02:51:08 12M hours of speech data

https://arxiv.org/abs/2303.01037

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu

We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
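The "random-projection quantization" in the abstract refers to BEST-RQ-style pre-training targets: each speech frame is projected with a frozen random matrix, and the index of its nearest entry in a frozen random codebook becomes a discrete label for the encoder to predict. A NumPy sketch of the target-generation step only — the dimensions below are made-up illustrative values, not USM's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, never-trained parameters: a random projection and a random
# normalized codebook. Sizes are illustrative assumptions.
D, H, V = 80, 16, 1024            # feature dim, projection dim, codebook size
proj = rng.standard_normal((D, H))
codebook = rng.standard_normal((V, H))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def rpq_targets(feats):
    """Map (T, D) speech features to (T,) discrete codebook indices."""
    z = feats @ proj                               # (T, H) random projection
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # normalize before matching
    dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)                    # index of nearest codeword

feats = rng.standard_normal((100, D))              # fake 100-frame utterance
labels = rpq_targets(feats)
print(labels.shape)                                # (100,)
```

Because both the projection and the codebook stay frozen, the labels are cheap to compute and stable across training, which is what lets this style of pre-training scale to the 12M-hour regime the paper describes.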