
Speech Technology

Channel address: @speechtech
Categories: Technologies
Language: English
Subscribers: 652

Ratings & Reviews

2.67 (3 reviews: one 5-star, one 2-star, one 1-star)


The latest messages (9)

2023-03-13 19:25:48 BUT 3rd

System description

https://www.fit.vutbr.cz/research/groups/speech/publi/2022/NIST_LRE_2022_System_Description.pdf
2023-03-13 19:18:49
https://twitter.com/TanelAlumae/status/1635221485060227072

https://arxiv.org/abs/2302.14624

https://haldus.taltech.ee/sites/default/files/2023-03/LRE22__Vocapia_TalTech_System_Description.pdf
2023-03-13 01:41:13 Tried the popular https://github.com/Kyubyong/g2p. As usual, neural networks are very bad on unseen cases: missing letters, extra letters, etc. Watch the outputs carefully. Example:

bio-sand B AY1 OW0 S T AE2 N D
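Failures like the spurious `T` above (the model read "bio-sand" as if it were "stand") can often be caught mechanically. A minimal sketch of such a sanity check, in plain Python — the phoneme-to-letters table is a hand-made assumption for illustration, not part of the g2p package:

```python
# Flag ARPAbet phonemes whose typical spelling letters never occur in
# the input word. Crude, but it catches hallucinated consonants.
# PHONE_LETTERS is an illustrative assumption, not a complete mapping.
PHONE_LETTERS = {
    "B": "b", "AY": "iy", "OW": "oau", "S": "scz",
    "T": "t", "AE": "a", "N": "n", "D": "d",
}

def suspicious_phones(word, phones):
    """Return phonemes none of whose usual letters occur in the word."""
    letters = set(word.lower())
    flagged = []
    for p in phones:
        base = p.rstrip("012")               # drop ARPAbet stress marker
        typical = PHONE_LETTERS.get(base, "")
        if typical and not (set(typical) & letters):
            flagged.append(p)
    return flagged

# The channel's example output for "bio-sand": the T has no source letter.
print(suspicious_phones("bio-sand", ["B", "AY1", "OW0", "S", "T", "AE2", "N", "D"]))
```

A real deployment would use the CMU Pronouncing Dictionary as the fallback for in-vocabulary words and reserve the network for true out-of-vocabulary inputs.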
2023-03-11 01:05:35 https://t.me/speechtech/1449

Repeated this test with the new Speechmatics. Async WER improved to 6.88. Indeed, the new Ursa model improved significantly!

An interesting detail is that it is phoneme-based.
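The WER figure quoted above is the standard word-level edit distance divided by reference length. A self-contained sketch (the 6.88 number itself comes from the channel's private test set and is not reproducible here):

```python
def wer(ref, hyp):
    """Word error rate in percent: word-level Levenshtein distance
    (substitutions + insertions + deletions) over reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[len(r)][len(h)] / len(r)

# One deletion over six reference words:
print(round(wer("the cat sat on the mat", "the cat sat on mat"), 2))  # 16.67
```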
2023-03-10 01:14:20
https://github.com/alibabasglab/mossformer
2023-03-09 20:32:15 faster-whisper is much faster than whisper.cpp

https://github.com/ggerganov/whisper.cpp/discussions/589
2023-03-08 20:34:27 https://www.speechmatics.com/company/articles-and-news/introducing-ursa-the-worlds-most-accurate-speech-to-text
2023-03-06 23:57:52 https://github.com/tuanct1997/Federated-Learning-ASR-based-on-wav2vec-2.0
2023-03-06 01:24:16 https://github.com/haoheliu/AudioLDM
2023-03-04 02:51:08 12M hours of speech data

https://arxiv.org/abs/2303.01037

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu

We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
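The "random-projection quantization" in the abstract refers to BEST-RQ-style pre-training targets: each speech frame is projected with a frozen random matrix, and the index of its nearest entry in a frozen random codebook becomes a discrete label for the encoder to predict. A NumPy sketch of the target-generation step only — the dimensions below are made-up illustrative values, not USM's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, never-trained parameters: a random projection and a random
# normalized codebook. Sizes are illustrative assumptions.
D, H, V = 80, 16, 1024            # feature dim, projection dim, codebook size
proj = rng.standard_normal((D, H))
codebook = rng.standard_normal((V, H))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def rpq_targets(feats):
    """Map (T, D) speech features to (T,) discrete codebook indices."""
    z = feats @ proj                               # (T, H) random projection
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # normalize before matching
    dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)                    # index of nearest codeword

feats = rng.standard_normal((100, D))              # fake 100-frame utterance
labels = rpq_targets(feats)
print(labels.shape)                                # (100,)
```

Because both the projection and the codebook stay frozen, the labels are cheap to compute and stable across training, which is what lets this style of pre-training scale to the 12M-hour regime the paper describes.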