Speech Technology

Categories: Technologies

Language: English

Subscribers: 652

The latest Messages 8

2023-04-01 03:37:39 This is interesting, all open source conformer implementations have bugs:

We have just released an open-source, bug-free implementation of the Conformer model.
Check it at: https://github.com/hlt-mt/FBK-fairseq/blob/master/fbk_works/BUGFREE_CONFORMER.md
Want to discover what "bug-free" means?
Take a look at our paper: https://arxiv.org/pdf/2303.16166.pdf

#opensource #conformer #speech #bug #bugfree #NLProc

https://twitter.com/sarapapi/status/1641750885524029440


2023-04-01 03:24:17

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

The paper proposes a three-stage processing pipeline, assisted by ChatGPT, for filtering noisy data and generating high-quality captions.

Github: https://github.com/xinhaomei/wavcaps

Paper: https://arxiv.org/abs/2303.17395v1

Dataset: https://paperswithcode.com/dataset/sounddescs

ai_machinelearning_big_data


2023-03-30 12:57:36 The 12th ISCA Speech Synthesis Workshop (SSW) is now open for submissions!
Final submission deadline: May 3, 2023
Late-breaking reports submission deadline: June 28, 2023

The Speech Synthesis Workshop will be held in Grenoble, France, and is organized as a satellite event of the Interspeech conference in Dublin, Ireland.
Come and join the SSW community and the people who create machines that talk!

Visit the official site for more information
https://ssw2023.org/


2023-03-27 21:00:59 We've just released our GitHub repository for #ASR and #NLP tools for air traffic control communications, based on the ATCO2 dataset (@Atco2P)!

We made 5000+ hours of audio public to support research on ASR for ATC.

GitHub: https://github.com/idiap/atco2-corpus

https://twitter.com/Pablogomez3/status/1640331512389279744


2023-03-26 00:23:33 The number of models this guy has trained is quite outstanding

https://malaya-speech.readthedocs.io/en/latest/index.html


2023-03-16 11:23:18

The streaming punctuation model is interesting

https://github.com/alibaba-damo-academy/FunASR/releases/tag/v0.3.0


2023-03-16 02:49:47 Kincaid46 WER from the Ursa announcement:

AssemblyAI: 8.6
Speechmatics: 7.88
Microsoft: 9.70
Whisper Large-v2: 8.7
Vosk 0.42 Gigaspeech: 15.8
Google: 12.52
Amazon: 10.94
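
For readers less familiar with the metric: the figures above are word error rates, i.e. the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is only an illustration of that definition. The function name `word_error_rate` and the example sentences are assumptions made here, not part of the Ursa announcement, and real evaluations normally apply text normalization (casing, punctuation, numbers) first, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Illustrative sentences (made up): one substitution and one deletion
# against a four-word reference.
print(f"{100 * word_error_rate('a b c d', 'a x c'):.1f}")
```

The scores in the list are this ratio expressed as a percentage, aggregated over the whole test set.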


2023-03-16 01:55:28 New model from Assembly AI. Definitely improved from before, but not as great as Speechmatics.

On a toy test set the new model gets a WER of 10.89; the previous AssemblyAI model (version 9) was at 11.04, and the version before that at 11.89. Speechmatics gets 6.88 and Whisper large 8.94.
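
To put the toy-test numbers in perspective, it can help to look at relative rather than absolute WER differences. The helper below is a small sketch; the name `relative_wer_reduction` is made up for this example, and the inputs are simply the WER values quoted in this message.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# WER values quoted in the message above.
new_assembly, prev_assembly, speechmatics = 10.89, 11.04, 6.88

# New AssemblyAI model relative to the previous version.
print(f"{100 * relative_wer_reduction(prev_assembly, new_assembly):.1f}%")
# Speechmatics relative to the new AssemblyAI model.
print(f"{100 * relative_wer_reduction(new_assembly, speechmatics):.1f}%")
```

On these figures the new model removes roughly 1.4% of the previous version's errors, while Speechmatics makes roughly 37% fewer errors than the new model on the same toy set.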

https://twitter.com/AssemblyAI/status/1636050346240884744

Introducing Conformer-1: our latest state-of-the-art speech recognition model.

Built on top of the Conformer architecture and trained on 650K hours of audio data, it achieves near-human-level performance, making up to 43% fewer errors on noisy data than other ASR models.

We use a modified version of the conformer neural net published by Google Brain.

It's built on top of an Efficient Conformer (Orange Labs, 2021), that introduces the following technical modifications:

- Progressive Downsampling to reduce the length of the encoded sequence
- Grouped Attention: A modified version of the attention mechanism that makes it agnostic to sequence-length

These changes yield speedups of 29% at inference time and 36% at training time.
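
The grouped attention idea listed above can be illustrated with a small NumPy sketch. It follows one reading of the Efficient Conformer paper, where neighboring frames are concatenated along the feature axis so that attention runs over a shorter sequence. It is not AssemblyAI's code: the function name `grouped_attention`, the single-head form, the absence of learned projections, and the scaling factor are all assumptions made for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_attention(q, k, v, group_size):
    """Single-head attention over groups of neighboring frames.

    q, k, v: arrays of shape (n, d). Every `group_size` consecutive frames
    are concatenated into one vector of size group_size * d, so the score
    matrix is (n / group_size) x (n / group_size) instead of n x n.
    """
    n, d = q.shape
    if n % group_size != 0:
        raise ValueError("sequence length must be divisible by group_size")
    m = n // group_size

    # Row-major reshape concatenates consecutive frames along the feature axis.
    qg = q.reshape(m, group_size * d)
    kg = k.reshape(m, group_size * d)
    vg = v.reshape(m, group_size * d)

    # Scaling by sqrt(group_size * d) is an assumption of this sketch.
    weights = softmax(qg @ kg.T / np.sqrt(group_size * d), axis=-1)
    out = weights @ vg

    # Split the grouped vectors back into one output per original frame.
    return out.reshape(n, d), weights


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))  # 8 frames, 4 features (toy sizes)
out, weights = grouped_attention(x, x, x, group_size=2)
print(out.shape, weights.shape)
```

With `group_size=1` the function reduces to ordinary scaled dot-product attention; larger groups shrink the score matrix quadratically in the group size, which is where the saving on long sequences comes from.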

To further improve our model’s accuracy on noisy audio, we implemented a modified version of Sparse Attention, a pruning method for achieving sparsity of the model’s weights in order to achieve regularization.

We took inspiration from the data scaling laws described in DeepMind's Chinchilla paper and adapted them to the ASR domain.

Our team curated a dataset of 650K hours of English audio - making our model the largest-trained supervised model for English available today.

Based on our results, Conformer-1 is more robust on real-world data than popular commercial and open-source ASR models, making up to 43% fewer errors on average on noisy data.

The biggest improvement with this new release is in our robustness to a wide variety of data domains and noisy audio.


2023-03-15 00:35:52 How can we make inference faster when using big #speech #selfsupervised models?

Check out @salah_zaiem's paper that compares various approaches, revealing some pretty interesting insights.

https://arxiv.org/abs/2303.06740

These techniques will soon be available in #SpeechBrain

https://twitter.com/mirco_ravanelli/status/1635678132731518976


2023-03-14 23:00:11 Paraformer released models for other languages too:

We release several new UniASR models: Southern Fujian Dialect model, French model, German model, Vietnamese model, Persian model.

https://github.com/alibaba-damo-academy/FunASR
