Speech Technology

Channel address: @speechtech
Categories: Technologies
Language: English
Subscribers: 652

Ratings & Reviews

2.67 (3 reviews)


5 stars: 1
4 stars: 0
3 stars: 0
2 stars: 1
1 star: 1


The latest messages (8)

2023-04-01 03:37:39 This is interesting: all open-source Conformer implementations have bugs:

We have just released an open-source, bug-free implementation of the Conformer model.
Check it out at: https://github.com/hlt-mt/FBK-fairseq/blob/master/fbk_works/BUGFREE_CONFORMER.md
Want to discover what "bug-free" means?
Take a look at our paper: https://arxiv.org/pdf/2303.16166.pdf

#opensource #conformer #speech #bug #bugfree #NLProc

https://twitter.com/sarapapi/status/1641750885524029440
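
For context, a typical failure mode in batched Conformer inference is padding leaking through the convolution module, so batched and single-utterance results differ. A minimal sketch of the defensive masking (my illustration, not the FBK-fairseq code; whether this matches the paper's specific bugs is an assumption):

```python
import torch
import torch.nn as nn

# Illustrative sketch, NOT the FBK-fairseq code: zero padded frames
# before and after the depthwise conv so padding cannot leak into
# neighbouring real frames through the convolution kernel.
class MaskedDepthwiseConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 31):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels)

    def forward(self, x: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels); pad_mask: (batch, time), True at padding
        x = x.masked_fill(pad_mask.unsqueeze(-1), 0.0)  # zero padding pre-conv
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return y.masked_fill(pad_mask.unsqueeze(-1), 0.0)

x = torch.randn(2, 100, 64)
pad_mask = torch.zeros(2, 100, dtype=torch.bool)
pad_mask[1, 60:] = True  # second utterance is only 60 frames long
out = MaskedDepthwiseConv(64)(x, pad_mask)
```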
54 views · 00:37
2023-04-01 03:24:17
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

Proposes a three-stage processing pipeline for filtering noisy data and generating high-quality captions, in which ChatGPT is used to filter and rewrite the raw descriptions.

GitHub: https://github.com/xinhaomei/wavcaps

Paper: https://arxiv.org/abs/2303.17395v1

Dataset: https://paperswithcode.com/dataset/sounddescs

ai_machinelearning_big_data
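
A rough sketch of what a ChatGPT-assisted captioning stage could look like (my illustration, not the WavCaps code; the prompt, data fields, and length filter below are assumptions), using the pre-1.0 openai chat API that was current at the time:

```python
import openai  # pre-1.0 openai package; assumes openai.api_key is set

# Hypothetical prompt; the real WavCaps prompts differ.
PROMPT = ("Rewrite this raw description of an audio clip into one concise "
          "caption that describes only the sounds: {desc}")

def rewrite_description(desc: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT.format(desc=desc)}],
    )
    return response["choices"][0]["message"]["content"].strip()

# Toy filtering + captioning pass over scraped metadata (fields are made up)
raw_items = [{"id": "fsd_001", "description": "dog barking, Zoom H4n, 2019"}]
captions = {item["id"]: rewrite_description(item["description"])
            for item in raw_items if len(item["description"].split()) >= 3}
```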
32 views · 00:24
2023-03-30 12:57:36 12th ISCA Speech Synthesis Workshop (SSW) is now open for submissions!
Final submission deadline: May 3, 2023
Late-breaking reports submission deadline: June 28, 2023

The Speech Synthesis Workshop will be held in Grenoble, France, and is organized as a satellite event of the Interspeech conference in Dublin, Ireland.
Come and join the SSW community and the people who create machines that talk!

Visit the official site for more information
https://ssw2023.org/
322 views · 09:57
2023-03-27 21:00:59 We've just released our GitHub repository with #ASR and #NLP tools for air traffic control communications, based on the ATCO2 dataset @Atco2P!

We've made 5,000+ hours of audio public to enable research on ASR for ATC.

GitHub: https://github.com/idiap/atco2-corpus

https://twitter.com/Pablogomez3/status/1640331512389279744
242 views · 18:00
2023-03-26 00:23:33 The number of models this guy has trained is quite outstanding.

https://malaya-speech.readthedocs.io/en/latest/index.html
387 views · 21:23
2023-03-16 11:23:18
This streaming punctuation model is interesting:

https://github.com/alibaba-damo-academy/FunASR/releases/tag/v0.3.0
279 views · 08:23
2023-03-16 02:49:47 Kincaid46 WER from the Ursa announcement:

AssemblyAI: 8.6
Speechmatics: 7.88
Microsoft: 9.70
Whisper Large-v2: 8.7
Vosk 0.42 Gigaspeech: 15.8
Google: 12.52
Amazon: 10.94
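
For reference, WER is (substitutions + deletions + insertions) divided by the number of reference words; a quick sanity check with the jiwer package:

```python
from jiwer import wer  # pip install jiwer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

# 2 substitutions / 9 reference words
print(f"WER: {wer(reference, hypothesis):.2%}")  # WER: 22.22%
```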
293 views · edited · 23:49
2023-03-16 01:55:28 New model from AssemblyAI. Definitely improved over its predecessors, but not as good as Speechmatics.

On a toy test its WER is 10.89; the previous AssemblyAI model (version 9) was at 11.04, and the version before that at 11.89. Speechmatics: 6.88. Whisper Large: 8.94.

https://twitter.com/AssemblyAI/status/1636050346240884744

Introducing Conformer-1: our latest state-of-the-art speech recognition model.

Built on top of the Conformer architecture and trained on 650K hours of audio data, it achieves near-human-level performance, making up to 43% fewer errors on noisy data than other ASR models.

We use a modified version of the Conformer neural net published by Google Brain.

It's built on top of the Efficient Conformer (Orange Labs, 2021), which introduces the following technical modifications:

- Progressive Downsampling to reduce the length of the encoded sequence
- Grouped Attention: a modified version of the attention mechanism that makes it agnostic to sequence length

These changes yield speedups of 29% at inference time and 36% at training time.
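
A minimal sketch of the progressive-downsampling idea (my illustration, not AssemblyAI's code): a strided convolution between encoder stages halves the frame rate, so later attention layers attend over shorter sequences:

```python
import torch
import torch.nn as nn

class Downsample(nn.Module):
    """Halve the encoded sequence length between Conformer stages."""
    def __init__(self, dim: int, stride: int = 2):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, stride=stride,
                              padding=1, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) -> (batch, time // stride, dim)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)

x = torch.randn(4, 1600, 256)    # ~16 s of 10 ms frames
print(Downsample(256)(x).shape)  # torch.Size([4, 800, 256])
```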

To further improve our model's accuracy on noisy audio, we implemented a modified version of Sparse Attention, a pruning method that sparsifies the model's weights to provide regularization.
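
As a generic stand-in for that idea (plain magnitude pruning, not AssemblyAI's modified Sparse Attention), torch.nn.utils.prune zeroes the smallest weights of a layer:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Zero the smallest 30% (by magnitude) of the attention output projection.
layer = nn.MultiheadAttention(embed_dim=256, num_heads=4)
prune.l1_unstructured(layer.out_proj, name="weight", amount=0.3)

sparsity = (layer.out_proj.weight == 0).float().mean()
print(f"sparsity: {sparsity:.0%}")  # sparsity: 30%
```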

We took inspiration from the data scaling laws described in DeepMind's Chinchilla paper and adapted them to the ASR domain.

Our team curated a dataset of 650K hours of English audio, making our model the largest supervised model trained for English available today.

Based on our results, Conformer-1 is more robust on real-world data than popular commercial and open-source ASR models, making up to 43% fewer errors on average on noisy data:

The biggest improvement with this new release is in our robustness to a wide variety of data domains and noisy audio.
308 views · 22:55
2023-03-15 00:35:52 How can we make inference faster when using big #speech #selfsupervised models?

Check out @salah_zaiem's paper that compares various approaches, revealing some pretty interesting insights.

https://arxiv.org/abs/2303.06740

These techniques will soon be available in #SpeechBrain

https://twitter.com/mirco_ravanelli/status/1635678132731518976
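
One generic lever that such comparisons tend to include is post-training dynamic quantization; a sketch under that assumption (not necessarily what the paper evaluates):

```python
import torch
import torch.nn as nn

# Stand-in for a big self-supervised encoder (wav2vec2-sized Transformer).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12,
)

# Replace nn.Linear layers with int8 dynamically-quantized versions (CPU).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear},
                                                dtype=torch.qint8)
with torch.inference_mode():
    out = quantized(torch.randn(1, 500, 768))  # ~5 s of 10 ms frames
```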
175 views · edited · 21:35
2023-03-14 23:00:11 Paraformer released models for other languages too:

We release several new UniASR models: Southern Fujian dialect, French, German, Vietnamese, and Persian.

https://github.com/alibaba-damo-academy/FunASR
214 views · 20:00