Speech Technology

Channel address: @speechtech
Categories: Technologies
Language: English
Subscribers: 652

The latest Messages

2023-04-18 00:44:44 Not sure about the claimed accuracy, but the numbers are interesting

https://blog.deepgram.com/nova-speech-to-text-whisper-api/


A remarkable 22% reduction in word error rate (WER)

A blazing-fast 23-78x quicker inference time

A budget-friendly 3-7x lower cost starting at only $0.0043/min
450 views, edited 21:44
2023-04-18 00:30:35 The laughter is nice, Russian stress is traditionally bad

https://github.com/suno-ai/bark
394 views, edited 21:30
2023-04-12 16:20:21
Space is closer than you think. Happy Cosmonautics Day, my friends.
279 views, 13:20
2023-04-10 10:54:51 GPU beam search in PyTorch

https://github.com/pytorch/audio/pull/3096
312 views, 07:54
2023-04-08 15:59:16 NeMo 1.17 is now released and includes a lot of improvements that users have long requested.

This includes a high-level Diarization API, PyCTCDecode support for beam search, InterCTC Loss support, an AWS SageMaker tutorial, and more!

https://twitter.com/alphacep/status/1644685634404073472
404 views, 12:59
2023-04-04 04:44:26

41 views, 01:44
2023-04-04 01:12:27 Learning model from Whisper

https://github.com/speechcatcher-asr
131 views, edited 22:12
2023-04-03 04:52:29 https://groups.inf.ed.ac.uk/edacc/

The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR. Ramon Sanabria, Bogoychev, Markl, Carmantini, Klejch, and Bell. ICASSP 2023. Presentation of the EdAcc.
276 views, 01:52
2023-04-02 16:45:37 https://www.openslr.org/136/

EMNS
Identifier: SLR136

Summary: An emotive single-speaker dataset for narrative storytelling. EMNS is a dataset containing transcriptions, emotion, emotion intensity, and descriptions of acted speech.

Category: Speech, text-to-speech, automatic speech recognition

License: Apache 2.0
About this resource:

The Emotive Narrative Storytelling (EMNS) corpus introduces a dataset of single-speaker British English speech with high-quality labelled utterances tailored to drive interactive experiences with dynamic and expressive language. Each audio-text pair is reviewed for artefacts and quality. Furthermore, we extract critical features using natural language descriptions, including word emphasis, level of expressiveness and emotion.

EMNS data collection tool: https://github.com/knoriy/EMNS-DCT

EMNS cleaner: https://github.com/knoriy/EMNS-cleaner
331 views, edited 13:45
2023-04-02 16:36:54 QASR, the largest multi-layer annotated corpus at 2,000 hours, is available at https://arabicspeech.org/qasr/. QASR is suitable for ASR, dialect ID, punctuation, speaker ID-linking, and potentially other NLP modules for spoken data.
#nlproc #speechproc #Arabic #AI
@QatarComputing

@qcrialt

https://twitter.com/ArabicSpeech/status/1641402805951815681
312 views, 13:36