2021-02-25 13:52:29
Ukrainian Open STT 1000 Hours
Following the path of Open STT in Russian, now you can enjoy a similar dataset in Ukrainian:
- Torrent Link
- GitHub Link
Congratulations to our Ukrainian friends for finally publishing a diverse easily downloadable dataset!
Their pages / dataset UX is still a bit rough on the edges, but compared how fast for example Common Voice accumulates data (130 hours for Russian and 43 hours for Ukrainian), UA Open STT and Open STT remain the best resource for respective languages to date.
Also unlike the majority of STT datasets which are (i) behind a paywall or sponsored by corporations (ii) have limited scope / domains (iii) fit some sort of agenda (i.e. use more GPUs than necessary, use our bloated tools, etc), this dataset is legit made by real people.
Also recently corporations have taken up the trend of rehashing publicly available data, which is cool, but unique data is still nowhere to be seen for obvious reasons (except for Common Voice, which is decent only for English).
#dataset
5.8K views10:52