
Data Science by ODS.ai 🦜

New Language Classifier For 116 Languages

- 116 languages (83% accuracy), 77 language groups (87% accuracy)
- Mutually intelligible languages are merged into language groups (e.g. Serbian + Croatian + Bosnian)
- Trained on approximately 20k hours of data (10k of which cover the 5 most popular languages)
- 1.7M params
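
To illustrate why group-level accuracy comes out higher than language-level accuracy, here is a minimal sketch of scoring the same predictions both ways. The mapping, the group name "Serbo-Croatian", the helper names, and the toy labels are all assumptions made for illustration; only the Serbian + Croatian + Bosnian grouping comes from the post, and this is not the actual evaluation code.

```python
# Hypothetical language -> group mapping. Only the Serbian/Croatian/Bosnian
# grouping is taken from the post; the group name itself is an assumption.
LANG_TO_GROUP = {
    "Serbian": "Serbo-Croatian",
    "Croatian": "Serbo-Croatian",
    "Bosnian": "Serbo-Croatian",
}


def to_group(lang):
    """Map a language to its group; ungrouped languages form their own group."""
    return LANG_TO_GROUP.get(lang, lang)


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels, not real model output.
y_true = ["Serbian", "Croatian", "Bosnian", "English", "German"]
y_pred = ["Croatian", "Croatian", "Serbian", "English", "English"]

lang_acc = accuracy(y_true, y_pred)
group_acc = accuracy(
    [to_group(lang) for lang in y_true],
    [to_group(lang) for lang in y_pred],
)

print(f"language accuracy: {lang_acc:.2f}")
print(f"group accuracy:    {group_acc:.2f}")
```

Confusions inside a group (Serbian predicted as Croatian) stop counting as errors once labels are collapsed, so group accuracy can only match or exceed language accuracy on the same predictions.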

Shortcomings

- Predictably, related and mutually intelligible languages are hard to tell apart
- The confusion matrix mostly makes sense, except for low-resource languages and English
- English has the lowest accuracy
- The dataset needs further curation (e.g. removing rarely spoken or artificial languages)
- The model should be made larger
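
One way to inspect the shortcomings listed above is to tally confusion pairs and per-language accuracy from a list of predictions. This is a rough sketch using only the standard library; the function names and the toy labels are hypothetical and do not reflect the model's real outputs.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_language_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true language."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy labels, not real model output.
y_true = ["English", "English", "English", "German", "German", "Dutch"]
y_pred = ["German", "Dutch", "English", "German", "German", "Dutch"]

counts = confusion_counts(y_true, y_pred)
acc = per_language_accuracy(y_true, y_pred)

# Language with the lowest accuracy in this toy sample.
worst = min(acc, key=acc.get)

# Off-diagonal entries only, most frequent first.
top_confusions = [
    (pair, n) for pair, n in counts.most_common() if pair[0] != pair[1]
]

print("lowest accuracy:", worst)
print("confusions:", top_confusions)
```

Sorting the off-diagonal pairs by count is a quick way to check whether the most frequent mistakes fall between related languages, as one would expect, or somewhere less explainable.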

Link

- https://github.com/snakers4/silero-vad