New Language Classifier For 116 Languages
Data Science by ODS.ai 🦜
- 116 languages (83% accuracy), 77 language groups (87% accuracy)
- Mutually intelligible languages are merged into language groups (e.g. Serbian + Croatian + Bosnian)
- Trained on approximately 20k hours of data (10k of which cover the 5 most popular languages)
- 1.7M parameters
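One plausible reading of the gap between the two figures is that confusions inside a group stop counting as errors once labels are collapsed to groups. The sketch below shows that effect on toy data. The mapping `LANG_TO_GROUP`, the group label `"sh"`, and the labels themselves are hypothetical and are not taken from the model or its dataset.

```python
# Hypothetical mapping from language codes to language groups;
# languages absent from the mapping form their own single-member group.
LANG_TO_GROUP = {"sr": "sh", "hr": "sh", "bs": "sh"}


def to_group(lang):
    """Return the group label for a language code."""
    return LANG_TO_GROUP.get(lang, lang)


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the reference."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def group_accuracy(y_true, y_pred):
    """Accuracy after collapsing both label lists to language groups."""
    return accuracy([to_group(t) for t in y_true],
                    [to_group(p) for p in y_pred])


# Toy labels (made up): Serbian mistaken for Croatian, English for German.
y_true = ["sr", "hr", "en", "de"]
y_pred = ["hr", "hr", "de", "de"]

lang_acc = accuracy(y_true, y_pred)         # 2 of 4 correct
group_acc = group_accuracy(y_true, y_pred)  # 3 of 4 correct
```

Here the Serbian/Croatian mix-up is an error at the language level but not at the group level, so the group score is higher on the same predictions.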
Shortcomings
- Predictably, related and mutually intelligible languages are hard to tell apart
- The confusion matrix mostly makes sense, except for low-resource languages and English
- English has the lowest accuracy
- The dataset needs further curation (e.g. removing rarely spoken or artificial languages)
- The model should be made larger
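Observations like these are typically read off a confusion matrix. The following is a minimal sketch of how per-language accuracy and the most frequent confusions could be computed from reference and predicted labels. The function names and the toy labels are assumptions made for illustration and do not reflect the authors' evaluation code.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true, predicted) label pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


def per_language_accuracy(y_true, y_pred):
    """Share of each true language's samples that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {lang: hits[lang] / totals[lang] for lang in totals}


def top_confusions(y_true, y_pred, n=3):
    """Most frequent off-diagonal (true, predicted) pairs."""
    errors = Counter({
        pair: count
        for pair, count in confusion_counts(y_true, y_pred).items()
        if pair[0] != pair[1]
    })
    return errors.most_common(n)


# Toy labels (made up for illustration only).
y_true = ["en", "en", "en", "en", "sr", "sr", "sr", "de"]
y_pred = ["de", "nl", "fr", "en", "hr", "hr", "sr", "de"]

acc = per_language_accuracy(y_true, y_pred)
worst = min(acc, key=acc.get)               # language with the lowest accuracy
most_confused = top_confusions(y_true, y_pred, n=1)
```

In this toy data the errors for Serbian concentrate on one related language, while the errors for English are spread across several unrelated ones, which is the kind of pattern that would make a confusion matrix harder to interpret.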