We also tried to create a Ukrainian voice, but the data we had was sourced from audiobooks and was not very good. All other voices were created from recordings.
Some models sound almost perfect, others a bit worse. This typically comes down to how steady and consistent the speaker's recordings are.
We used between 1 and 6 hours of recordings to create each voice.
These models do not include automated stress and have the same major caveats as the other v2 models (i.e. they are best used with a batch size of 1 on 2-4 CPU threads).