
The speech synthesis is based on neural networks and currently supports German (including 8 different emotions) and English. The models can be trained further to speak in another person's voice; this requires approx. 30 minutes of voice recordings for medium quality and approx. 60 minutes for good quality.
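
As a rough illustration of the recording requirements above, the following minimal Python sketch maps the total length of the available voice recordings to the quality level that can be expected. The function name, the constant names, and the label "below medium" are hypothetical and chosen only for this example; they are not part of the product. Only the 30 and 60 minute thresholds come from the description above.

```python
# Thresholds taken from the description above (approximate values).
MEDIUM_QUALITY_MINUTES = 30
GOOD_QUALITY_MINUTES = 60


def expected_voice_quality(recorded_minutes: float) -> str:
    """Return the quality level to expect for a given amount of recordings.

    Hypothetical helper for illustration only; the label "below medium"
    is an assumption, since the description names no level under 30 min.
    """
    if recorded_minutes < 0:
        raise ValueError("recorded_minutes must be non-negative")
    if recorded_minutes >= GOOD_QUALITY_MINUTES:
        return "good"
    if recorded_minutes >= MEDIUM_QUALITY_MINUTES:
        return "medium"
    return "below medium"


if __name__ == "__main__":
    for minutes in (10, 30, 45, 60):
        print(f"{minutes} min of recordings: {expected_voice_quality(minutes)}")
```

A check like this could be run before starting a voice training job, so that it is clear up front which quality level the collected recordings are likely to reach.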