EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
Researchers have developed EmergentTTS-Eval, a new automated benchmark for evaluating text-to-speech (TTS) systems on complex and nuanced text, including emotional expression, foreign words, and difficult pronunciations. Its diverse test cases are generated with LLMs. Using a Large Audio Language Model as a judge, the framework assesses multiple dimensions of speech quality, reveals fine-grained performance differences among state-of-the-art TTS models, and correlates well with human preferences.
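
One common way to check a claim such as "correlates well with human preferences" is a rank correlation between per-system judge scores and per-system human scores. The summary does not say which statistic the authors use, so the sketch below assumes Spearman's rank correlation. The function names and the score values are hypothetical and invented for illustration; they are not results from the benchmark.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-system scores (one entry per TTS system), for illustration only.
judge_scores = [0.62, 0.48, 0.55, 0.30]
human_scores = [0.70, 0.45, 0.50, 0.35]
rho = spearman(judge_scores, human_scores)
print(f"Spearman rho between judge and human scores: {rho:.2f}")
```

A value near 1 would mean the judge orders the systems much as human raters do. With only a handful of systems the estimate is noisy, so it is best read alongside the per-item agreement rather than on its own.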