Soul App Launches Revolutionary Full-Duplex Voice Model
As AI becomes deeply integrated into
human life and reshapes how people connect, what foundational capabilities
are needed to enhance interactive experiences in social settings?
Recently, Soul App upgraded its self-developed, end-to-end
full-duplex voice call model. Redefining "full-duplex", the new model
moves beyond traditional mechanisms such as VAD (Voice Activity Detection,
commonly used to detect where speech starts and ends) and latency-centered
turn management, breaking away from the industry-standard "turn-by-turn"
interaction pattern.
Instead, it empowers the AI to decide on its own when to speak:
proactively breaking silences, interrupting the user when appropriate,
listening while speaking, perceiving time semantics, sustaining parallel
threads of discussion, and more. The model also supports multi-dimensional
perception (including awareness of time, environment, and events) and natural
speech traits (e.g., fillers, stammering, noticeable emotional fluctuations),
making AI interactions more "human-like" and delivering an immersive,
lifelike voice experience.
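Soul has not published implementation details, so the following Python sketch is purely illustrative: every name in it (Action, Frame, toy_policy, the 500 ms and 3,000 ms thresholds) is a hypothetical stand-in. It contrasts the classic VAD-gated, turn-by-turn loop with a full-duplex loop in which a policy decides, frame by frame, whether to keep listening, speak, interrupt, or break a silence.

```python
# Illustrative sketch only; Soul's actual model and API are not public.
# Contrasts VAD-gated turn-taking with a full-duplex per-frame policy.
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    LISTEN = auto()          # stay silent, keep ingesting audio
    SPEAK = auto()           # begin or continue a response
    BREAK_SILENCE = auto()   # proactively start talking after a lull
    INTERRUPT = auto()       # cut in while the user is still talking


@dataclass
class Frame:
    user_speaking: bool      # would come from the live audio stream
    silence_ms: int          # elapsed silence, for "break the silence"


def half_duplex_vad(frames):
    """Classic turn-by-turn: wait for VAD to signal end-of-speech,
    then reply. The AI never speaks while the user does."""
    for frame in frames:
        if not frame.user_speaking and frame.silence_ms > 500:
            yield Action.SPEAK
        else:
            yield Action.LISTEN


def full_duplex(frames, policy):
    """Full duplex: a policy scores every frame and may speak,
    interrupt, or break silence at any moment, even mid-listen."""
    for frame in frames:
        yield policy(frame)


def toy_policy(frame: Frame) -> Action:
    # Hypothetical stand-in for the model's learned timing decision.
    if frame.user_speaking:
        return Action.LISTEN          # or INTERRUPT, if warranted
    if frame.silence_ms > 3000:
        return Action.BREAK_SILENCE   # proactively restart the chat
    return Action.SPEAK


if __name__ == "__main__":
    stream = [Frame(True, 0), Frame(False, 600), Frame(False, 4000)]
    print(list(half_duplex_vad(stream)))
    print(list(full_duplex(stream, toy_policy)))
```

Framed this way, the design question shifts from "how quickly can we detect end-of-speech?" to "what should the system do at each instant?", which is the shift the upgrade describes.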
The upgraded full-duplex model
will soon enter beta testing on Soul and will later be deployed in one-on-one
interactive scenarios like digital human calls and AI matchmaking. Soul's team
is also exploring its application in group settings, enabling AI to join
conversations at the right moment, extend topics, and foster diverse
relationship networks.
Tao Ming, CTO of Soul App,
stated, "Social interaction is an exchange of emotional and informational
value. Soul remains committed to leveraging innovative technology and product
solutions to deliver smarter, more immersive, and higher-quality interactive
experiences, making loneliness a thing of the past for everyone."
Previously, constrained by technical limitations, human-AI
dialogues often followed a "Q&A" format (the user asks, the AI
responds), in which latency and interruptions broke immersion.
In 2024, Soul launched its self-developed, end-to-end full-duplex
voice model, featuring ultra-low latency, rapid auto-interruption,
hyper-realistic voice expression, and emotional perception. It could directly
interpret complex auditory inputs and support highly anthropomorphic
multilingual styles. To bring conversations closer to everyday life and
deliver more "human-like" companionship, Soul has now upgraded the model
with the following capabilities: