Kian and Ajay, the founders of Vocode, are shaking up voice AI with an open-source library that turns text-based large language models (LLMs) into voice-based ones. Vocode aims to make real-time voice applications far easier to build.
The challenge of building real-time voice apps with LLMs lies in orchestrating speech recognition, the LLM, and speech synthesis while also managing the dynamics of conversation, such as detecting when a person has finished speaking or handling interruptions. Vocode takes care of this orchestration, allowing developers to set up a conversation in fewer than 15 lines of code.
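To make the orchestration problem concrete, here is a minimal sketch in plain Python. It is not Vocode's API: `TurnDetector`, `run_conversation`, and the `respond` and `speak` callbacks are hypothetical names invented for illustration. The sketch assumes speech recognition delivers one transcribed chunk at a time, with `None` standing in for a silent chunk, and it treats a run of silent chunks as the end of a turn. Interruption handling is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class TurnDetector:
    """Hypothetical end-of-turn detector (not part of Vocode).

    A turn ends after `silence_threshold` consecutive silent chunks
    that follow at least one spoken chunk.
    """
    silence_threshold: int = 2
    _silent_chunks: int = 0
    _words: List[str] = field(default_factory=list)

    def feed(self, chunk: Optional[str]) -> Optional[str]:
        """Feed one chunk; return the full utterance when the turn ends."""
        if chunk is not None:
            # Speech resets the silence counter and extends the utterance.
            self._silent_chunks = 0
            self._words.append(chunk)
            return None
        if not self._words:
            # Silence before anyone has spoken is ignored.
            return None
        self._silent_chunks += 1
        if self._silent_chunks < self.silence_threshold:
            return None
        utterance = " ".join(self._words)
        self._words = []
        self._silent_chunks = 0
        return utterance


def run_conversation(
    chunks: Iterable[Optional[str]],
    respond: Callable[[str], str],   # stand-in for the LLM
    speak: Callable[[str], str],     # stand-in for speech synthesis
) -> List[str]:
    """Run recognition -> LLM -> synthesis once per detected turn."""
    detector = TurnDetector()
    spoken = []
    for chunk in chunks:
        utterance = detector.feed(chunk)
        if utterance is not None:
            spoken.append(speak(respond(utterance)))
    return spoken


# Usage with stub callbacks in place of real providers.
chunks = ["hello", "there", None, None, "bye", None, None]
replies = run_conversation(chunks, lambda t: t.upper(), lambda t: f"<audio:{t}>")
```

A real system would work on audio streams and run the three stages concurrently, which is where most of the difficulty described above comes from.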
Witness the power of Vocode in action with their Gen Z GPT hotline demo: https://replit.com/@vocode/Gen-Z-Phone (give it a try at +1-650-729-9536).
It all began with Kian and Ajay’s PrankGPT project (https://www.loom.com/share/0d0d68f1a62f409eb5ae24521293d2dc), which demonstrated the potential of combining voice and LLMs but also highlighted how difficult it is to do. Once they had worked through those difficulties, they were struck by how useful and fun it was to talk to LLMs, more so than any other voice AI experience they had encountered. This inspired them to create a developer tool that simplifies the process for others.
Vocode’s open-source library aims to be a one-stop shop for building voice applications. It features out-of-the-box integrations with various speech recognition and synthesis providers, with the flexibility to swap them as needed. Vocode supports web and telephony platforms (via Twilio) and plans to expand to mobile soon. The library also includes abstractions for streaming conversations and command-based applications, along with customization options for conversation dynamics such as emotion and filler audio.
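One common way to make providers swappable is to have the application depend on a small interface rather than on a concrete vendor. The sketch below illustrates that idea in Python. It is an assumption about the general design pattern, not a description of Vocode's actual classes: `Synthesizer`, `PlainSynthesizer`, `ShoutingSynthesizer`, and `VoiceApp` are all hypothetical names, and the stub providers return encoded text instead of real audio.

```python
from typing import Protocol


class Synthesizer(Protocol):
    """Hypothetical interface every speech synthesis provider must satisfy."""

    def synthesize(self, text: str) -> bytes:
        ...


class PlainSynthesizer:
    """Stub provider: returns the text as UTF-8 bytes in place of audio."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class ShoutingSynthesizer:
    """Second stub provider with different output, to show the swap."""

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")


class VoiceApp:
    """Depends only on the Synthesizer interface, not on a concrete provider."""

    def __init__(self, synthesizer: Synthesizer) -> None:
        self.synthesizer = synthesizer

    def say(self, text: str) -> bytes:
        return self.synthesizer.synthesize(text)


# Swapping providers is a one-line change at construction time.
plain_audio = VoiceApp(PlainSynthesizer()).say("hi")
loud_audio = VoiceApp(ShoutingSynthesizer()).say("hi")
```

The same pattern would apply to speech recognition providers, with a matching interface on the input side.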
As for monetization, Kian and Ajay plan to charge for their hosted version (currently free at https://app.vocode.dev) and develop enterprise products in the future.
"prompt": "Vocode Revolutionizing Voice AI Experiences, deepleaps.com, best quality, 4k, 8k, ultra highres, raw photo in hdr, sharp focus, intricate texture, skin imperfections, photograph of",
"negative_prompt": "worst quality, low quality, normal quality, child, painting, drawing, sketch, cartoon, anime, render, 3d, blurry, deformed, disfigured, morbid, mutated, bad anatomy, bad art",
"original_prompt": "Vocode Revolutionizing Voice AI Experiences, deepleaps.com, best quality, 4k, 8k, ultra highres, raw photo in hdr, sharp focus, intricate texture, skin imperfections, photograph of",