
Registered since September 28th, 2017
Has a total of 4281 bookmarks.
Showing top tags within 1 bookmark.
howto information development guide reference administration design website software solution online service product business uk tool company linux code server application system web list video marine create data experience tutorial description explanation learn technology build article blog world project boat download windows lookup security free performance javascript technical london beautiful control network tools support course file research purchase image library programming youtube example php construction install opensource community html quality computer feature profile power browser music platform process mobile work user share manage professional database hardware buy industry advice internet dance developer installation search 3d camera customer access travel material standard money test develop documentation review css engineering photography webdesign engine device digital speed event api source management question program client phone discussion content simple story water marketing yacht app account setup interface package idea fast communication compare cheap script market study easy live google resource operation demonstration contact startup
Tag selected: tts.
Looking up tts tag. Showing 1 result.
Saved by uncleflo on April 25th, 2026.
Speak to an AI using our low-latency open-source speech-to-text and text-to-speech. This is a cascaded system made by Kyutai: our speech-to-text transcribes what you say, an LLM (we use GPT OSS 120B) generates the text of the response, and we then use our text-to-speech model to say it out loud. All of the components are open-source: Kyutai STT, Kyutai TTS 1.6B, and Unmute itself. Although cascaded systems lose valuable information like emotion, irony, etc., they provide unmatched modularity: since the three parts are separate, you can Unmute any LLM you want without any finetuning or adaptation! In this demo, you can get a feel for this versatility by tuning the system prompt of the LLM to handcraft the personality of your digital interlocutor, and independently changing the voice of the TTS. Both the speech-to-text and text-to-speech models are optimized for low latency. The STT model is streaming and integrates semantic voice activity detection instead of relying on an external model. The TTS is streaming both in audio and in text, meaning it can start speaking before the entire LLM response is generated. You can use a 10-second voice sample to determine the TTS's voice and intonation. Check out the pre-print for details.
opensource speech text convert system automate ai transcribe llm generate response voice tts translate website online tool useful good
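The cascaded design described above can be sketched as three swappable stages. This is a minimal Python sketch under stated assumptions. `cascade`, `toy_stt`, `echo_llm`, `shouting_llm` and `toy_tts` are hypothetical stand-ins, not the actual Kyutai STT, Kyutai TTS or Unmute APIs, and plain generators stand in for real audio streaming. It illustrates two claims from the description: any LLM with the same interface can be plugged in, and because the TTS consumes the LLM's text as it is produced, speech can begin before the full response exists.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical interfaces for the three swappable stages
# (not the real Kyutai STT/TTS or Unmute APIs).
SpeechToText = Callable[[Iterable[bytes]], Iterator[str]]
LanguageModel = Callable[[str, str], Iterator[str]]  # (system_prompt, user_text) -> tokens
TextToSpeech = Callable[[Iterable[str]], Iterator[bytes]]


def cascade(audio_in: Iterable[bytes], stt: SpeechToText, llm: LanguageModel,
            tts: TextToSpeech, system_prompt: str) -> Iterator[bytes]:
    """Run one conversational turn through STT -> LLM -> TTS."""
    # Stage 1: transcribe the user's turn. Here the turn simply ends when the
    # input runs out; the real STT detects the end of the turn itself.
    user_text = " ".join(stt(audio_in))
    # Stages 2 and 3: pass the LLM's token generator straight to the TTS,
    # so synthesis can start before the whole response has been generated.
    return tts(llm(system_prompt, user_text))


# Toy stand-ins that record the order in which work happens.
events: List[str] = []


def toy_stt(chunks: Iterable[bytes]) -> Iterator[str]:
    for chunk in chunks:  # pretend each audio chunk is one spoken word
        word = chunk.decode()
        events.append(f"stt:{word}")
        yield word


def echo_llm(system_prompt: str, user_text: str) -> Iterator[str]:
    # This toy model ignores the system prompt.
    for token in f"You said {user_text}".split():
        events.append(f"llm:{token}")
        yield token


def shouting_llm(system_prompt: str, user_text: str) -> Iterator[str]:
    # A second toy "LLM": swapping it in leaves the STT and TTS untouched.
    for token in user_text.upper().split():
        events.append(f"llm:{token}")
        yield token


def toy_tts(tokens: Iterable[str]) -> Iterator[bytes]:
    for token in tokens:  # synthesize each token as soon as it arrives
        events.append(f"tts:{token}")
        yield token.encode()  # stand-in for an audio frame


if __name__ == "__main__":
    audio = list(cascade([b"hello", b"there"], toy_stt, echo_llm, toy_tts, "Be brief."))
    print(audio)
    print(events)
```

When run, the recorded `events` list alternates `llm:` and `tts:` entries, so each word is synthesized right after it is generated rather than after the whole reply. Swapping `echo_llm` for `shouting_llm` changes the replies without touching the other two stages. The sketch does not model the semantic voice activity detection or the 10-second voice sample mentioned in the description.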