Voice — Kokoro (instant TTS)
12 preset voices, instant synthesis. Type your text and download the audio.
Voice — F5 clone (your voice, any text)
Upload a 5–10 s clip of any voice, the text spoken in that clip, and the new text. Get the new text spoken in that voice.
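The 5–10 s length requirement can be checked locally before uploading. The following is a minimal sketch using only the Python standard library; it assumes the reference clip is an uncompressed WAV file, and the names `wav_duration_seconds`, `is_valid_reference_clip`, and `make_silent_wav` are hypothetical helpers for illustration, not part of this app. How the app itself validates uploads is not documented here.

```python
import io
import wave

# Bounds taken from the panel text above (5-10 s reference clip).
MIN_SECONDS = 5.0
MAX_SECONDS = 10.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference_clip(source) -> bool:
    """True if the clip length falls inside the 5-10 s window (inclusive)."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, used only to demo the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    for length in (2.0, 7.0, 12.0):
        ok = is_valid_reference_clip(make_silent_wav(length))
        print(f"{length:>4} s clip accepted: {ok}")
```

For a real recording, pass its path instead of the in-memory buffer, for example `is_valid_reference_clip("my_voice.wav")`. Other formats such as MP3 would need a decoder that the standard library does not provide.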
Vision — image to text (OCR / describe)
Upload any image. Ask any question about it.
Transcribe — audio to text (Whisper)
Upload any audio file. Get a transcript.
Music — text to song (Stable Audio Open)
Describe what you want to hear. Returns a 30 s stereo clip. Takes ~80 s over the tunnel.
Avatar — lip-sync (Wav2Lip)
Upload a face photo and an audio file. Get a video of the face speaking that audio.
Video — text to video (LTX-Video)
⚠️ Slow and heavy. The first call stops the brain to load LTX (~10 min cold start). Use sparingly.