Gemini Live

Bidirectional WebSocket streaming with the Gemini Live API: realtime text and audio go in; text, PCM audio, transcriptions, and tool calls come out.

SSE streaming is one-directional: a request goes up, chunks come down. The Live API is full duplex — Gemini::connect_live (feature live) opens a BidiGenerateContent WebSocket where you push text turns and realtime microphone audio while the model simultaneously streams back text, PCM audio, transcriptions, and tool calls. Server-side voice activity detection means the user can interrupt the model mid-sentence, surfaced to you as LiveEvent::Interrupted.

Enabling

Cargo.toml (TOML)
[dependencies]
adk-rs = { version = "0.6", features = ["live"] }  # implies "gemini"

The connection uses the same GeminiConfig as the HTTP client — base URL, API version, and key — and the same transport-security policy: wss:// always, plain ws:// only to loopback hosts (for local mocks). Use a Live-capable model such as gemini-2.5-flash-native-audio-preview for audio, or any flash model for text-mode sessions.
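
The transport-security rule above can be sketched as a small predicate. This is an illustration only, not the check adk-rs actually performs: `live_url_allowed` is a hypothetical name, the parsing is deliberately simplistic, and it assumes a lowercase scheme and that "loopback" means `localhost`, `127.0.0.0/8`, or `::1`.

```rust
use std::net::IpAddr;

/// Hypothetical helper (not part of adk-rs): returns true when `url`
/// satisfies "wss:// always, plain ws:// only to loopback hosts".
fn live_url_allowed(url: &str) -> bool {
    if url.starts_with("wss://") {
        return true;
    }
    let rest = match url.strip_prefix("ws://") {
        Some(rest) => rest,
        None => return false, // any other scheme is rejected
    };
    // The authority is everything before the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop optional userinfo ("user:pass@host").
    let host_port = authority.rsplit('@').next().unwrap_or(authority);
    let host = match host_port.strip_prefix('[') {
        // Bracketed IPv6 literal, e.g. "[::1]:8080".
        Some(stripped) => stripped.split(']').next().unwrap_or(""),
        // Plain "host" or "host:port".
        None => host_port.split(':').next().unwrap_or(""),
    };
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    // Otherwise accept only IP literals in a loopback range.
    host.parse::<IpAddr>().map(|ip| ip.is_loopback()).unwrap_or(false)
}
```

With this sketch, `ws://localhost:8080/mock` and `ws://[::1]:9000` pass, while `ws://example.com/live` and `ws://localhost.evil.com` are rejected.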

Connecting

Gemini::connect_live(&self, cfg: LiveConfig) -> Result<LiveSession>
Open the WebSocket, send the setup message, and await the server's setupComplete acknowledgement.
struct LiveConfig { response_modalities, system_instruction, tools, voice, input_audio_transcription, output_audio_transcription }
Session setup. response_modalities is ["TEXT"] by default (one modality per session — use ["AUDIO"] for speech). voice picks a prebuilt voice (e.g. "Kore", "Puck"). The transcription flags ask the server to transcribe input and output audio.

The session surface

LiveSession::send_text(&mut self, text: &str, turn_complete: bool) -> Result<()>
Send a text turn. With turn_complete: true the model starts responding immediately.
LiveSession::send_audio(&mut self, pcm: &[u8], mime_type: &str) -> Result<()>
Stream a chunk of realtime input audio (typically 16kHz 16-bit PCM, "audio/pcm;rate=16000"). Server-side VAD segments turns and may interrupt an in-flight response.
LiveSession::send_audio_stream_end(&mut self) -> Result<()>
Signal the input audio stream ended (e.g. microphone muted).
LiveSession::send_tool_response(&mut self, responses: Vec<FunctionResponse>) -> Result<()>
Answer a LiveEvent::ToolCall.
LiveSession::recv(&mut self) -> Result<Option<LiveEvent>>
Next event, or None once the server closes the session.
LiveSession::close(self) -> Result<()>
Close the WebSocket cleanly.
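
Since send_audio takes raw bytes, microphone samples have to be encoded and split into chunks before they are sent. The following is a minimal sketch of that preparation step for the 16kHz 16-bit format mentioned above. The helper names are hypothetical, and mono audio and little-endian byte order are assumptions to verify against the Live API reference.

```rust
/// Input format named above: 16 kHz, 16-bit PCM (mono is an assumption).
const SAMPLE_RATE_HZ: usize = 16_000;
const BYTES_PER_SAMPLE: usize = 2;

/// Encode signed 16-bit samples as bytes.
/// Little-endian order is an assumption; check the API reference.
fn samples_to_pcm_bytes(samples: &[i16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * BYTES_PER_SAMPLE);
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}

/// Number of bytes in `millis` milliseconds of audio in the input format.
fn chunk_len_bytes(millis: usize) -> usize {
    SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * millis / 1000
}

/// Split a PCM buffer into chunks of at most `millis` milliseconds.
/// The last chunk may be shorter; the chunk size never drops below one sample.
fn pcm_chunks(pcm: &[u8], millis: usize) -> std::slice::Chunks<'_, u8> {
    pcm.chunks(chunk_len_bytes(millis).max(BYTES_PER_SAMPLE))
}
```

Each chunk yielded by `pcm_chunks` would then be passed to `session.send_audio(chunk, "audio/pcm;rate=16000")`. A chunk of 100 ms is 3200 bytes at this format; the chunk duration is a tuning choice, not something the text above prescribes.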

LiveEvent

| Event | Meaning |
| --- | --- |
| Text(String) | Incremental model text. |
| Audio { data, mime_type } | Model audio chunk, base64-decoded for you (typically 24kHz 16-bit PCM). |
| InputTranscription(String) | Transcript of the user's audio (when requested). |
| OutputTranscription(String) | Transcript of the model's audio (when requested). |
| ToolCall(Vec<FunctionCall>) | Tool execution requested; answer with send_tool_response. |
| ToolCallCancellation(Vec<String>) | Previously issued calls cancelled by id (after an interruption). |
| Interrupted | User barge-in: stop local audio playback immediately. |
| GenerationComplete | The model finished generating the current response. |
| TurnComplete | The turn is over; the session is ready for new input. |
| GoAway { time_left } | The server will close the connection soon; wind down or reconnect. |
| UsageMetadata(UsageMetadata) | Token usage for the session so far. |
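
One way to keep client state consistent with these events is a small reducer: queue audio as it arrives, drop the queue on a barge-in, and forget tool calls the server cancels. The sketch below uses a local stand-in enum, `Event`, so that it compiles without the crate. Its payload types are simplified assumptions (tool calls are reduced to their ids), and `ClientState` is a hypothetical name.

```rust
use std::collections::{HashSet, VecDeque};

/// Local stand-in for a subset of LiveEvent; payloads are simplified
/// assumptions so the sketch is self-contained.
enum Event {
    Audio(Vec<u8>),
    ToolCall(Vec<String>),             // ids of the requested calls
    ToolCallCancellation(Vec<String>), // ids of the cancelled calls
    Interrupted,
    TurnComplete,
}

/// Hypothetical client-side state driven by the event stream.
#[derive(Default)]
struct ClientState {
    playback: VecDeque<Vec<u8>>,    // PCM chunks waiting to be played
    pending_calls: HashSet<String>, // tool calls not yet answered
    turn_done: bool,
}

impl ClientState {
    fn apply(&mut self, event: Event) {
        match event {
            // Queue model audio in arrival order.
            Event::Audio(chunk) => self.playback.push_back(chunk),
            // Barge-in: discard everything not yet played.
            Event::Interrupted => self.playback.clear(),
            // Remember which calls still need a response.
            Event::ToolCall(ids) => self.pending_calls.extend(ids),
            // Cancelled calls no longer need a response.
            Event::ToolCallCancellation(ids) => {
                for id in &ids {
                    self.pending_calls.remove(id);
                }
            }
            Event::TurnComplete => self.turn_done = true,
        }
    }
}
```

In a real session the same match arms would sit inside the `recv` loop, with `LiveEvent` in place of the stand-in enum.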

End-to-end example

Voice session with transcription (Rust)
use adk_rs::providers::gemini::{Gemini, LiveConfig, LiveEvent};

#[tokio::main]
async fn main() -> adk_rs::Result<()> {
    let gemini = Gemini::from_env("gemini-2.5-flash-native-audio-preview")?;
    let mut session = gemini
        .connect_live(LiveConfig {
            response_modalities: vec!["AUDIO".into()],
            voice: Some("Kore".into()),
            output_audio_transcription: true,
            ..LiveConfig::default()
        })
        .await?;

    session.send_text("Tell me a joke", true).await?;
    // or stream microphone input as it arrives:
    // session.send_audio(&pcm_chunk, "audio/pcm;rate=16000").await?;

    while let Some(event) = session.recv().await? {
        match event {
            LiveEvent::Audio { data, .. } => { /* queue PCM for playback */ }
            LiveEvent::OutputTranscription(t) => print!("{t}"),
            LiveEvent::Interrupted => { /* flush the playback queue */ }
            LiveEvent::TurnComplete => break,
            _ => {}
        }
    }
    session.close().await?;
    Ok(())
}
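
If the audio chunks collected in the loop above are to be saved rather than played, raw PCM needs a container. Below is a minimal sketch that wraps 16-bit PCM in a 44-byte RIFF/WAVE header. `pcm_to_wav` is a hypothetical helper, not part of adk-rs, and it assumes mono, little-endian samples; the sample rate is passed in (24000 for the typical output format listed in the event table).

```rust
/// Hypothetical helper: wrap raw 16-bit mono PCM in a minimal WAV container.
/// Assumes the PCM bytes are little-endian, which is what WAV expects.
fn pcm_to_wav(pcm: &[u8], sample_rate: u32) -> Vec<u8> {
    const CHANNELS: u16 = 1; // mono is an assumption
    const BITS_PER_SAMPLE: u16 = 16;
    let block_align = CHANNELS * BITS_PER_SAMPLE / 8; // bytes per frame
    let byte_rate = sample_rate * u32::from(block_align);
    let data_len = pcm.len() as u32;

    let mut wav = Vec::with_capacity(44 + pcm.len());
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes()); // bytes after this field
    wav.extend_from_slice(b"WAVE");
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // size of the fmt chunk body
    wav.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = integer PCM
    wav.extend_from_slice(&CHANNELS.to_le_bytes());
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&byte_rate.to_le_bytes());
    wav.extend_from_slice(&block_align.to_le_bytes());
    wav.extend_from_slice(&BITS_PER_SAMPLE.to_le_bytes());
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    wav.extend_from_slice(pcm);
    wav
}
```

Concatenating the `data` payloads of all `LiveEvent::Audio` events for a turn and passing the result to `pcm_to_wav(&bytes, 24_000)` should yield a buffer that can be written to disk with `std::fs::write`, provided the stated format assumptions hold.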

  • Providers — the Gemini HTTP client and shared configuration.
  • Streaming — one-directional SSE streaming through the runner.
  • Function tools — building the tools you answer ToolCall events with.