Gemini Live

Bidirectional WebSocket streaming with the Gemini Live API: realtime text and audio in; text, PCM audio, transcriptions, and tool calls out.

SSE streaming is one-directional: a request goes up, chunks come down. The Live API is full duplex: Gemini::connect_live (behind the live feature) opens a BidiGenerateContent WebSocket where you push text turns and realtime microphone audio while the model simultaneously streams back text, PCM audio, transcriptions, and tool calls. Because voice activity detection runs server-side, the user can interrupt the model mid-sentence; the interruption is surfaced to you as LiveEvent::Interrupted.

Enabling

Cargo.toml

[dependencies]
adk-rs = { version = "0.6", features = ["live"] }  # implies "gemini"

The connection uses the same GeminiConfig as the HTTP client — base URL, API version, and key — and the same transport-security policy: wss:// always, plain ws:// only to loopback hosts (for local mocks). Use a Live-capable model such as gemini-2.5-flash-native-audio-preview for audio, or any flash model for text-mode sessions.
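
The transport-security rule above can be sketched as a small standalone check. This is an illustration only, not the crate's implementation: `live_url_allowed` is a hypothetical helper, it assumes lowercase schemes, and it treats `localhost` plus any loopback IP literal as a loopback host.

```rust
use std::net::IpAddr;

/// Hypothetical helper (not part of adk-rs): returns true if `url` fits the
/// policy described above, i.e. `wss://` to any host, plain `ws://` only to
/// a loopback host. Assumes the scheme is written in lowercase.
fn live_url_allowed(url: &str) -> bool {
    if url.starts_with("wss://") {
        return true;
    }
    let Some(rest) = url.strip_prefix("ws://") else {
        return false; // any other scheme is rejected
    };
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");
    // Drop any userinfo ("user@host").
    let host_port = authority.rsplit('@').next().unwrap_or(authority);
    let host = if let Some(stripped) = host_port.strip_prefix('[') {
        // Bracketed IPv6 literal such as "[::1]:9000".
        stripped.split(']').next().unwrap_or("")
    } else {
        // Hostname or IPv4 literal, optionally followed by ":port".
        host_port.split(':').next().unwrap_or("")
    };
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    // Otherwise only loopback IP literals (127.0.0.0/8, ::1) qualify.
    host.parse::<IpAddr>().map(|ip| ip.is_loopback()).unwrap_or(false)
}
```

Under these assumptions a local mock at `ws://127.0.0.1:8080/ws` passes, while `ws://example.com/ws` is refused.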

Connecting

Gemini::connect_live(&self, cfg: LiveConfig) -> Result<LiveSession>: Open the WebSocket, send the setup message, and await the server's setupComplete acknowledgement.
struct LiveConfig { response_modalities, system_instruction, tools, voice, input_audio_transcription, output_audio_transcription }: Session setup. response_modalities is ["TEXT"] by default (one modality per session — use ["AUDIO"] for speech). voice picks a prebuilt voice (e.g. "Kore", "Puck"). The transcription flags ask the server to transcribe input and output audio.

The session surface

LiveSession::send_text(&mut self, text: &str, turn_complete: bool) -> Result<()>: Send a text turn. With turn_complete: true the model starts responding immediately.
LiveSession::send_audio(&mut self, pcm: &[u8], mime_type: &str) -> Result<()>: Stream a chunk of realtime input audio (typically 16kHz 16-bit PCM, "audio/pcm;rate=16000"). Server-side VAD segments turns and may interrupt an in-flight response.
LiveSession::send_audio_stream_end(&mut self) -> Result<()>: Signal the input audio stream ended (e.g. microphone muted).
LiveSession::send_tool_response(&mut self, responses: Vec<FunctionResponse>) -> Result<()>: Answer a LiveEvent::ToolCall.
LiveSession::recv(&mut self) -> Result<Option<LiveEvent>>: Next event, or None once the server closes the session.
LiveSession::close(self) -> Result<()>: Close the WebSocket cleanly.
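
Before calling send_audio, captured samples have to be turned into bytes and cut into chunks. The sketch below shows one way to do that for 16 kHz 16-bit mono input. Both helper names are hypothetical, and little-endian byte order is an assumption to confirm against the Live API audio documentation.

```rust
/// Sample rate this page names as typical for realtime input.
const INPUT_RATE_HZ: usize = 16_000;
/// Bytes per 16-bit mono sample.
const BYTES_PER_SAMPLE: usize = 2;

/// Hypothetical helper: encode mono i16 samples as raw bytes.
/// Little-endian order is an assumption, not something this page states.
fn pcm16_le_bytes(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}

/// Hypothetical helper: number of bytes in `millis` milliseconds of
/// 16 kHz 16-bit mono audio.
fn chunk_len_bytes(millis: usize) -> usize {
    INPUT_RATE_HZ * BYTES_PER_SAMPLE * millis / 1000
}

// Intended use inside an async task that owns the session:
//
//     let bytes = pcm16_le_bytes(&samples);
//     for chunk in bytes.chunks(chunk_len_bytes(100)) {
//         session.send_audio(chunk, "audio/pcm;rate=16000").await?;
//     }
```

The 100 ms chunk size is an arbitrary choice for illustration: smaller chunks reduce latency, larger ones mean fewer WebSocket frames.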

LiveEvent

Event	Meaning
`Text(String)`	Incremental model text.
`Audio { data, mime_type }`	Model audio chunk, base64-decoded for you (typically 24kHz 16-bit PCM).
`InputTranscription(String)`	Transcript of the user's audio (when requested).
`OutputTranscription(String)`	Transcript of the model's audio (when requested).
`ToolCall(Vec<FunctionCall>)`	Tool execution requested — answer with `send_tool_response`.
`ToolCallCancellation(Vec<String>)`	Previously-issued calls cancelled by id (after an interruption).
`Interrupted`	User barge-in: stop local audio playback immediately.
`GenerationComplete`	The model finished generating the current response.
`TurnComplete`	The turn is over; the session is ready for new input.
`GoAway { time_left }`	The server will close the connection soon — wind down or reconnect.
`UsageMetadata(UsageMetadata)`	Token usage for the session so far.
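
Since ToolCallCancellation identifies calls by id, an application that runs tools asynchronously needs to remember which calls are still outstanding. The sketch below is one possible bookkeeping structure. `PendingCall` and `PendingCalls` are hypothetical stand-ins defined locally, because this page does not show the fields of the crate's FunctionCall type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a requested call; the real `FunctionCall`
/// fields are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
struct PendingCall {
    id: String,
    name: String,
}

/// Tracks tool calls that were requested but not yet answered.
#[derive(Default)]
struct PendingCalls {
    by_id: HashMap<String, PendingCall>,
}

impl PendingCalls {
    /// Record the calls carried by a `ToolCall` event.
    fn on_tool_call(&mut self, calls: Vec<PendingCall>) {
        for call in calls {
            self.by_id.insert(call.id.clone(), call);
        }
    }

    /// Drop the calls named in a `ToolCallCancellation` event.
    /// Returns how many tracked calls were actually removed.
    fn on_cancellation(&mut self, ids: &[String]) -> usize {
        let mut removed = 0;
        for id in ids {
            if self.by_id.remove(id).is_some() {
                removed += 1;
            }
        }
        removed
    }

    /// Take a call once its result is ready. `None` means the call was
    /// cancelled (or never tracked), so no response should be sent for it.
    fn finish(&mut self, id: &str) -> Option<PendingCall> {
        self.by_id.remove(id)
    }
}
```

With this shape, a tool task calls `finish` right before building its response and skips send_tool_response when it gets `None`.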

End-to-end example

Voice session with transcription

use adk_rs::providers::gemini::{Gemini, LiveConfig, LiveEvent};

#[tokio::main]
async fn main() -> adk_rs::Result<()> {
    let gemini = Gemini::from_env("gemini-2.5-flash-native-audio-preview")?;
    let mut session = gemini
        .connect_live(LiveConfig {
            response_modalities: vec!["AUDIO".into()],
            voice: Some("Kore".into()),
            output_audio_transcription: true,
            ..LiveConfig::default()
        })
        .await?;

    session.send_text("Tell me a joke", true).await?;
    // or stream microphone input as it arrives:
    // session.send_audio(&pcm_chunk, "audio/pcm;rate=16000").await?;

    while let Some(event) = session.recv().await? {
        match event {
            LiveEvent::Audio { data, .. } => { /* queue PCM for playback */ }
            LiveEvent::OutputTranscription(t) => print!("{t}"),
            LiveEvent::Interrupted => { /* flush the playback queue */ }
            LiveEvent::TurnComplete => break,
            _ => {}
        }
    }
    session.close().await?;
    Ok(())
}
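
The two placeholder comments in the example ("queue PCM for playback" and "flush the playback queue") could be backed by a structure like the one sketched here. `PlaybackEvent` and `PlaybackQueue` are hypothetical names; the enum mirrors only the playback-relevant LiveEvent variants so the sketch compiles without the crate, and the duration estimate assumes the 24 kHz 16-bit mono output mentioned in the event table.

```rust
use std::collections::VecDeque;

/// Hypothetical local mirror of the playback-relevant events.
enum PlaybackEvent {
    Audio(Vec<u8>),
    Interrupted,
    TurnComplete,
}

/// FIFO of PCM chunks waiting to be handed to the audio device.
#[derive(Default)]
struct PlaybackQueue {
    chunks: VecDeque<Vec<u8>>,
}

impl PlaybackQueue {
    /// Apply one event. Returns true once the turn is over.
    fn apply(&mut self, event: PlaybackEvent) -> bool {
        match event {
            PlaybackEvent::Audio(data) => {
                self.chunks.push_back(data);
                false
            }
            PlaybackEvent::Interrupted => {
                // Barge-in: discard everything not yet played.
                self.chunks.clear();
                false
            }
            PlaybackEvent::TurnComplete => true,
        }
    }

    /// Next chunk for the audio device, oldest first.
    fn next_chunk(&mut self) -> Option<Vec<u8>> {
        self.chunks.pop_front()
    }

    /// Queued audio in milliseconds, assuming 24 kHz 16-bit mono
    /// (48 bytes per millisecond).
    fn queued_millis(&self) -> usize {
        self.chunks.iter().map(Vec::len).sum::<usize>() / 48
    }
}
```

Clearing the queue handles only audio that has not reached the device yet; stopping a buffer that is already playing depends on the audio backend in use.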

Providers — the Gemini HTTP client and shared configuration.
Streaming — one-directional SSE streaming through the runner.
Function tools — building the tools you answer ToolCall events with.
