Guide: Production deployment
Take an adk-rs agent to production with a lean feature set, JSON telemetry with OTLP export, an authenticated HTTP server, a hardened container image, and runtime cost controls.
An adk-rs agent ships as one static binary, which makes deployment refreshingly boring: pick the features you actually use, turn on structured telemetry, serve behind a bearer token, and wrap it in a small container. This guide walks that path and points out the security guards you will meet along the way.
1. Choose a lean feature set
Default features are empty, and the full feature exists for development. Production builds should name exactly what they need so heavy dependencies (sqlx, axum, OpenTelemetry) compile only when used. See Installation for the full matrix. A typical serving stack:
[dependencies]
adk-rs = { version = "0.6", features = [
    "gemini",  # your provider(s)
    "sqlite",  # durable sessions
    "fs",      # filesystem artifacts
    "server",  # axum HTTP server
    "otel",    # OTLP trace export (implies "telemetry")
] }
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

For smaller, faster binaries, set lto = "thin", codegen-units = 1, and strip = "symbols" in your [profile.release].
2. Initialize telemetry
Call adk_rs::telemetry::init once at process start. It installs a tracing subscriber with an env-filter (a set RUST_LOG overrides the configured filter) and, when the otel feature is on and an endpoint is provided, a batching OTLP HTTP span exporter. The call is idempotent.
use adk_rs::telemetry::{self, LogFormat, TelemetryConfig};

telemetry::init(TelemetryConfig {
    filter: Some("info,adk_rs=debug".into()),
    format: LogFormat::Json, // newline-delimited JSON for log aggregators
    otlp_endpoint: std::env::var("OTLP_ENDPOINT").ok(),
    service_name: Some("research-agent".into()),
})?;

3. Build the binary: CLI scaffold or direct server
The quickest path is adk_rs::cli::App (feature cli, which transitively enables telemetry, server, and eval): register your agents and you get run, web, eval, and version subcommands, with telemetry initialized from --log/--log-format.
use std::sync::Arc;

fn main() -> adk_rs::Result<()> {
    adk_rs::cli::App::new("research")
        .register("pipeline", Arc::new(build_pipeline()?))
        .run()
}

Start the server with the web subcommand:

./research web --bind 0.0.0.0:8000 \
    --auth-token "$ADK_WEB_TOKEN" \
    --allow-origins https://app.example.com

To skip the CLI scaffold, build an AppState and call server::serve directly:

use adk_rs::server::{self, AppState};
use std::collections::HashMap;
use std::sync::Arc;

#[tokio::main]
async fn main() -> adk_rs::Result<()> {
    let runner = Arc::new(build_runner().await?); // your durable Runner
    let mut runners = HashMap::new();
    runners.insert("pipeline".to_string(), runner);

    // Fail at startup if the token is missing rather than serving unauthenticated.
    let token = std::env::var("ADK_WEB_TOKEN").expect("ADK_WEB_TOKEN must be set");
    let state = AppState::with_bearer_token(Arc::new(runners), token)
        .with_allow_origins(["https://app.example.com".to_string()]);

    server::serve("0.0.0.0:8000".parse().unwrap(), state).await
}

4. The non-loopback guard, satisfied properly
Binding anything other than 127.0.0.1/::1 without authentication makes serve return a config error instead of starting — otherwise anyone reachable on the network could drive your agents and read every session. Two legitimate ways forward, one escape hatch:
- Set a bearer token: AppState::with_bearer_token(runners, token) (or --auth-token / ADK_WEB_TOKEN on the CLI). This is the right answer for direct exposure.
- Stay on loopback and put your own authenticating reverse proxy (nginx, Envoy, a cloud LB) in front. The guard never triggers for 127.0.0.1.
- ServeOptions::dangerously_allow_unauthenticated_remote (CLI: --dangerously-allow-unauthenticated-remote) disables the guard. Reserve it for isolated networks you fully control; the server still logs a loud warning.
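The guard's decision can be pictured as a small pure function. The sketch below is an illustration only, not adk-rs's actual implementation: `check_bind_guard`, `is_loopback_bind`, `GuardError`, and the parameter names are all hypothetical, and it relies on nothing beyond the standard library.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr};

/// Hypothetical error for this sketch; the real server returns a config error.
#[derive(Debug, PartialEq)]
pub enum GuardError {
    UnauthenticatedRemoteBind(SocketAddr),
}

/// True only for the two loopback addresses the guide names (127.0.0.1 and ::1).
pub fn is_loopback_bind(addr: &SocketAddr) -> bool {
    match addr.ip() {
        IpAddr::V4(ip) => ip == Ipv4Addr::LOCALHOST,
        IpAddr::V6(ip) => ip == Ipv6Addr::LOCALHOST,
    }
}

/// Decide whether a server may start on `addr`.
/// Assumption: a token or a loopback bind is sufficient; otherwise only the
/// explicit escape hatch lets the bind through, and it warns loudly.
pub fn check_bind_guard(
    addr: SocketAddr,
    has_bearer_token: bool,
    allow_unauthenticated_remote: bool,
) -> Result<(), GuardError> {
    if is_loopback_bind(&addr) || has_bearer_token {
        return Ok(());
    }
    if allow_unauthenticated_remote {
        eprintln!("WARNING: serving {addr} without authentication");
        return Ok(());
    }
    Err(GuardError::UnauthenticatedRemoteBind(addr))
}
```

Under these assumptions, binding 0.0.0.0:8000 with no token and no escape hatch is rejected before any socket is opened, while the same bind with a token, or a loopback bind without one, passes.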
5. A hardened Dockerfile
adk-rs uses rustls throughout (no OpenSSL to install), so the runtime stage can be a distroless image — CA certificates included, no shell, non-root user. Inside the container the server must bind 0.0.0.0 to be reachable, which is exactly why the bearer token from step 4 is non-negotiable. Mount a volume for the SQLite file (e.g. sqlite:///data/adk.db?mode=rwc) and the artifact root, and inject secrets (GOOGLE_API_KEY, ADK_WEB_TOKEN, OTLP_ENDPOINT) from your orchestrator’s secret store — never bake them into the image.
FROM rust:1.85-bookworm AS build
WORKDIR /src
COPY . .
RUN cargo build --release --locked

FROM gcr.io/distroless/cc-debian12:nonroot
COPY --from=build /src/target/release/research /usr/local/bin/research
ENTRYPOINT ["/usr/local/bin/research"]

6. Operational budgets: calls, caching, compaction
Three knobs keep long-lived deployments fast and affordable. Cap the per-invocation LLM-call budget with RunConfig so a runaway tool loop fails fast instead of burning quota; enable provider-side context caching and automatic event compaction on the runner:
use adk_rs::core::{ContextCacheConfig, RunConfig};
use adk_rs::genai_types::Content;
use adk_rs::runner::{EventsCompactionConfig, Runner};

let runner = Runner::builder()
    .app_name("research")
    .agent(agent)
    .session_service(sessions)
    .context_cache_config(ContextCacheConfig::default())
    .compaction(EventsCompactionConfig::new(summarizer_model.clone()))
    .build()?;

let cfg = RunConfig {
    max_llm_calls: Some(25), // hard per-invocation budget
    ..RunConfig::default()
};

let mut events = runner
    .run_with("alice", Some("alice-main"), Content::user_text(prompt), cfg)
    .await?;

- Context caching: pair context_cache_config with LlmAgent::builder().static_instruction(...) so the system prefix stays byte-identical across turns; defaults are a 30-minute TTL refreshed every 10 calls.
- Event compaction: EventsCompactionConfig::new(model) summarizes older events with an LLM after invocations complete (default: every 5 invocations, 2 events of overlap; tune with .compaction_interval(n) and .overlap_size(n)), so multi-week sessions stop growing the prompt without bound.
- Cancellation and resume: runner.start(...) returns a RunningInvocation handle whose id you can cancel from an admin endpoint.
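
To make the fail-fast behavior of a per-invocation call budget concrete, here is a minimal standalone sketch. It does not reproduce how adk-rs enforces max_llm_calls internally: `CallBudget`, `BudgetExceeded`, and `simulate_tool_loop` are hypothetical names, and the only assumption is that each model request consumes one unit and the request past the limit is refused.

```rust
/// Hypothetical per-invocation budget; stands in for the effect of a call limit.
#[derive(Debug)]
pub struct CallBudget {
    max_calls: Option<u32>, // None means unlimited
    used: u32,
}

/// Hypothetical error returned once the limit has been spent.
#[derive(Debug, PartialEq)]
pub struct BudgetExceeded {
    pub limit: u32,
}

impl CallBudget {
    pub fn new(max_calls: Option<u32>) -> Self {
        Self { max_calls, used: 0 }
    }

    /// Number of calls consumed so far.
    pub fn used(&self) -> u32 {
        self.used
    }

    /// Call before each model request; refuses the request once the limit is reached.
    pub fn try_consume(&mut self) -> Result<u32, BudgetExceeded> {
        if let Some(limit) = self.max_calls {
            if self.used >= limit {
                return Err(BudgetExceeded { limit });
            }
        }
        self.used += 1;
        Ok(self.used)
    }
}

/// Simulates an agent loop that wants `requested_calls` model requests.
/// Returns the budget's running total on success, or the first refusal.
pub fn simulate_tool_loop(
    budget: &mut CallBudget,
    requested_calls: u32,
) -> Result<u32, BudgetExceeded> {
    let mut total = budget.used();
    for _ in 0..requested_calls {
        total = budget.try_consume()?;
        // A real agent would call the model and run tools here.
    }
    Ok(total)
}
```

With a limit of 25, a loop that asks for 1000 calls stops with an error after exactly 25 requests have been made, which is the property that keeps a runaway tool loop from burning quota.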