One gateway, one mind
open-source · multi-agent · experimental
A personal open-source project for building agentic AI teams that actually work together. Today, twenty specialists behind one gateway. Tomorrow, an agentic system that decides what team a problem needs, spins one up, and keeps the agents that earn their keep.
Many teams. One mind. One objective. Yours.
§ 01 · What it is
Khala is an open-source multi-agent AI orchestration platform. It bundles
twenty specialist agent teams — spanning autonomous software development, product planning,
market research, content, compliance, sales, and personal assistance — under a single
FastAPI gateway at /api/<team>. Every team has a team-lead agent
that coordinates domain specialists over typed Pydantic contracts, with a pluggable LLM
backend (Ollama, Ollama Cloud, or Claude) and opt-in Temporal durable workflows for
crash-resumable execution.
But the wiring isn't the point. The point is that every team plugs into the same shared mind, so you can bring them in on whatever you're working on — turning a spec into shipped code, running a market discovery, drafting a launch, pairing on a portfolio, or decomposing a genuinely ambiguous problem with Deepthought's recursive sub-agent spawner.
Khala isn't a tool you point at a problem. It's a collaborator you think alongside — from discovery through ship.
Twenty specialist teams today — engineering, planning, research, content, compliance, sales, personal, more — addressable as one.
Designing new teams is the product. Describe one in plain English; Agentic Team Provisioning drafts the roster and the process with you.
The real project isn't the 20 teams. It's the system that makes agentic teams — and lets them operate as one mind.
§ 02 · Why it's interesting
Six properties that make Khala worth poking at, even if you already have a favorite orchestrator.
Every team mounts under /api/<team> behind a single FastAPI server with an optional request-scanning security pre-scan. The whole roster is addressable — and collaborates — as one surface.
Set TEMPORAL_ADDRESS and teams that export workflows switch from in-process threads to Temporal 1.24.2 durable executions that survive restarts. Don't set it and everything still works.
Unified client for Ollama Cloud, local Ollama, or Claude. Per-role overrides where it matters (planning, architecture, specialists).
4-phase pipeline — Discovery → Design → Execution → Integration — with parallel backend/frontend queues, a planning cache, per-task quality gates (lint, build, review, acceptance, security, QA, DbC, a11y), and a Repair Agent for crash recovery.
Every FastAPI service in the Docker stack is auto-instrumented. Prometheus + a provisioned Grafana dashboard ship in docker-compose.yml — no extra setup.
New teams aren't a plugin afterthought — they're the product. Add one by conversation or register it in TEAM_CONFIGS; it mounts at /api/<slug> on next restart.
§ 03 · The roster
Grouped for navigation, not for architecture — Core Dev, Business, Content, and Personal. The roster grows and prunes itself as we learn what's worth keeping.
Authoritative list: backend/unified_api/config.py · run GET /teams on a live instance for the live roster.
§ 04 · Architectural decisions
The interesting choices, not the obvious ones. Every decision below has a trade-off the project accepted on purpose.
SE team pipeline — one spec in, a deployable system out.
Every team mounts under /api/<slug> behind a single FastAPI app. The
security pre-scan lives at the gateway (SECURITY_GATEWAY_ENABLED=true) so
it's applied once, consistently, and teams don't each re-roll their own auth layer.
Trade-off we took: every team shares a process in local dev. Good for velocity, bad for strict blast-radius isolation — Docker mode gives each team its own container for that.
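The gateway pattern above can be sketched in a few lines. This is a stdlib-only sketch, not Khala's implementation: the `TEAM_CONFIGS` shape, the `scan` signature, and the return convention are all assumptions, but it shows the key property that the security pre-scan runs once at the front door before any team router sees the request.

```python
# Hypothetical sketch of the gateway pattern. TEAM_CONFIGS maps a team slug
# to its config; enabled teams mount under /api/<slug>.
TEAM_CONFIGS = {"blogging": {"enabled": True}, "nutrition": {"enabled": False}}

def mounted_prefixes(configs):
    """Return the /api/<slug> prefixes the gateway would mount."""
    return [f"/api/{slug}" for slug, cfg in configs.items() if cfg["enabled"]]

def gateway_handle(path, payload, security_enabled, scan, route):
    """Apply one security pre-scan at the gateway, then dispatch to a team
    router. Teams never re-roll their own scanning layer."""
    if security_enabled and not scan(payload):
        return 403, "blocked by security pre-scan"
    return route(path, payload)
```

In the real system this role is played by FastAPI router mounting and gateway middleware; the sketch only captures the routing-and-single-scan shape.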
If TEMPORAL_ADDRESS is set, teams that export WORKFLOWS /
ACTIVITIES from <team>/temporal/__init__.py switch from threads
to Temporal 1.24.2 workflows — progress survives server restarts. If it's not set,
everything still runs as background threads.
Trade-off we took: two execution paths to maintain. In exchange, local dev stays zero-dependency and production gets crash-resumable pipelines.
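The two execution paths reduce to one dispatch decision. A minimal sketch, assuming a `start_workflow` callable standing in for the Temporal client (the function name and signature are illustrative, not Khala's API):

```python
import os
import threading

def run_task(task, start_workflow=None, env=os.environ):
    """Dispatch a team task: Temporal when TEMPORAL_ADDRESS is set,
    a plain background thread otherwise."""
    if env.get("TEMPORAL_ADDRESS") and start_workflow is not None:
        return start_workflow(task)                 # durable, crash-resumable
    t = threading.Thread(target=task, daemon=True)  # zero-dependency local dev
    t.start()
    return t
```

Injecting `env` keeps the decision testable; the real code reads the process environment at startup.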
Teams export a SCHEMA: TeamSchema constant (pure data, no side effects); each
team's lifespan calls register_team_schemas(SCHEMA) at startup. No
POSTGRES_HOST? Registration no-ops. Teams get transactional storage without
a per-team migration runner.
Trade-off we took: no SQLite fallback for migrated teams — they require Postgres. Docker compose brings one up; local dev needs the env vars. We picked consistency over "runs anywhere."
Every team has a team-lead agent that coordinates specialists through typed request/response models. You get runtime validation, free OpenAPI docs, and a contract the agents can't silently break.
Trade-off we took: contracts are heavier to evolve than loose dict-passing. We picked the compile-time safety net over the freeform flexibility.
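To make the contract idea concrete without pulling in Pydantic, here is a stdlib stand-in: a dataclass whose `__post_init__` does by hand what Pydantic does declaratively. Field names are illustrative, not one of Khala's real schemas.

```python
from dataclasses import dataclass

@dataclass
class DraftRequest:
    """Stand-in for a typed request contract. Invalid payloads fail loudly
    at construction instead of propagating silently through the pipeline."""
    topic: str
    word_count: int

    def __post_init__(self):
        # Pydantic performs this validation from the type annotations;
        # shown explicitly here for the sketch.
        if not self.topic.strip():
            raise ValueError("topic must be non-empty")
        if self.word_count <= 0:
            raise ValueError("word_count must be positive")
```

The payoff named above follows directly: an agent cannot "silently break" the contract, because a malformed request never constructs.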
A single llm_service wraps Ollama (local or Cloud) and Claude behind one API,
with generate_structured doing Pydantic validation + one schema-grounded
self-correction retry. Per-role overrides (e.g. ARCHITECT_MODEL_SPECIALIST,
BLOG_PLANNING_MODEL) let specific agents pick their own model without
forking a client.
Trade-off we took: model-specific features lose fidelity behind the abstraction. We keep the escape hatch open: teams that need a raw provider call still have it.
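The validate-then-retry loop behind `generate_structured` can be sketched as follows. The signatures are assumptions: `complete_json` stands in for the provider call and `validate` for Pydantic model validation; only the one-retry shape is taken from the description above.

```python
def generate_structured(prompt, complete_json, validate):
    """Call the LLM, validate the JSON, and allow exactly one
    schema-grounded self-correction retry before giving up."""
    raw = complete_json(prompt)
    try:
        return validate(raw)
    except ValueError as err:
        # Feed the validation error back so the model can self-correct once.
        retry_prompt = (f"{prompt}\n\nPrevious output was invalid ({err}). "
                        "Return JSON matching the schema exactly.")
        return validate(complete_json(retry_prompt))  # second failure propagates
```

Letting the second failure propagate keeps the contract honest: callers see a validation error rather than a malformed "success".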
The Agent Console reads per-agent YAML manifests from
backend/agents/<team>/agent_console/manifests/*.yaml. Each describes id,
team, summary, I/O schema refs, invoke metadata, and sandbox provisioning hints — and they
drive the UI catalog plus the /api/agents endpoint automatically.
Trade-off we took: every new specialist gets a YAML companion. We took the authoring cost to remove the "how do I even discover this agent?" problem.
The Agent Console Runner invokes a single specialist in a warm Docker sandbox per team
(sandbox.compose.yml, dedicated sandbox-postgres, isolated network,
ports 8200–8220). Sandboxes are reused across invocations and reaped after
SANDBOX_IDLE_TEARDOWN_MINUTES (default 15).
Trade-off we took: cold-start latency on first invocation per team. In exchange, agents tagged requires-live-integration stay catalogued but clearly marked unrunnable — no silent breakage.
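The warm-pool-with-idle-reaping behavior is a small bookkeeping pattern. A sketch with assumed names (the real runner tears down Docker sandboxes; here teardown is just a dict delete, and the clock is injected so the reaper is testable):

```python
import time

class SandboxPool:
    """Track last-use timestamps per team and reap sandboxes idle past
    the teardown threshold."""
    def __init__(self, idle_teardown_minutes=15, clock=time.time):
        self._now = clock
        self._idle_s = idle_teardown_minutes * 60
        self._last_used = {}              # team slug -> last invocation time

    def touch(self, team):
        """Record an invocation; keeps the team's sandbox warm."""
        self._last_used[team] = self._now()

    def reap_idle(self):
        """Tear down sandboxes idle past the threshold; return reaped slugs."""
        cutoff = self._now() - self._idle_s
        stale = [t for t, ts in self._last_used.items() if ts <= cutoff]
        for t in stale:
            del self._last_used[t]        # real impl: docker compose down here
        return stale
```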
Every containerized team mounts the same agents_data named volume at
/data/agents and sets AGENT_CACHE=/data/agents. Teams
self-namespace under {team_name}/. Job state, caches, profiles, workspaces —
all persist across restarts without per-team volume plumbing.
Trade-off we took: teams share a filesystem root. We trust namespacing over isolation at this layer; the stronger isolation story lives at the Temporal/Postgres layer.
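The namespacing convention is one path join. A trivial sketch (the helper name is an assumption; the `/data/agents` default comes from the description above):

```python
import os
from pathlib import Path

def team_cache_dir(team_name, env=os.environ):
    """Resolve a team's storage root under the shared AGENT_CACHE volume:
    every team self-namespaces as AGENT_CACHE/{team_name}."""
    root = Path(env.get("AGENT_CACHE", "/data/agents"))
    return root / team_name
```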
Full diagrams (SDLC phases, task graphs, worker pipelines, DevOps gates):
ARCHITECTURE.md
§ 05 · What's shipping now
Pulled from CHANGELOG.md. This is active research — shapes and surfaces move.
The Runner tab now invokes single specialist agents in warm per-team Docker sandboxes.
Dedicated sandbox-postgres, isolated khala-sandbox network, idle
reaping after 15 min. Four teams wired day one (blogging, software_engineering,
planning_v3, branding); others join as their APIs mount the one-line invoke shim.
New /agent-console page replaces the old provisioning form. Browsable,
searchable catalog of every specialist, team/tag filters, detail drawer — backed by
declarative YAML manifests. /agent-provisioning redirects here; provisioning
and environments live verbatim in the third tab.
llm_service
New generate_text / generate_structured entrypoints layer Pydantic
validation + one schema-grounded self-correction retry on top of complete_json.
A CI static check blocks Markdown-body prompts from JSON-only methods.
BlogReviewAgent removed. The pipeline is now research → planning → writer
with a persisted ContentPlan. Planning failure returns HTTP 422 with a
specific reason instead of a muddled success.
§ 05.5 · Frequently asked
Short answers to the things that come up in every first conversation about Khala.
Khala is an open-source multi-agent AI orchestration platform. It mounts twenty specialist agent teams — each a team-lead agent coordinating role-separated specialists over typed Pydantic contracts — behind a single FastAPI gateway. Think of it as a runtime for agentic teams: you don't point it at a problem; you work with it the way you'd work with a real cross-functional team.
No — Khala is experimental, active research. Outputs can be incomplete, inconsistent, or wrong. APIs change without notice. Run it in isolated environments, keep humans in the loop on anything that matters, and treat every generated artifact (code, audits, trades, compliance reports) as a draft that needs review. If you're looking for a hardened platform with SLAs, this isn't it yet.
Out of the box: Ollama (local inference or Ollama Cloud), and
Claude via direct API calls. Configure with LLM_PROVIDER,
LLM_BASE_URL, and LLM_MODEL. Individual teams can override
per role (e.g. ARCHITECT_MODEL_SPECIALIST, BLOG_PLANNING_MODEL)
so planning, architecture, and domain specialists can each pick their own model without
forking the client.
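The override precedence reduces to a two-level lookup. A sketch (the helper name is an assumption; the env var names are the ones documented above):

```python
def resolve_model(role_override_var, env):
    """A role-specific variable like ARCHITECT_MODEL_SPECIALIST wins over
    the global LLM_MODEL; if neither is set, the provider default applies."""
    return env.get(role_override_var) or env.get("LLM_MODEL")
```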
Most agent frameworks give you primitives — tools, memories, chains — and leave team design to you. Khala ships a complete roster of opinionated specialist teams (software engineering, planning, research, blogging, compliance, and more) with production-style wiring: a unified HTTP gateway, Pydantic contracts, a shared Postgres schema registry, Temporal-backed durability, a warm Docker sandbox runner, and Prometheus + Grafana observability out of the box. And the meta-layer — designing new teams by conversation — is a first-class feature, not a plugin.
Yes. Local dev runs the Unified API as a single FastAPI process with every enabled team's
router mounted in-process on port 8080, and agents run as Python threads. You still need
Postgres for the migrated teams (blogging, branding, startup_advisor, user_agent_founder,
agentic_team_provisioning, nutrition, team_assistant, and unified_api credentials) — the
easiest way is to start just Postgres from the Docker compose file and export the
POSTGRES_* env vars.
Two ways. Conversationally: describe the roster you want in plain English
to the Agentic Team Provisioning team — it drafts agents, roles, and process, validates
staffing, and can bridge to Agent Provisioning to stand up the environment.
By hand: follow AGENT_ANATOMY.md (I/O, tools, memory,
prompts, guardrails, sub-agents), register the team in backend/unified_api/config.py
(TEAM_CONFIGS), and it mounts at /api/<your-slug> on next restart.
Named after the Protoss unifying religion from StarCraft — a psionic link joining many minds into one. The metaphor is deliberate: every specialist agent in Khala shares the same gateway, the same artifact cache, the same observability plane. Many teams. One mind. One objective. Yours.
Khala is experimental. The agents here are active research, not a production product. Outputs can be incomplete, inconsistent, or just plain wrong. APIs change without notice. A team that shipped a feature yesterday may hit a wall today. Run it in isolated environments, keep humans in the loop on anything that matters, and treat every generated artifact as a draft that needs review.
Looking for a hardened platform with SLAs? This isn't it — yet. Looking to build, tinker, and push the frontier of multi-agent systems? Welcome aboard.
§ 06 · Get it running
Brings up Postgres, Temporal + UI, a per-team microservice for every enabled team, the Unified API proxy, the Angular UI, Prometheus, and Grafana.
cp docker/.env.example docker/.env # set OLLAMA_API_KEY
./docker/ensure-network.sh # one-time
docker compose -f docker/docker-compose.yml \
--env-file docker/.env up --build
localhost:4201 · localhost:8888/docs · localhost:8080 · localhost:3000

Local dev: the Unified API runs as a single FastAPI process with every team's router mounted in-process. Postgres required for migrated teams — start one from Docker.
# terminal 1 — backend
cd backend
make install
python run_unified_api.py
# → http://localhost:8080/docs
# terminal 2 — frontend
cd user-interface
nvm use && npm ci && npm start
# → http://localhost:4200
Describe the roster you want in plain English. Agentic Team Provisioning drafts the agents, roles, and process, validates staffing, and (optionally) bridges to Agent Provisioning to stand up the environment.
Follow AGENT_ANATOMY.md
(I/O, tools, memory, prompts, guardrails, sub-agents), register in
TEAM_CONFIGS, and it mounts at /api/<your-slug> on next restart.