One gateway, one mind
open-source · multi-agent · experimental
A personal open-source project for building agentic AI teams that actually work together. Today, twenty specialists behind one gateway. Tomorrow, an agentic system that decides what team a problem needs, spins one up, and keeps the agents that earn their keep.
Many teams. One mind. One objective. Yours.
§ 01 · What it is
Khala is an open-source multi-agent AI orchestration platform. It bundles
twenty specialist agent teams — spanning autonomous software development, product planning,
market research, content, compliance, sales, and personal assistance — under a single
FastAPI gateway at /api/<team>. Every team has a team-lead agent
that coordinates domain specialists over typed Pydantic contracts, with a pluggable LLM
backend (Ollama, Ollama Cloud, or Claude) and opt-in Temporal durable workflows for
crash-resumable execution.
But the wiring isn't the point. The point is that every team plugs into the same shared mind, so you can bring them in on whatever you're working on — turning a spec into shipped code, running a market discovery, drafting a launch, pairing on a portfolio, or decomposing a genuinely ambiguous problem with Deepthought's recursive sub-agent spawner.
Khala isn't a tool you point at a problem. It's a collaborator you think alongside — from discovery through ship.
Twenty specialist teams today — engineering, planning, research, content, compliance, sales, personal, more — addressable as one.
Designing new teams is the product. Describe one in plain English; Agentic Team Provisioning drafts the roster and the process with you.
The real project isn't the 20 teams. It's the system that makes agentic teams — and lets them operate as one mind.
§ 02 · Why it's interesting
Six properties that make Khala worth poking at, even if you already have a favorite orchestrator.
Every team mounts under /api/<team> behind a single FastAPI server with an optional request-scanning security pre-scan. The whole roster is addressable — and collaborates — as one surface.
Set TEMPORAL_ADDRESS and teams that export workflows switch from in-process threads to Temporal 1.24.2 durable executions that survive restarts. Don't set it and everything still works.
Unified client for Ollama Cloud, local Ollama, or Claude. Per-role overrides where it matters (planning, architecture, specialists).
4-phase pipeline — Discovery → Design → Execution → Integration — with parallel backend/frontend queues, a planning cache, per-task quality gates (lint, build, review, acceptance, security, QA, DbC, a11y), and a Repair Agent for crash recovery.
Every FastAPI service in the Docker stack is auto-instrumented. Prometheus + a provisioned Grafana dashboard ship in docker-compose.yml — no extra setup.
New teams aren't a plugin afterthought — they're the product. Add one by conversation or register it in TEAM_CONFIGS; it mounts at /api/<slug> on next restart.
§ 03 · The roster
Grouped for navigation, not for architecture — Core Dev, Business, Content, and Personal. The roster grows and prunes itself as we learn what's worth keeping.
Authoritative list: backend/unified_api/config.py · run GET /teams on a live instance for the live roster.
§ 04 · Architectural decisions
The interesting choices, not the obvious ones. Every decision below has a trade-off the project accepted on purpose.
SE team pipeline — one spec in, a deployable system out.
Every team mounts under /api/<slug> behind a single FastAPI app. The
security pre-scan lives at the gateway (SECURITY_GATEWAY_ENABLED=true) so
it's applied once, consistently, and teams don't each re-roll their own auth layer.
Trade-off we took: every team shares a process in local dev. Good for velocity, bad for strict blast-radius isolation — Docker mode gives each team its own container for that.
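The gateway pattern above can be sketched in a few lines. This is a stdlib-only sketch, not Khala's implementation: the `TEAM_CONFIGS` shape, the `scan` signature, and the return convention are all assumptions, but it shows the key property that the security pre-scan runs once at the front door before any team router sees the request.

```python
# Hypothetical sketch of the gateway pattern. TEAM_CONFIGS maps a team slug
# to its config; enabled teams mount under /api/<slug>.
TEAM_CONFIGS = {"blogging": {"enabled": True}, "nutrition": {"enabled": False}}

def mounted_prefixes(configs):
    """Return the /api/<slug> prefixes the gateway would mount."""
    return [f"/api/{slug}" for slug, cfg in configs.items() if cfg["enabled"]]

def gateway_handle(path, payload, security_enabled, scan, route):
    """Apply one security pre-scan at the gateway, then dispatch to a team
    router. Teams never re-roll their own scanning layer."""
    if security_enabled and not scan(payload):
        return 403, "blocked by security pre-scan"
    return route(path, payload)
```

In the real system this role is played by FastAPI router mounting and gateway middleware; the sketch only captures the routing-and-single-scan shape.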
If TEMPORAL_ADDRESS is set, teams that export WORKFLOWS /
ACTIVITIES from <team>/temporal/__init__.py switch from threads
to Temporal 1.24.2 workflows — progress survives server restarts. If it's not set,
everything still runs as background threads.
Trade-off we took: two execution paths to maintain. In exchange, local dev stays zero-dependency and production gets crash-resumable pipelines.
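The two execution paths reduce to one dispatch decision. A minimal sketch, assuming a `start_workflow` callable standing in for the Temporal client (the function name and signature are illustrative, not Khala's API):

```python
import os
import threading

def run_task(task, start_workflow=None, env=os.environ):
    """Dispatch a team task: Temporal when TEMPORAL_ADDRESS is set,
    a plain background thread otherwise."""
    if env.get("TEMPORAL_ADDRESS") and start_workflow is not None:
        return start_workflow(task)                 # durable, crash-resumable
    t = threading.Thread(target=task, daemon=True)  # zero-dependency local dev
    t.start()
    return t
```

Injecting `env` keeps the decision testable; the real code reads the process environment at startup.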
Teams export a SCHEMA: TeamSchema constant (pure data, no side effects); each
team's lifespan calls register_team_schemas(SCHEMA) at startup. No
POSTGRES_HOST? Registration no-ops. Teams get transactional storage without
a per-team migration runner.
Trade-off we took: no SQLite fallback for migrated teams — they require Postgres. Docker compose brings one up; local dev needs the env vars. We picked consistency over "runs anywhere."
Every team has a team-lead agent that coordinates specialists through typed request/response models. You get runtime validation, free OpenAPI docs, and a contract the agents can't silently break.
Trade-off we took: contracts are heavier to evolve than loose dict-passing. We picked the compile-time safety net over the freeform flexibility.
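To make the contract idea concrete without pulling in Pydantic, here is a stdlib stand-in: a dataclass whose `__post_init__` does by hand what Pydantic does declaratively. Field names are illustrative, not one of Khala's real schemas.

```python
from dataclasses import dataclass

@dataclass
class DraftRequest:
    """Stand-in for a typed request contract. Invalid payloads fail loudly
    at construction instead of propagating silently through the pipeline."""
    topic: str
    word_count: int

    def __post_init__(self):
        # Pydantic performs this validation from the type annotations;
        # shown explicitly here for the sketch.
        if not self.topic.strip():
            raise ValueError("topic must be non-empty")
        if self.word_count <= 0:
            raise ValueError("word_count must be positive")
```

The payoff named above follows directly: an agent cannot "silently break" the contract, because a malformed request never constructs.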
A single llm_service wraps Ollama (local or Cloud) and Claude behind one API,
with generate_structured doing Pydantic validation + one schema-grounded
self-correction retry. Per-role overrides (e.g. ARCHITECT_MODEL_SPECIALIST,
BLOG_PLANNING_MODEL) let specific agents pick their own model without
forking a client.
Trade-off we took: model-specific features lose fidelity behind the abstraction. We keep the escape hatch open: teams that need a raw provider call still have it.
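The validate-then-retry loop behind `generate_structured` can be sketched as follows. The signatures are assumptions: `complete_json` stands in for the provider call and `validate` for Pydantic model validation; only the one-retry shape is taken from the description above.

```python
def generate_structured(prompt, complete_json, validate):
    """Call the LLM, validate the JSON, and allow exactly one
    schema-grounded self-correction retry before giving up."""
    raw = complete_json(prompt)
    try:
        return validate(raw)
    except ValueError as err:
        # Feed the validation error back so the model can self-correct once.
        retry_prompt = (f"{prompt}\n\nPrevious output was invalid ({err}). "
                        "Return JSON matching the schema exactly.")
        return validate(complete_json(retry_prompt))  # second failure propagates
```

Letting the second failure propagate keeps the contract honest: callers see a validation error rather than a malformed "success".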
The Agent Console reads per-agent YAML manifests from
backend/agents/<team>/agent_console/manifests/*.yaml. Each describes id,
team, summary, I/O schema refs, invoke metadata, and sandbox provisioning hints — and they
drive the UI catalog plus the /api/agents endpoint automatically.
Trade-off we took: every new specialist gets a YAML companion. We took the authoring cost to remove the "how do I even discover this agent?" problem.
The Agent Console Runner invokes a single specialist in a warm Docker sandbox per team
(sandbox.compose.yml, dedicated sandbox-postgres, isolated network,
ports 8200–8220). Sandboxes are reused across invocations and reaped after
SANDBOX_IDLE_TEARDOWN_MINUTES (default 15).
Trade-off we took: cold-start latency on first invocation per team. In exchange, agents tagged requires-live-integration stay catalogued but clearly marked unrunnable — no silent breakage.
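The warm-pool-with-idle-reaping behavior is a small bookkeeping pattern. A sketch with assumed names (the real runner tears down Docker sandboxes; here teardown is just a dict delete, and the clock is injected so the reaper is testable):

```python
import time

class SandboxPool:
    """Track last-use timestamps per team and reap sandboxes idle past
    the teardown threshold."""
    def __init__(self, idle_teardown_minutes=15, clock=time.time):
        self._now = clock
        self._idle_s = idle_teardown_minutes * 60
        self._last_used = {}              # team slug -> last invocation time

    def touch(self, team):
        """Record an invocation; keeps the team's sandbox warm."""
        self._last_used[team] = self._now()

    def reap_idle(self):
        """Tear down sandboxes idle past the threshold; return reaped slugs."""
        cutoff = self._now() - self._idle_s
        stale = [t for t, ts in self._last_used.items() if ts <= cutoff]
        for t in stale:
            del self._last_used[t]        # real impl: docker compose down here
        return stale
```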
Every containerized team mounts the same agents_data named volume at
/data/agents and sets AGENT_CACHE=/data/agents. Teams
self-namespace under {team_name}/. Job state, caches, profiles, workspaces —
all persist across restarts without per-team volume plumbing.
Trade-off we took: teams share a filesystem root. We trust namespacing over isolation at this layer; the stronger isolation story lives at the Temporal/Postgres layer.
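The namespacing convention is one path join. A trivial sketch (the helper name is an assumption; the `/data/agents` default comes from the description above):

```python
import os
from pathlib import Path

def team_cache_dir(team_name, env=os.environ):
    """Resolve a team's storage root under the shared AGENT_CACHE volume:
    every team self-namespaces as AGENT_CACHE/{team_name}."""
    root = Path(env.get("AGENT_CACHE", "/data/agents"))
    return root / team_name
```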
Full diagrams (SDLC phases, task graphs, worker pipelines, DevOps gates):
ARCHITECTURE.md
§ 05 · What's shipping now
Pulled from CHANGELOG.md. This is active research — shapes and surfaces move.
The Runner tab now invokes single specialist agents in warm per-team Docker sandboxes.
Dedicated sandbox-postgres, isolated khala-sandbox network, idle
reaping after 15 min. Four teams wired day one (blogging, software_engineering,
planning_v3, branding); others join as their APIs mount the one-line invoke shim.
New /agent-console page replaces the old provisioning form. Browsable,
searchable catalog of every specialist, team/tag filters, detail drawer — backed by
declarative YAML manifests. /agent-provisioning redirects here; provisioning
and environments live verbatim in the third tab.
llm_service
New generate_text / generate_structured entrypoints layer Pydantic
validation + one schema-grounded self-correction retry on top of complete_json.
A CI static check blocks Markdown-body prompts from JSON-only methods.
BlogReviewAgent removed. The pipeline is now research → planning → writer
with a persisted ContentPlan. Planning failure returns HTTP 422 with a
specific reason instead of a muddled success.
§ 05.5 · Frequently asked
Short answers to the things that come up in every first conversation about Khala.
Khala is an open-source multi-agent AI orchestration platform. It mounts twenty specialist agent teams — each a team-lead agent coordinating role-separated specialists over typed Pydantic contracts — behind a single FastAPI gateway. Think of it as a runtime for agentic teams: you don't point it at a problem; you work with it the way you'd work with a real cross-functional team.
No — Khala is experimental, active research. Outputs can be incomplete, inconsistent, or wrong. APIs change without notice. Run it in isolated environments, keep humans in the loop on anything that matters, and treat every generated artifact (code, audits, trades, compliance reports) as a draft that needs review. If you're looking for a hardened platform with SLAs, this isn't it yet.
Out of the box: Ollama (local inference or Ollama Cloud), and
Claude via direct API calls. Configure with LLM_PROVIDER,
LLM_BASE_URL, and LLM_MODEL. Individual teams can override
per role (e.g. ARCHITECT_MODEL_SPECIALIST, BLOG_PLANNING_MODEL)
so planning, architecture, and domain specialists can each pick their own model without
forking the client.
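The override precedence reduces to a two-level lookup. A sketch (the helper name is an assumption; the env var names are the ones documented above):

```python
def resolve_model(role_override_var, env):
    """A role-specific variable like ARCHITECT_MODEL_SPECIALIST wins over
    the global LLM_MODEL; if neither is set, the provider default applies."""
    return env.get(role_override_var) or env.get("LLM_MODEL")
```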
Most agent frameworks give you primitives — tools, memories, chains — and leave team design to you. Khala ships a complete roster of opinionated specialist teams (software engineering, planning, research, blogging, compliance, and more) with production-style wiring: a unified HTTP gateway, Pydantic contracts, a shared Postgres schema registry, Temporal-backed durability, a warm Docker sandbox runner, and Prometheus + Grafana observability out of the box. And the meta-layer — designing new teams by conversation — is a first-class feature, not a plugin.
Yes. Local dev runs the Unified API as a single FastAPI process with every enabled team's
router mounted in-process on port 8080, and agents run as Python threads. You still need
Postgres for the migrated teams (blogging, branding, startup_advisor, user_agent_founder,
agentic_team_provisioning, nutrition, team_assistant, and unified_api credentials) — the
easiest way is to start just Postgres from the Docker compose file and export the
POSTGRES_* env vars.
Two ways. Conversationally: describe the roster you want in plain English
to the Agentic Team Provisioning team — it drafts agents, roles, and process, validates
staffing, and can bridge to Agent Provisioning to stand up the environment.
By hand: follow AGENT_ANATOMY.md (I/O, tools, memory,
prompts, guardrails, sub-agents), register the team in backend/unified_api/config.py
(TEAM_CONFIGS), and it mounts at /api/<your-slug> on next restart.
Named after the Protoss unifying religion from StarCraft — a psionic link joining many minds into one. The metaphor is deliberate: every specialist agent in Khala shares the same gateway, the same artifact cache, the same observability plane. Many teams. One mind. One objective. Yours.
Khala is experimental. The agents here are active research, not a production product. Outputs can be incomplete, inconsistent, or just plain wrong. APIs change without notice. A team that shipped a feature yesterday may hit a wall today. Run it in isolated environments, keep humans in the loop on anything that matters, and treat every generated artifact as a draft that needs review.
Looking for a hardened platform with SLAs? This isn't it — yet. Looking to build, tinker, and push the frontier of multi-agent systems? Welcome aboard.
§ 06 · Get it running
Brings up Postgres, Temporal + UI, a per-team microservice for every enabled team, the Unified API proxy, the Angular UI, Prometheus, and Grafana.
cp docker/.env.example docker/.env # set OLLAMA_API_KEY
./docker/ensure-network.sh # one-time
docker compose -f docker/docker-compose.yml \
--env-file docker/.env up --build
localhost:4201 · localhost:8888/docs · localhost:8080 · localhost:3000

Local dev: the Unified API runs as a single FastAPI process with every team's router mounted in-process. Postgres required for migrated teams — start one from Docker.
# terminal 1 — backend
cd backend
make install
python run_unified_api.py
# → http://localhost:8080/docs
# terminal 2 — frontend
cd user-interface
nvm use && npm ci && npm start
# → http://localhost:4200
Describe the roster you want in plain English. Agentic Team Provisioning drafts the agents, roles, and process, validates staffing, and (optionally) bridges to Agent Provisioning to stand up the environment.
Follow AGENT_ANATOMY.md
(I/O, tools, memory, prompts, guardrails, sub-agents), register in
TEAM_CONFIGS, and it mounts at /api/<your-slug> on next restart.