Open Knowledge Format (OKF): How Google's New Standard Feeds Curated Context to AI Agents

Google Cloud's new Open Knowledge Format (OKF) packages organisational knowledge as plain Markdown that AI agents can read and maintain. A clear guide to what OKF is, how bundles work, OKF vs RAG, and why it matters for AI engineers in 2026.
In June 2026, Google Cloud's Data Cloud team published the Open Knowledge Format (OKF), an open, vendor-neutral specification for packaging organisational knowledge as plain Markdown that both humans and AI agents can read, write, and maintain. If you are learning Generative AI or Agentic AI in 2026, OKF is one of those quiet releases that tells you where the industry is heading: away from clever prompts and toward well-curated context.
The headline pattern is simple: OKF is "just Markdown, just files, just YAML frontmatter." No SDK, no proprietary database, no runtime. That minimalism is exactly why it matters — and why it is worth understanding early in your AI career.
The problem OKF solves: fragmented context
Modern LLMs are no longer limited by reasoning ability so much as by what they know about your world. The useful knowledge an AI agent needs — table schemas, metric definitions, API deprecation notes, incident runbooks, join paths between datasets — is almost always scattered across incompatible silos:
- Metadata catalogs, each with its own API and SDK
- Wikis (Confluence, Notion) that go stale
- Shared drives and spreadsheets
- Code comments and docstrings
- Tribal knowledge that lives only in senior engineers' heads
Every team building an agent ends up solving the same "context-assembly" problem from scratch, and every vendor locks that knowledge behind its own format. OKF's bet is that the format itself — not another platform — is the missing piece.
There is a neat way to see why a shared format helps. Without one, connecting A agents to S knowledge sources is an A × S integration problem — every agent needs a custom connector for every source. A shared format collapses that to A + S: each agent learns to read OKF once, each source learns to emit OKF once, and they interoperate for free.
| | Without a shared format | With OKF |
|---|---|---|
| Integrations needed | A × S custom connectors | A + S adapters |
| 3 agents, 4 sources | 3 × 4 = 12 | 3 + 4 = 7 |
| 6 agents, 6 sources | 6 × 6 = 36 | 6 + 6 = 12 |
That A · S → A + S collapse is the same economic argument behind every successful standard, from USB to HTTP.
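The connector arithmetic in the comparison above is easy to check in a few lines of Python. This is only a back-of-the-envelope sketch: `connector_counts` is a hypothetical helper, and it assumes every agent needs access to every knowledge source.

```python
def connector_counts(agents: int, sources: int) -> tuple:
    """Return (point-to-point connectors, shared-format adapters).

    Assumes every agent must reach every knowledge source.
    """
    point_to_point = agents * sources  # one custom connector per (agent, source) pair
    shared_format = agents + sources   # one OKF reader per agent, one OKF emitter per source
    return point_to_point, shared_format


for a, s in [(3, 4), (6, 6), (20, 30)]:
    pairs, adapters = connector_counts(a, s)
    print(f"{a} agents x {s} sources: {pairs} connectors vs {adapters} adapters")
```

The gap widens quickly as both sides grow: at 20 agents and 30 sources the sketch reports 600 connectors versus 50 adapters.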
What exactly is OKF?
OKF formalises a pattern that AI researcher Andrej Karpathy called the "LLM wiki" — the observation that language models are unusually good at the chores humans abandon in personal wikis. As Karpathy put it, LLMs "don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass." OKF turns that idea into a portable specification.
An OKF bundle is just a directory of Markdown files. Each file describes one concept — a table, a dataset, a metric, a playbook, a runbook, or an API. The file path is the concept's unique identifier, and Markdown links between files form a knowledge graph.
The structure of an OKF bundle
```
sales/
├── index.md              # entry point (progressive disclosure)
├── datasets/
│   ├── index.md
│   └── orders_db.md
├── tables/
│   ├── index.md
│   ├── orders.md
│   └── customers.md
└── metrics/
    ├── index.md
    └── weekly_active_users.md
```
Two optional files give the bundle extra structure: an index.md for progressive disclosure (a curated entry point so an agent doesn't have to read everything at once) and a log.md for chronological change history.
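To make progressive disclosure concrete, here is a minimal sketch of a producer that drafts a folder's index.md from the concepts it contains, so an agent can skim one page before opening individual files. It assumes concepts are held in a dict keyed by bundle-relative path (for example "/tables/orders.md") with parsed frontmatter under "meta"; `build_index` and that layout are illustrative assumptions, and the spec does not prescribe what an index.md must list.

```python
import posixpath


def build_index(concepts: dict, folder: str) -> str:
    """Draft an index.md body listing the concepts directly inside `folder`.

    `concepts` maps bundle-relative paths such as "/tables/orders.md" to
    {"meta": {...}}; this layout is an assumption made for the example.
    """
    folder = folder.rstrip("/") or "/"
    lines = [f"# {posixpath.basename(folder) or 'Bundle'}", ""]
    for concept_id in sorted(concepts):
        name = posixpath.basename(concept_id)
        # Only direct children of this folder, and never the index itself.
        if posixpath.dirname(concept_id) != folder or name == "index.md":
            continue
        meta = concepts[concept_id].get("meta") or {}
        title = meta.get("title", name[:-3])  # fall back to the file stem
        description = meta.get("description", "")
        entry = f"- [{title}]({concept_id})"
        lines.append(f"{entry}: {description}" if description else entry)
    return "\n".join(lines) + "\n"
```

Pulling each entry's text from the concept's own title and description keeps the index a derived view, so regenerating it after edits is cheap.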
What a single concept looks like
Every concept document is YAML frontmatter + a Markdown body. The only mandatory field is type; everything else is reserved-but-optional (title, description, resource, tags, timestamp):
```markdown
---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---
# Schema

| Column        | Type   | Description                              |
|---------------|--------|------------------------------------------|
| `order_id`    | STRING | Globally unique order identifier.        |
| `customer_id` | STRING | FK to [customers](/tables/customers.md). |

# Joins

Joined with [customers](/tables/customers.md) on `customer_id`.
```
Notice the [customers](/tables/customers.md) link inside the table. That single Markdown link is what turns a folder of files into a graph an agent can traverse.
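The example uses root-absolute links such as /tables/customers.md, but a consumer still has to turn each link into the ID of the concept it points at before it can treat the bundle as a graph. Below is a small standard-library sketch; support for relative links like ../tables/orders.md is an assumption here, since the example only shows root-absolute ones, and `resolve_link` and `build_edges` are hypothetical names.

```python
import posixpath
import re

# Markdown links whose target ends in .md, optionally followed by a #fragment.
LINK = re.compile(r"\]\(([^)\s#]+\.md)(?:#[^)]*)?\)")


def resolve_link(source_id: str, target: str) -> str:
    """Map a link found in `source_id` to a bundle-relative concept ID."""
    if target.startswith("/"):
        return posixpath.normpath(target)  # root-absolute: already an ID
    # Relative link (assumed allowed): resolve against the source's folder.
    return posixpath.normpath(posixpath.join(posixpath.dirname(source_id), target))


def build_edges(bodies: dict) -> set:
    """Return (source, target) edges for every .md link in every concept body."""
    edges = set()
    for source_id, body in bodies.items():
        for target in LINK.findall(body):
            edges.add((source_id, resolve_link(source_id, target)))
    return edges
```

Collecting edges in a set means the two links to customers in the Orders example collapse into a single edge.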
How AI agents consume OKF
Because a bundle is "just files," an agent needs no SDK to read it — pathlib, a regex, and a YAML parser are enough to load every concept and rebuild the link graph:
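```python
import pathlib
import re
import yaml  # PyYAML (pip install pyyaml)

# Frontmatter is the block between the first two "---" lines; the rest is the body.
FRONTMATTER = re.compile(r"^---\n(.*?)\n---\n(.*)$", re.DOTALL)
LINK = re.compile(r"\]\(([^)]+\.md)\)")  # Markdown links to other concepts


def load_bundle(root: str) -> dict:
    base = pathlib.Path(root)
    concepts = {}
    for path in sorted(base.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Split YAML frontmatter from the Markdown body.
        fm = FRONTMATTER.match(text)
        meta = (yaml.safe_load(fm.group(1)) or {}) if fm else {}  # empty YAML -> {}
        body = fm.group(2) if fm else text
        links = LINK.findall(body)  # outgoing edges, e.g. "/tables/customers.md"
        # Key by bundle-relative path so IDs match root-absolute links.
        concept_id = "/" + path.relative_to(base).as_posix()
        concepts[concept_id] = {"meta": meta, "body": body, "links": links}
    return concepts
```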
Once loaded, the bundle is a graph the agent walks on demand — following only the edges relevant to the task instead of stuffing every document into the prompt:
```
           ┌─────────────┐
           │  index.md   │
           └──────┬──────┘
         ┌────────┴─────────┐
         ▼                  ▼
   ┌────────────┐   ┌──────────────┐
   │ orders.md  │──►│ customers.md │
   └─────┬──────┘   └──────────────┘
         ▼
┌────────────────────────┐
│ weekly_active_users.md │  (metric → cites orders.md)
└────────────────────────┘
```
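Walking on demand can be as simple as a breadth-first search that starts at index.md and stops after a few hops. The sketch below assumes concepts are keyed by bundle-relative path (such as "/tables/orders.md") and that each concept's "links" list already uses the same form; `walk` and its `max_depth` and `max_docs` budgets are illustrative choices, not something the OKF spec defines.

```python
from collections import deque


def walk(concepts: dict, start: str = "/index.md", max_depth: int = 2, max_docs: int = 10) -> list:
    """Breadth-first walk of the link graph from `start`.

    `concepts` maps concept IDs to dicts with a "links" list of IDs in the
    same form. Returns the IDs in the order they were reached.
    """
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < max_docs:
        concept_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth budget
        for target in concepts.get(concept_id, {}).get("links", []):
            if target in concepts and target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))
                if len(visited) >= max_docs:
                    break  # document budget exhausted
    return visited
```

The two budgets are what keep the prompt small: the agent loads only the concepts it reached, instead of every file in the bundle.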
Crucially, agents don't just read OKF — they can maintain it. An on-call agent can update a runbook after an incident; a data agent can refresh a schema when a column is added. The team curates the content and reviews it like code, while the agents do the drudgery of keeping cross-references current.
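A maintaining agent's write path can stay just as plain. The sketch below, using only the standard library, rewrites a concept's `timestamp` frontmatter field and appends a dated line to the bundle's log.md. The log line format, the `record_change` name, and the choice to edit frontmatter with a regex rather than a YAML round-trip (dumping with PyYAML would drop comments and, by default, sort keys) are all assumptions for illustration; in a real team the change would still land as a reviewed commit.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Matches the `timestamp:` line inside a frontmatter block.
TIMESTAMP = re.compile(r"^timestamp:.*$", re.MULTILINE)


def record_change(root: str, concept_id: str, note: str, now: Optional[datetime] = None) -> None:
    """Bump a concept's `timestamp` and append one line to the bundle's log.md.

    Assumes the file starts with '---' frontmatter and uses Unix line endings.
    """
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    path = Path(root) / concept_id.lstrip("/")
    text = path.read_text(encoding="utf-8")

    # Edit only the frontmatter (between the first two '---' lines); keep the body verbatim.
    _, frontmatter, body = text.split("---\n", 2)
    if TIMESTAMP.search(frontmatter):
        frontmatter = TIMESTAMP.sub(f"timestamp: {stamp}", frontmatter, count=1)
    else:
        frontmatter += f"timestamp: {stamp}\n"
    path.write_text(f"---\n{frontmatter}---\n{body}", encoding="utf-8")

    # Append-only change history; this line format is an assumption, not part of the spec.
    with open(Path(root) / "log.md", "a", encoding="utf-8") as log:
        log.write(f"- {stamp} {concept_id}: {note}\n")
```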
OKF vs RAG vs proprietary catalogs
A common question from our GenAI learners: "Isn't this just RAG?" Not quite. Here is how OKF compares to the approaches you already know:
| Approach | What it stores | Curated? | Version-controlled? | Needs an SDK / platform? |
|---|---|---|---|---|
| OKF | Hand-curated concepts as Markdown | Yes | Yes (lives in git) | No |
| RAG | Raw document chunks, embedded ahead of time and retrieved per query | No | Rarely | Vector DB + pipeline |
| Proprietary catalog | Metadata in a vendor's schema | Yes | Vendor-dependent | Yes (account + API) |
| Obsidian / Hugo | Markdown notes / site content | Yes | Yes | No, but no agent contract |
The key distinction: RAG re-derives knowledge from raw chunks at query time, while OKF stores deliberately curated, reviewed concepts. The two are complementary — many teams will use OKF as the trustworthy, human-approved layer and RAG for everything else.
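One way to wire the two layers together is to consult curated OKF concepts first and fall back to retrieval only for the slots they leave open. In this sketch, matching question words against titles and tags is a deliberately naive stand-in for real concept selection, `retrieve` represents whatever RAG search you already have, and `assemble_context` is a hypothetical name; none of this comes from the OKF spec.

```python
from typing import Callable, List


def assemble_context(
    question: str,
    concepts: dict,
    retrieve: Callable[[str, int], List[str]],
    budget: int = 5,
) -> List[str]:
    """Curated OKF concepts first, then RAG chunks to fill the remaining budget.

    `concepts` maps concept IDs to {"meta": {...}, "body": str}; `retrieve(query, k)`
    stands in for an existing RAG search returning up to k text chunks.
    """
    words = {w.strip("?.,!").lower() for w in question.split()} - {""}
    curated = []
    for concept_id in sorted(concepts):
        meta = concepts[concept_id].get("meta") or {}
        keys = {str(meta.get("title", "")).lower()}
        keys |= {str(tag).lower() for tag in meta.get("tags", [])}
        if words & (keys - {""}):  # naive keyword match on title and tags
            curated.append(f"[OKF {concept_id}]\n{concepts[concept_id]['body']}")
    curated = curated[:budget]
    remaining = budget - len(curated)
    # Raw chunks only fill the slots the curated layer leaves open.
    fallback = retrieve(question, remaining) if remaining > 0 else []
    return curated + [f"[RAG]\n{chunk}" for chunk in fallback]
```

Labelling each snippet with its origin lets the model, and a human reviewing its answer, tell reviewed knowledge from raw retrieval.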
OKF is also not a competitor to Anthropic's Model Context Protocol (MCP). MCP is a protocol for how an agent connects to tools and resources at runtime; OKF is a format for how knowledge is written down and shared. You can serve an OKF bundle through an MCP server — they sit at different layers of the stack.
The three design principles
OKF's specification (v0.1) fits on a single page, and three principles explain its restraint:
- Minimally opinionated. The only hard requirement is that every concept has a type. Producers invent their own types, fields, and sections.
- Producer/consumer independence. Whoever writes the knowledge and whoever reads it are decoupled: a hand-authored bundle, a BigQuery export, and one LLM's notes read by another all work identically.
- A format, not a platform. It is tied to no cloud, database, or model provider. The value comes from broad adoption, not ownership — so it was published as an open standard from day one.
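Because the spec has just one hard rule, a conformance check is short. The sketch below flags concepts whose `type` is missing or empty. Two parts are assumptions rather than documented OKF behaviour: treating index.md and log.md as non-concept files exempt from the rule, and the dangling-link check, which is a convention you might add on top. `lint_bundle` is a hypothetical helper expecting concepts keyed by bundle-relative path with root-absolute links.

```python
import posixpath

# Assumption: the optional entry point and change log are not concepts themselves.
NON_CONCEPT_FILES = {"index.md", "log.md"}


def lint_bundle(concepts: dict) -> list:
    """Return human-readable problems found in a loaded bundle.

    Expects {concept_id: {"meta": dict, "links": [concept_id, ...]}} with
    root-absolute IDs such as "/tables/orders.md".
    """
    problems = []
    for concept_id in sorted(concepts):
        concept = concepts[concept_id]
        meta = concept.get("meta") or {}
        # The one hard rule: every concept declares a non-empty `type`.
        if posixpath.basename(concept_id) not in NON_CONCEPT_FILES:
            type_ = meta.get("type")
            if not isinstance(type_, str) or not type_.strip():
                problems.append(f"{concept_id}: missing required 'type'")
        # Extra convention (not in the spec): links should resolve to known concepts.
        for target in concept.get("links", []):
            if target not in concepts:
                problems.append(f"{concept_id}: dangling link to {target}")
    return problems
```

A check like this fits naturally in CI, so bundles get the same review gate as code.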
What Google shipped alongside the spec
To make OKF real rather than theoretical, Google released three reference components on the GoogleCloudPlatform/knowledge-catalog GitHub repo:
- An Enrichment Agent that walks BigQuery datasets and drafts OKF concept docs (schemas, join paths, citations).
- A static HTML visualiser that turns any bundle into an interactive graph view in a single self-contained file — no backend.
- Three sample bundles built from public datasets: GA4 e-commerce, Stack Overflow, and Bitcoin.
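To get a feel for what an enrichment step produces, here is a hand-rolled sketch that turns a table's column list into a concept document shaped like the Orders example earlier. It does not call BigQuery and is not the Enrichment Agent's code; `draft_table_concept` and its column-tuple input are hypothetical. String values are quoted with json.dumps, relying on JSON strings also being valid YAML double-quoted scalars.

```python
import json


def draft_table_concept(name: str, description: str, columns: list, resource: str = "") -> str:
    """Render an OKF-style concept doc for one table.

    `columns` is a list of (column_name, column_type, column_description) tuples,
    for example as read from a warehouse's information schema.
    """
    # json.dumps yields double-quoted strings, which YAML also accepts.
    lines = [
        "---",
        "type: BigQuery Table",
        f"title: {json.dumps(name)}",
        f"description: {json.dumps(description)}",
    ]
    if resource:
        lines.append(f"resource: {json.dumps(resource)}")
    lines += ["---", "# Schema", "", "| Column | Type | Description |", "|---|---|---|"]
    for col_name, col_type, col_desc in columns:
        safe_desc = col_desc.replace("|", "\\|")  # keep pipes from breaking the table
        lines.append(f"| `{col_name}` | {col_type} | {safe_desc} |")
    return "\n".join(lines) + "\n"
```

A generated draft like this is a starting point: the join notes, caveats, and links are what human reviewers add before the concept is merged.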
Why this matters for AI engineers in Pune
For freshers and working engineers in Pune's AI ecosystem, OKF is a signal worth reading. The skills that clear AI hiring panels in 2026 are shifting from "can you call an LLM API" toward "can you design the context and knowledge layer an agent depends on." Metadata-as-code, knowledge graphs, agentic workflows, and clean documentation practices are becoming core engineering skills, not afterthoughts.
Concretely, OKF-style thinking shows up in the highest-paying Pune AI tracks:
- GenAI / LLM engineering — RAG, retrieval, and now curated-context formats like OKF.
- Agentic AI — multi-agent systems that read and write their own knowledge.
- Data & ML platforms — metadata catalogs, lineage, and schema documentation.
You don't need to wait for OKF to become mainstream to benefit from the underlying habits: write knowledge as files, keep it in version control, link concepts together, and design context deliberately.
How to build these skills
At Archer Infotech, the agent-and-context skills behind OKF map directly onto two of our Pune tracks:
- Generative AI Training in Pune — LLMs, RAG, prompt and context engineering, and production deployment.
- Agentic AI Training in Pune — multi-agent workflows, tool use, and the knowledge layers that make agents reliable.
If a data-platform path fits you better, the Machine Learning and Data Science tracks cover the schema, metric, and pipeline foundations that OKF documents.
Frequently asked questions
Is OKF free and open source? Yes. The OKF specification is published as an open standard, and the reference tools live in a public Google Cloud GitHub repository under an open-source licence. There is no account, SDK, or platform requirement to use the format.
Do I need Google Cloud to use OKF? No. OKF is deliberately vendor-neutral — bundles are plain Markdown files you can host in any git repo, ship as a tarball, or mount on any filesystem. Google Cloud's Knowledge Catalog can ingest OKF, but it is one consumer among many.
Is OKF a replacement for RAG? No — they solve different problems. OKF stores curated, version-controlled knowledge; RAG retrieves from raw, unstructured content at query time. Most production systems will use both together.
What should a beginner learn first? Start with solid Generative AI fundamentals — LLMs, RAG, and prompt/context engineering — then move into Agentic AI. OKF will feel natural once you understand how agents consume context.
The takeaway
OKF is a small specification with a big idea: the bottleneck for AI agents is curated context, and the way to share it is an open, boring, durable format — Markdown files in git. For anyone building an AI career in Pune in 2026, the lesson is to treat knowledge as a first-class engineering artifact.
Want to build the GenAI and agentic-AI skills employers are hiring for? Talk to our team about the right track for your background, or explore the Generative AI and Agentic AI programs.
