Open Knowledge Format (OKF): How Google's New Standard Feeds Curated Context to AI Agents

Vinod Patil, Solutions Architect & AI Trainer at Archer Infotech · ~8 min read

Google Cloud's new Open Knowledge Format (OKF) packages organisational knowledge as plain Markdown that AI agents can read and maintain. A clear guide to what OKF is, how bundles work, OKF vs RAG, and why it matters for AI engineers in 2026.

In June 2026, Google Cloud's Data Cloud team published the Open Knowledge Format (OKF), an open, vendor-neutral specification for packaging organisational knowledge as plain Markdown that both humans and AI agents can read, write, and maintain. If you are learning Generative AI or Agentic AI in 2026, OKF is one of those quiet releases that tells you where the industry is heading: away from clever prompts and toward well-curated context.

The headline pattern is simple: OKF is "just Markdown, just files, just YAML frontmatter." No SDK, no proprietary database, no runtime. That minimalism is exactly why it matters — and why it is worth understanding early in your AI career.

The problem OKF solves: fragmented context

Modern LLMs are no longer limited by reasoning ability so much as by what they know about your world. The useful knowledge an AI agent needs — table schemas, metric definitions, API deprecation notes, incident runbooks, join paths between datasets — is almost always scattered across incompatible silos:

  • Metadata catalogs, each with its own API and SDK
  • Wikis (Confluence, Notion) that go stale
  • Shared drives and spreadsheets
  • Code comments and docstrings
  • Tribal knowledge that lives only in senior engineers' heads

Every team building an agent ends up solving the same "context-assembly" problem from scratch, and every vendor locks that knowledge behind its own format. OKF's bet is that the format itself — not another platform — is the missing piece.

There is a neat way to see why a shared format helps. Without one, connecting A agents to S knowledge sources is an A × S integration problem — every agent needs a custom connector for every source. A shared format collapses that to A + S: each agent learns to read OKF once, each source learns to emit OKF once, and they interoperate for free.

Without a shared format:        With OKF:
  A agents × S sources            A agents + S sources
  = A · S custom connectors       = A + S adapters

  3 agents × 4 sources = 12       3 + 4 = 7
  6 agents × 6 sources = 36       6 + 6 = 12

That A · S → A + S collapse is the same economic argument behind every successful standard, from USB to HTTP.
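
The arithmetic is easy to check for any team size. A minimal sketch, where connector_counts is a name made up for this illustration:

```python
def connector_counts(agents: int, sources: int) -> tuple[int, int]:
    """Return (point-to-point connectors, shared-format adapters)."""
    without_format = agents * sources  # every agent needs a connector per source
    with_format = agents + sources     # each side implements the format once
    return without_format, with_format


for a, s in [(3, 4), (6, 6), (10, 20)]:
    pairwise, shared = connector_counts(a, s)
    print(f"{a} agents, {s} sources: {pairwise} connectors vs {shared} adapters")
```

The gap widens as either side grows: with 10 agents and 20 sources it is 200 connectors versus 30 adapters.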

What exactly is OKF?

OKF formalises a pattern that AI researcher Andrej Karpathy called the "LLM wiki" — the observation that language models are unusually good at the chores humans abandon in personal wikis. As Karpathy put it, LLMs "don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass." OKF turns that idea into a portable specification.

An OKF bundle is just a directory of Markdown files. Each file describes one concept — a table, a dataset, a metric, a playbook, a runbook, or an API. The file path is the concept's unique identifier, and Markdown links between files form a knowledge graph.

The structure of an OKF bundle

sales/
├── index.md                  # entry point (progressive disclosure)
├── datasets/
│   ├── index.md
│   └── orders_db.md
├── tables/
│   ├── index.md
│   ├── orders.md
│   └── customers.md
└── metrics/
    ├── index.md
    └── weekly_active_users.md

Two optional files give the bundle extra structure: an index.md for progressive disclosure (a curated entry point so an agent doesn't have to read everything at once) and a log.md for chronological change history.
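
Because the path doubles as the identifier, listing a bundle's concepts is just a directory walk. The sketch below assumes IDs are written as bundle-relative POSIX paths with a leading slash (for example /tables/orders.md) and counts every Markdown file, index.md included. concept_ids is a hypothetical helper, not part of any OKF tooling.

```python
import pathlib
import tempfile


def concept_ids(root: str) -> list[str]:
    """List every Markdown file under root as a bundle-absolute ID."""
    base = pathlib.Path(root)
    return sorted("/" + p.relative_to(base).as_posix() for p in base.rglob("*.md"))


# Build a throwaway copy of part of the sales/ layout to try it on.
with tempfile.TemporaryDirectory() as tmp:
    bundle = pathlib.Path(tmp) / "sales"
    for rel in ["index.md", "tables/index.md", "tables/orders.md",
                "metrics/weekly_active_users.md"]:
        target = bundle / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        # Placeholder content; only the file paths matter here.
        target.write_text("---\ntype: Example\n---\n", encoding="utf-8")
    ids = concept_ids(str(bundle))
    print(ids)
```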

What a single concept looks like

Every concept document is YAML frontmatter plus a Markdown body. The only mandatory field is type; the other reserved fields (title, description, resource, tags, timestamp) are optional:

---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema
| Column        | Type   | Description                          |
|---------------|--------|--------------------------------------|
| `order_id`    | STRING | Globally unique order identifier.    |
| `customer_id` | STRING | FK to [customers](/tables/customers.md). |

# Joins
Joined with [customers](/tables/customers.md) on `customer_id`.

Notice the [customers](/tables/customers.md) link inside the table. That single Markdown link is what turns a folder of files into a graph an agent can traverse.
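
One practical consequence is that link integrity can be checked mechanically. A minimal sketch, assuming links are bundle-absolute as in the example and that the bundle has already been read into a plain dict of concept ID to Markdown body. broken_links is an illustrative name, and /tables/regions.md is a made-up target used to trigger a miss.

```python
import re

# Captures the target of a Markdown link of the form [text](target.md).
LINK = re.compile(r"\]\(([^)\s]+\.md)\)")


def broken_links(bodies: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose target is not a concept in the bundle."""
    missing = []
    for source, body in bodies.items():
        for target in LINK.findall(body):
            if target not in bodies:
                missing.append((source, target))
    return missing


bodies = {
    "/tables/orders.md": "FK to [customers](/tables/customers.md).",
    "/tables/customers.md": "See [regions](/tables/regions.md).",  # hypothetical
}
print(broken_links(bodies))
```

A check like this could run in CI next to the usual review, so a renamed file does not silently cut an edge out of the graph.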

How AI agents consume OKF

Because a bundle is "just files," an agent needs no SDK to read it — pathlib, a regex, and a YAML parser are enough to load every concept and rebuild the link graph:

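import pathlib, re, yaml

# Frontmatter is a YAML block fenced by "---" lines at the very top of the file.
FRONTMATTER = re.compile(r"^---\r?\n(.*?)\r?\n---\r?\n?(.*)$", re.DOTALL)

def load_bundle(root: str) -> dict:
    """Load every concept in a bundle, keyed by its bundle-relative path."""
    base = pathlib.Path(root)
    concepts = {}
    for path in sorted(base.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        # split YAML frontmatter from the markdown body
        fm = FRONTMATTER.match(text)
        meta = (yaml.safe_load(fm.group(1)) or {}) if fm else {}
        body = fm.group(2) if fm else text
        links = re.findall(r"\]\(([^)]+\.md)\)", body)  # outgoing edges
        # key as "/tables/orders.md" so keys line up with bundle-absolute links
        key = "/" + path.relative_to(base).as_posix()
        concepts[key] = {"meta": meta, "body": body, "links": links}
    return concepts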

Once loaded, the bundle is a graph the agent walks on demand — following only the edges relevant to the task instead of stuffing every document into the prompt:

        ┌─────────────┐
        │  index.md   │
        └──────┬──────┘
        ┌──────┴───────┐
        ▼              ▼
 ┌────────────┐  ┌──────────────┐
 │ orders.md  │─►│ customers.md │
 └─────┬──────┘  └──────────────┘
       │
 ┌─────┴──────────────┐
 │ weekly_active_users│  (metric → cites orders.md)
 └────────────────────┘
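
Walking on demand can be as simple as a breadth-first search with a hop budget. The sketch below assumes the graph is available as a dict mapping each concept ID to its outgoing links; the walk function, the max_depth parameter, and the sample edges (which mirror the diagram) are illustrative choices, not something the spec prescribes.

```python
from collections import deque


def walk(links: dict[str, list[str]], start: str, max_depth: int = 2) -> list[str]:
    """Breadth-first walk from start, visiting each concept once, up to max_depth hops."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if depth == max_depth:
            continue  # hop budget spent: do not expand this node
        for target in links.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


graph = {
    "/index.md": ["/tables/orders.md", "/tables/customers.md"],
    "/tables/orders.md": ["/tables/customers.md"],
    "/tables/customers.md": [],
    "/metrics/weekly_active_users.md": ["/tables/orders.md"],
}
print(walk(graph, "/metrics/weekly_active_users.md", max_depth=1))
```

With max_depth=1 the agent loads the metric and the table it cites, and stops before pulling in customers.md. Raising the budget trades prompt space for more context.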

Crucially, agents don't just read OKF — they can maintain it. An on-call agent can update a runbook after an incident; a data agent can refresh a schema when a column is added. The team curates the content and reviews it like code, while the agents do the drudgery of keeping cross-references current.
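
On the write side, one small habit an agent could adopt is recording each change in the bundle's log.md. The sketch below is only that: the entry layout (a dated bullet linking to the concept) is an assumption made for illustration, as are append_log_entry and the discount_code column.

```python
import pathlib
import tempfile
from datetime import datetime, timezone


def append_log_entry(bundle_root: str, concept: str, summary: str) -> str:
    """Append one dated line to log.md and return the line written."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Entry layout is an assumption made for this sketch.
    entry = f"- {stamp} [{concept}]({concept}): {summary}\n"
    log_path = pathlib.Path(bundle_root) / "log.md"
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry)
    return entry


with tempfile.TemporaryDirectory() as tmp:
    append_log_entry(tmp, "/tables/orders.md", "Added column discount_code.")
    append_log_entry(tmp, "/tables/customers.md", "Clarified join key.")
    log_text = (pathlib.Path(tmp) / "log.md").read_text(encoding="utf-8")
    print(log_text)
```

Because the log is just another file in the bundle, the entry shows up in the same diff as the schema change it describes.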

OKF vs RAG vs proprietary catalogs

A common question from our GenAI learners: "Isn't this just RAG?" Not quite. Here is how OKF compares to the approaches you already know:

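| Approach            | What it stores                                                         | Curated? | Version-controlled? | Needs an SDK / platform?  |
|---------------------|------------------------------------------------------------------------|----------|---------------------|---------------------------|
| OKF                 | Hand-curated concepts as Markdown                                      | Yes      | Yes (lives in git)  | No                        |
| RAG                 | Raw document chunks, indexed as embeddings and retrieved at query time | No       | Rarely              | Vector DB + pipeline      |
| Proprietary catalog | Metadata in a vendor's schema                                          | Yes      | Vendor-dependent    | Yes (account + API)       |
| Obsidian / Hugo     | Markdown notes / site content                                          | Yes      | Yes                 | No, but no agent contract |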

The key distinction: RAG re-derives knowledge from raw chunks at query time, while OKF stores deliberately curated, reviewed concepts. The two are complementary — many teams will use OKF as the trustworthy, human-approved layer and RAG for everything else.

OKF is also not a competitor to Anthropic's Model Context Protocol (MCP). MCP is a protocol for how an agent connects to tools and resources at runtime; OKF is a format for how knowledge is written down and shared. You can serve an OKF bundle through an MCP server — they sit at different layers of the stack.

The three design principles

OKF's specification (v0.1) fits on a single page, and three principles explain its restraint:

  1. Minimally opinionated. The only hard requirement is that every concept has a type. Producers invent their own types, fields, and sections.
  2. Producer/consumer independence. Whoever writes the knowledge and whoever reads it are decoupled — a hand-authored bundle, a BigQuery export, and one LLM's notes read by another all work identically.
  3. A format, not a platform. It is tied to no cloud, database, or model provider. The value comes from broad adoption, not ownership — so it was published as an open standard from day one.
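
The first principle is simple enough to enforce in a few lines. The sketch below rests on a stated simplification: to stay dependency-free it reads only flat key: value frontmatter lines, where a real consumer would hand the block to a YAML parser. frontmatter_fields and has_type are hypothetical helper names.

```python
import re

# A "---" fenced block at the very top of the file.
FRONTMATTER = re.compile(r"^---\r?\n(.*?)\r?\n---(?:\r?\n|$)", re.DOTALL)


def frontmatter_fields(text: str) -> dict[str, str]:
    """Parse flat 'key: value' frontmatter lines (a simplification of YAML)."""
    match = FRONTMATTER.match(text)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def has_type(text: str) -> bool:
    """True when the document declares a non-empty type, the one required field."""
    return bool(frontmatter_fields(text).get("type"))


valid = "---\ntype: BigQuery Table\ntitle: Orders\n---\n\n# Schema\n"
invalid = "---\ntitle: Orders\n---\n\n# Schema\n"
print(has_type(valid), has_type(invalid))
```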

What Google shipped alongside the spec

To make OKF real rather than theoretical, Google released three reference components on the GoogleCloudPlatform/knowledge-catalog GitHub repo:

  • An Enrichment Agent that walks BigQuery datasets and drafts OKF concept docs (schemas, join paths, citations).
  • A static HTML visualiser that turns any bundle into an interactive graph view in a single self-contained file — no backend.
  • Three sample bundles built from public datasets: GA4 e-commerce, Stack Overflow, and Bitcoin.

Why this matters for AI engineers in Pune

For freshers and working engineers in Pune's AI ecosystem, OKF is a signal worth reading. The skills that clear AI hiring panels in 2026 are shifting from "can you call an LLM API" toward "can you design the context and knowledge layer an agent depends on." Metadata-as-code, knowledge graphs, agentic workflows, and clean documentation practices are becoming core engineering skills, not afterthoughts.

Concretely, OKF-style thinking shows up in the highest-paying Pune AI tracks:

  • GenAI / LLM engineering — RAG, retrieval, and now curated-context formats like OKF.
  • Agentic AI — multi-agent systems that read and write their own knowledge.
  • Data & ML platforms — metadata catalogs, lineage, and schema documentation.

You don't need to wait for OKF to become mainstream to benefit from the underlying habits: write knowledge as files, keep it in version control, link concepts together, and design context deliberately.

How to build these skills

At Archer Infotech, the agent-and-context skills behind OKF map directly onto two of our Pune tracks: the Generative AI and Agentic AI programs.

If a data-platform path fits you better, the Machine Learning and Data Science tracks cover the schema, metric, and pipeline foundations that OKF documents.

Frequently asked questions

Is OKF free and open source? Yes. The OKF specification is published as an open standard, and the reference tools live in a public Google Cloud GitHub repository under an open-source licence. There is no account, SDK, or platform requirement to use the format.

Do I need Google Cloud to use OKF? No. OKF is deliberately vendor-neutral — bundles are plain Markdown files you can host in any git repo, ship as a tarball, or mount on any filesystem. Google Cloud's Knowledge Catalog can ingest OKF, but it is one consumer among many.

Is OKF a replacement for RAG? No — they solve different problems. OKF stores curated, version-controlled knowledge; RAG retrieves from raw, unstructured content at query time. Most production systems will use both together.

What should a beginner learn first? Start with solid Generative AI fundamentals — LLMs, RAG, and prompt/context engineering — then move into Agentic AI. OKF will feel natural once you understand how agents consume context.

The takeaway

OKF is a small specification with a big idea: the bottleneck for AI agents is curated context, and the way to share it is an open, boring, durable format — Markdown files in git. For anyone building an AI career in Pune in 2026, the lesson is to treat knowledge as a first-class engineering artifact.

Want to build the GenAI and agentic-AI skills employers are hiring for? Talk to our team about the right track for your background, or explore the Generative AI and Agentic AI programs.
