Modern Knowledgebases

A few weeks ago, I migrated a dataset of around 30 million rows from MySQL to Parquet + DuckDB.
The query time dropped from nearly a minute to just a few seconds.
That shift changed how I thought about data: sometimes, the biggest gains come not from more compute,
but from how data is organized and read.

That realization made me curious — if columnar databases redefined analytics,
what’s the equivalent shift happening in knowledge systems?

Over the past year, I’ve been exploring how new systems—vector databases, embedding models, and retrieval frameworks—are quietly rebuilding the foundation of how machines represent and reason over information.
This post is my attempt to organize that understanding.


1. The Architecture of Modern Knowledgebases

At a high level, every modern “AI knowledgebase” sits on a stack of layers.
Each one has a distinct responsibility—from raw storage to semantic reasoning.

| Layer | Responsibility | Focus Area | Analogy |
| --- | --- | --- | --- |
| L0 – Physical Storage | Where the bytes actually live: disk, SSD, S3, or block storage. | Durability, throughput, cost. | The warehouse floor — where everything physically sits. |
| L1 – Data Layout | How vectors and metadata are serialized or chunked on disk. | Data formats, compression, compaction. | The shelving system — how boxes are arranged. |
| L2 – Indexing & Retrieval | How we find similar vectors quickly, without scanning everything. | ANN algorithms like HNSW, IVF, PQ, DiskANN. | The map of aisles — guiding you to the right shelf. |
| L3 – Search & API Layer | The database interface: how you insert, query, and filter. | Schema design, access control, hybrid filters. | The reception desk — turns requests into lookups. |
| L4 – Integration / Retrieval Orchestration | Coordinates ingestion, embeddings, hybrid search, and reranking. | Connectors, embedding pipelines, rerankers, query rewriting. | The librarian — knows where to look and stacks the right boxes for you. |
| L5 – Reasoning / Generation | The layer that actually “thinks”: uses context from L4 and responds in language. | Prompting, planning, grounding, LLM reasoning. | The subject-matter expert — reads the boxes and explains. |

Notes:

  1. Vector stores (L0–L3) handle how knowledge is stored, indexed, and retrieved efficiently.
  2. Integration layers (L4) orchestrate embeddings, rerankers, and retrieval pipelines.
  3. Reasoning layers (L5) use that context to generate insights, answers, or summaries.
  4. Weaviate extends into L4; Kendra is a managed retrieval system; LangChain and LlamaIndex span both retrieval and reasoning.
  5. Together, these layers define the architecture of a modern AI knowledgebase — a system that doesn’t just store information, but can understand and communicate it.
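
To make the layers less abstract, here is a tiny Python sketch of one hypothetical stack mapped onto L0–L5. Every component named in it is an illustrative choice, not a recommendation.

```python
# Hypothetical mapping of one concrete stack onto the L0-L5 layers.
# Each component named here is an illustrative example, not a prescription.
knowledgebase_stack = {
    "L0_physical_storage": "S3 bucket (durable object storage)",
    "L1_data_layout": "Arrow/Lance files with compressed vector columns",
    "L2_indexing": "HNSW graph index for approximate nearest-neighbor search",
    "L3_search_api": "Vector DB endpoint: insert, query, metadata filters",
    "L4_orchestration": "Ingestion + embedding pipeline, hybrid search, reranker",
    "L5_reasoning": "LLM that answers using the retrieved context",
}

for layer, role in knowledgebase_stack.items():
    print(f"{layer:22s} -> {role}")
```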

2. Vectors and Meaning

At the heart of this new architecture is the vector — a numerical representation of meaning.

For example:

"The cat sits on the mat" → [0.12, -0.45, 0.88, ...] # 1536-dimensional embedding

A vector is just a long list of floating-point numbers, but its geometry captures relationships:
sentences or images that “mean” similar things are close together in this high-dimensional space.
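
To make “close together” concrete, here is a small sketch of cosine similarity, one of the most common ways vector stores measure how near two embeddings are. The toy vectors below are invented for illustration; a real model would return hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (real models emit e.g. 1536 dimensions).
cat_on_mat = np.array([0.12, -0.45, 0.88, 0.10])
kitten_rug = np.array([0.10, -0.40, 0.90, 0.05])   # similar meaning
stock_news = np.array([-0.70, 0.20, -0.10, 0.60])  # unrelated meaning

print(cosine_similarity(cat_on_mat, kitten_rug))  # high score: close in space
print(cosine_similarity(cat_on_mat, stock_news))  # low score: far apart
```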
Different models produce different kinds of embeddings, depending on what they were trained for.

| Model | Training Focus | Strengths |
| --- | --- | --- |
| OpenAI text-embedding-3-large | General-purpose text | Broad coverage and strong multilingual performance. |
| AWS Titan Embeddings G1 | Enterprise documents | Handles structured, factual content effectively. |
| Cohere Embed v3 | Multi-domain semantic search | Tunable for classification and retrieval tasks. |
| CLIP (OpenAI) | Image ↔ text alignment | Bridges visual and language representations. |
| E5 (Microsoft) | Sentence-level retrieval | Optimized for semantic search and ranking. |
| Instructor XL | Task-specific embeddings | Performs well in RAG and domain-tuned workflows. |

Understanding embeddings is the first step; storing and searching millions of them efficiently is the problem vector databases were built to solve.


3. Vector Databases: How They Differ

Not all vector stores are built the same way.
Some focus on scalability, others on analytics or simplicity.
Each can be understood through the same layered lens used above.

| System | Implements | Indexing (L2) | Storage Layout (L1) | Physical Storage (L0) | Summary |
| --- | --- | --- | --- | --- | --- |
| Weaviate | L3–L0 (+ optional L4 modules) | HNSW / DiskANN | Custom KV schema | Disk + S3 backup | A full-featured vector database with built-in hybrid search and RAG extensions. |
| LanceDB | L3–L0 | IVF_FLAT / PQ (Arrow-native) | Apache Arrow / Lance | Local / S3 | Columnar and analytics-friendly, ideal for local or hybrid workloads. |
| ChromaDB | L3–L0 | FAISS / HNSW | DuckDB / SQLite | Local | Lightweight and Python-first — great for experimentation and rapid prototyping. |
| S3 Vector Bucket | L3–L0 (managed) | AWS-managed ANN | Proprietary format | S3 | Serverless and fully managed; indexing and scaling handled by AWS. |
| OpenSearch (KNN Plugin) | L3–L0 (Lucene-based) | HNSW / IVF / PQ | Lucene segments | Disk / EBS | Text-first search engine with added vector retrieval capabilities. |
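
To give a feel for what the L3 API layer looks like in practice, here is a minimal ChromaDB sketch. The collection name and documents are invented, and it assumes Chroma's default behavior of embedding raw text with a built-in model when no embeddings are supplied.

```python
import chromadb

# In-memory client; Chroma keeps its data local (DuckDB/SQLite) when persisted.
client = chromadb.Client()
collection = client.create_collection(name="demo_notes")  # hypothetical name

# L3 in action: insert documents, then run a similarity query.
collection.add(
    ids=["n1", "n2", "n3"],
    documents=[
        "The cat sits on the mat",
        "Quarterly revenue grew by 12 percent",
        "A kitten is sleeping on the rug",
    ],
)

results = collection.query(query_texts=["Where is the cat?"], n_results=2)
print(results["documents"])  # expected: the two cat-related sentences
```

In recent versions, swapping `chromadb.Client()` for `chromadb.PersistentClient(path=...)` keeps the data on disk, which is where the L0/L1 choices in the table start to matter.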

Common indexing methods:

  1. HNSW: graph-based approximate search; high recall and fast queries, at the cost of memory.
  2. IVF: clusters vectors and searches only the nearest clusters; requires a training pass.
  3. PQ: product quantization, which compresses vectors so many more fit in memory, with some accuracy loss.
  4. DiskANN: a graph index designed to live mostly on SSD, for datasets larger than RAM.
  5. Flat (brute force): exact search over every vector; the baseline the others approximate.

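To make the index types concrete, here is a small FAISS sketch contrasting exact (flat) search with HNSW. The random vectors stand in for real embeddings, and the parameters are illustrative.

```python
import faiss                  # pip install faiss-cpu
import numpy as np

d = 128                       # embedding dimensionality (illustrative)
xb = np.random.rand(10_000, d).astype("float32")  # stand-in for stored embeddings
xq = np.random.rand(5, d).astype("float32")       # stand-in for query embeddings

# Flat index: scans everything, always exact, slow at scale.
flat = faiss.IndexFlatL2(d)
flat.add(xb)

# HNSW index: graph-based approximate search, much faster with a small recall trade-off.
hnsw = faiss.IndexHNSWFlat(d, 32)  # 32 = graph connectivity (M)
hnsw.add(xb)

D_exact, I_exact = flat.search(xq, 5)
D_ann, I_ann = hnsw.search(xq, 5)
print(I_exact[0], I_ann[0])   # neighbor ids usually overlap heavily
```

Vector databases wrap indexes like these behind their L3 APIs; the underlying trade-off between recall, speed, and memory is the same.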

4. From Storage to Understanding

Once the data is stored and indexed, the next challenge is orchestration — how to retrieve and reason over it.

| System | Layer Coverage | Notes |
| --- | --- | --- |
| Weaviate | L0–L3, plus optional L4 modules (auto-embed, hybrid, rerank, “generative” plugins) | Has optional L4 modules but relies on external LLMs for reasoning. |
| Pinecone | L0–L3 | Pure vector database; bring your own orchestration and reasoning layers. |
| LanceDB | L0–L3, minimal L4 | Focused on analytics; orchestration handled externally. |
| ChromaDB | L0–L3, light L4 | Great for quick RAG prototypes via LangChain or LlamaIndex. |
| OpenSearch (KNN) | L0–L3, hybrid keyword + vector search | Adds ANN to a text-based search engine. |
| Amazon Kendra | L0–L4 (managed), no L5 | A managed retrieval system; for generation, pair with Bedrock or another LLM. |
| LangChain | L4–L5 | Framework that orchestrates retrieval (L4) and reasoning (L5) across data sources. |
| LlamaIndex | L4–L5 | Similar to LangChain; adds graph-based indexing and composability. |
| Bedrock / OpenAI / Claude / Gemini | L5 | Pure reasoning layer — uses retrieved context to generate answers. |

“Managed” means the internal indexing and storage details are not visible to the user.


5. E2E Flow

Ingestion Pipeline

🖼️ KB - Ingestion Pipeline Example
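
In code, the ingestion side boils down to chunk, embed, and upsert. The sketch below is framework-agnostic; `embed_texts` and `vector_store` are hypothetical stand-ins for whichever model and database you actually use.

```python
from typing import Callable

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap; real pipelines often split on
    sentences, headings, or tokens instead of raw characters."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def ingest(doc_id: str, text: str,
           embed_texts: Callable[[list[str]], list[list[float]]],
           vector_store) -> None:
    """L4 ingestion: chunk -> embed -> upsert into the vector store (L0-L3)."""
    chunks = chunk_text(text)
    vectors = embed_texts(chunks)          # hypothetical embedding call
    vector_store.upsert(                   # hypothetical store API
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        embeddings=vectors,
        metadatas=[{"doc_id": doc_id, "chunk": i} for i in range(len(chunks))],
        documents=chunks,
    )
```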

Query the knowledgebase (retrieval orchestration + LLM)

In this sequence, the LLM (L5) is drawn directly beside the User because it’s the layer the user interacts with — the conversational or reasoning interface.

Conceptually, however, L5 depends on L4: the orchestrator (L4) handles retrieval, embeddings, reranking, filtering, and context assembly before the LLM can reason over it.

In other words, L4 prepares the knowledge, and L5 expresses it. The Vector Store (L0–L3) remains purely a retrieval substrate — it stores, indexes, and returns vectors, but performs no reasoning or synthesis.
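
The same split, sketched in code: everything before the final call is L4 work, and only the last line is L5. Here `embed_texts`, `vector_store.query`, `rerank`, and `llm_complete` are hypothetical stand-ins, not any particular library's API.

```python
def answer(question: str, embed_texts, vector_store, rerank, llm_complete) -> str:
    # --- L4: retrieval orchestration --------------------------------------
    query_vec = embed_texts([question])[0]                         # embed the query
    hits = vector_store.query(embedding=query_vec, n_results=20)   # L0-L3 lookup
    top_chunks = rerank(question, hits)[:5]                        # keep the best 5
    context = "\n\n".join(chunk["document"] for chunk in top_chunks)

    # --- L5: reasoning / generation ----------------------------------------
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```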

🖼️ KB - Query Pipeline Example