GLOSSARY // ORGANIZED AI
Wiki
A–Z definitions for every term used across the guide and arch pages. Each entry has a short definition, a concrete example, and links to where the term appears in context. Skim it or use it as a lookup.
21 terms
3 sections
cross-linked
no jargon-as-decoration
How to read this glossary
Every entry has the same three-part shape: definition (what the thing is, in plain English), example (a concrete instance you can point at), and see-also (where it appears in the other sites or related terms here).
Color in the term name signals primary domain: Yellow = core retrieval concept, applies everywhere.
Terms A–E
Agent
A software actor that can take actions on your behalf — make API calls, write to systems, decide what to do next — usually wrapped around an LLM with tools, memory, and guardrails.
ex: Orectic's "Oracle" is an agent. Penumbra doesn't ship one — you build your own on top of their substrate.
Chunk
A small slice of a document — a paragraph, a few sentences, sometimes a fixed token window — that gets embedded into a vector store as one unit of retrieval.
ex: a 500-token slice of an MSA contract, indexed alongside thousands of other slices, retrievable by similarity.
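One way to produce fixed-window chunks like the 500-token slice above is sketched here. It is a minimal illustration, not any product's chunker: the name `chunk_tokens`, the 50-token overlap, and whitespace splitting as a stand-in for a real tokenizer are all assumptions.

```python
def chunk_tokens(tokens, window=500, overlap=50):
    """Split a token list into fixed-size windows that overlap slightly."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks

# Whitespace splitting stands in for a real tokenizer (assumption).
tokens = ("word " * 1200).split()
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of some duplicated storage.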
Domain model
A description of how a particular business actually works — its objects, the rules between them, the workflows that move them, the standards that judge them. The thing Penumbra has you write.
ex: for a private-equity firm: Deal, Memo, InvestmentCommittee, DiligenceFinding as objects; "every memo cites at least two findings" as a rule.
Embedding
A vector of numbers — typically 384, 768, or 1536 floats — that represents the meaning of a piece of text. Similar meanings land close in vector space. The atomic unit of semantic search.
ex: the sentence "the dog barked" might embed to [0.12, -0.43, 0.91, ...]; "the puppy yelped" lands very close to it; "the car started" lands far away.
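"Close in vector space" is usually measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings come from a model and have hundreds of dimensions, and the numbers here are assumptions, not model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, hand-picked so related sentences point the same way.
dog_barked = [0.12, -0.43, 0.91]
puppy_yelped = [0.10, -0.40, 0.88]
car_started = [0.85, 0.30, -0.20]

print(cosine_similarity(dog_barked, puppy_yelped))  # close to 1
print(cosine_similarity(dog_barked, car_started))   # much lower
```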
Entity
A specific, named thing in your business — Customer Acme, Renewal Q3-2025, Decision DISC-2156. Knowledge graphs are made of entities (nodes) and the typed relationships between them (edges).
ex: Customer:Acme is one entity. Its identity stays fixed across all the documents that mention it; resolving each mention to that single identity is called entity linking.
Extraction
The process of pulling structured data (entities, relationships, dates, dollar amounts) out of unstructured text, audio, or video. The first half of Orectic's product.
ex: from a sales call transcript, extract Customer = Acme, Topic = renewal, NextStep = approval needed by Marta.
Terms F–O
GraphRAG
Hybrid retrieval that combines a vector store and a knowledge graph. The graph provides typed precision (the exact entity); the vector store provides recall (related unstructured chunks). The LLM gets both.
ex: "Why did we discount Acme?" → graph hops to Decision DISC-2156 (precise); vector finds the Slack thread debating it (texture); LLM synthesizes both.
Guardrails
Rules and constraints that limit what an agent is allowed to do, see, or say. Often implemented as input/output filters, permission checks, or content classifiers.
ex: "this agent may read Customer records but never write to them" or "this response must cite at least one source from the graph."
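The two example rules above can be expressed as a permission check and an output filter. This is a minimal sketch under stated assumptions: the agent name `triage-agent`, the `PERMISSIONS` table, and the `sources` field on a response are all hypothetical.

```python
# Hypothetical permission table: agent name -> allowed (resource, action) pairs.
PERMISSIONS = {
    "triage-agent": {("Customer", "read")},
}

def is_allowed(agent, resource, action):
    """Permission check: deny anything not explicitly granted."""
    return (resource, action) in PERMISSIONS.get(agent, set())

def has_graph_citation(response):
    """Output filter: the response must cite at least one graph source."""
    return any(src.get("kind") == "graph" for src in response.get("sources", []))

print(is_allowed("triage-agent", "Customer", "read"))   # granted
print(is_allowed("triage-agent", "Customer", "write"))  # denied
```

Denying by default means a missing entry fails closed rather than open.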
Hybrid retrieval
Any retrieval strategy that combines two or more methods — most commonly vector + graph, but also vector + keyword (BM25), or vector + structured SQL. GraphRAG is one popular instance.
ex: pure vector returns 50 candidate chunks; a graph traversal narrows to the 12 connected to the canonical entity; the LLM gets the intersection.
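The narrowing step in that example amounts to an ordered intersection. The sketch below assumes chunk IDs are plain strings and that a graph traversal has already produced the set of chunks linked to the entity; `hybrid_filter` and the ID format are hypothetical.

```python
def hybrid_filter(vector_hits, graph_connected_ids):
    """Keep only vector hits the graph also vouches for, preserving rank order."""
    return [chunk_id for chunk_id in vector_hits if chunk_id in graph_connected_ids]

# 50 candidates from vector search, best match first (made-up IDs).
vector_hits = [f"chunk-{i}" for i in range(50)]
# 12 chunks the graph links to the canonical entity (made-up IDs).
graph_connected_ids = {f"chunk-{i}" for i in range(0, 48, 4)}

narrowed = hybrid_filter(vector_hits, graph_connected_ids)
print(len(narrowed))
```

Filtering the ranked list, rather than intersecting two sets, keeps the similarity ordering intact for the prompt.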
Knowledge graph
A database of typed entities (nodes) and typed relationships (edges) between them. Lets you do multi-hop traversal — "find all decisions made by approvers in the Sales org affecting customers in EMEA last quarter."
ex: (Customer:Acme) -[placed]-> (Renewal:Q3-2025) -[decided]-> (Decision:DISC-2156).
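Multi-hop traversal over that example can be sketched with a plain list of (source, relation, target) triples. A real graph database would use its own query language; the `EDGES` list and the `traverse` helper here are illustrative assumptions.

```python
# The two edges from the example above, stored as typed triples.
EDGES = [
    ("Customer:Acme", "placed", "Renewal:Q3-2025"),
    ("Renewal:Q3-2025", "decided", "Decision:DISC-2156"),
]

def neighbors(node, relation):
    """Follow one typed edge out of a node."""
    return [dst for src, rel, dst in EDGES if src == node and rel == relation]

def traverse(start, relations):
    """Follow a sequence of typed edges, one hop per relation."""
    frontier = [start]
    for relation in relations:
        frontier = [dst for node in frontier for dst in neighbors(node, relation)]
    return frontier

print(traverse("Customer:Acme", ["placed", "decided"]))
```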
LLM
Large language model. The thing that takes a prompt (retrieved chunks + question) and writes the final answer. In all three retrieval approaches, the LLM is the synthesizer — the differences are what gets handed to it.
ex: Claude Sonnet 4.6, GPT-4o, Gemini 2.5 Pro. The retrieval method determines whether the model is doing reasoning or just paraphrasing.
Memory
Structured state an agent leaves behind so it (or another agent) can pick up where it stopped. Different from chat history — memory is typed and queryable, not just a transcript to scroll.
ex: after a triage agent processes a ticket, it writes {ticket: T-447, classification: refund, confidence: 0.92} to memory; the next agent reads that directly.
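"Typed and queryable" can be as simple as a record type plus a filter. The sketch below is an assumption about shape only: `TriageRecord` and `MemoryStore` are hypothetical names, and a production system would persist records rather than hold them in a list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TriageRecord:
    ticket: str
    classification: str
    confidence: float

class MemoryStore:
    """In-process stand-in for agent memory: typed records, queried by field."""

    def __init__(self):
        self._records = []

    def write(self, record):
        self._records.append(record)

    def query(self, **filters):
        # Return every record whose fields match all the given values.
        return [
            r for r in self._records
            if all(getattr(r, name) == value for name, value in filters.items())
        ]

memory = MemoryStore()
memory.write(TriageRecord(ticket="T-447", classification="refund", confidence=0.92))
print(memory.query(classification="refund"))
```

The next agent asks for `classification="refund"` directly instead of re-reading a transcript to infer it.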
Ontology
The schema of a knowledge graph — what entity types exist, what relationship types are valid between them, what attributes each entity carries. The thing Penumbra has you author. The thing Orectic infers from data.
ex: "a Decision always has an approver (Person) and a reason (string) and may have one or more citations (Document)."
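The Decision rule above can be written down as data and checked mechanically. This is a sketch, not how either vendor stores an ontology: the `ONTOLOGY` dictionary layout, the `validate` helper, and the sample records are assumptions made for illustration.

```python
# Hypothetical encoding of the rule: approver and reason are required,
# citations are optional.
ONTOLOGY = {
    "Decision": {
        "approver": {"type": "Person", "required": True},
        "reason": {"type": "string", "required": True},
        "citations": {"type": "Document", "required": False},
    }
}

def validate(entity_type, record):
    """Return a list of problems; an empty list means the record fits."""
    spec = ONTOLOGY[entity_type]
    problems = [
        f"missing {name}"
        for name, rule in spec.items()
        if rule["required"] and name not in record
    ]
    problems += [f"unknown attribute {name}" for name in record if name not in spec]
    return problems

# Illustrative records (values are made up).
ok = {"approver": "Person:Marta", "reason": "example reason"}
bad = {"reason": "example reason", "mood": "optimistic"}

print(validate("Decision", ok))
print(validate("Decision", bad))
```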
Terms P–Z
Provenance
A traceable record of where an answer came from — which source documents, which graph entities, which prior decisions shaped it. Without provenance, an agent's output is unverifiable.
ex: the answer "Marta approved 15% off on 2025-08-14" cites DISC-2156 (typed record) and slack#acme-deal (chunk) as sources.
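A provenance record can travel with the answer as structured data. The sketch below reuses the sources from the example; the `Source` and `Answer` types and the `is_verifiable` check are hypothetical names, not part of any product described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    kind: str  # "graph" for a typed record, "chunk" for retrieved text
    ref: str

@dataclass(frozen=True)
class Answer:
    text: str
    sources: tuple = ()

    def is_verifiable(self):
        """An answer with no sources cannot be traced back to anything."""
        return len(self.sources) > 0

answer = Answer(
    text="Marta approved 15% off on 2025-08-14",
    sources=(Source("graph", "DISC-2156"), Source("chunk", "slack#acme-deal")),
)
print(answer.is_verifiable())
```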
RAG
Retrieval-augmented generation. The standard pattern: embed your documents into a vector store; at query time, find the nearest chunks; stuff them into the LLM's prompt; let the LLM answer with that context.
ex: most "chat with your PDF" apps you've used since 2023 are RAG. Cheap and common, but it hits a ceiling on multi-hop or precise-citation queries.
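The query-time half of that pattern (retrieve, stuff, generate) fits in a few lines. This is a sketch with stand-ins: `fake_retrieve` and `fake_generate` are hypothetical placeholders for a vector store lookup and an LLM call, and the prompt wording is an assumption.

```python
def build_prompt(question, chunks):
    """Stuff the retrieved chunks into a single prompt string."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

def rag_answer(question, retrieve, generate, k=3):
    chunks = retrieve(question, k)           # nearest chunks from the vector store
    prompt = build_prompt(question, chunks)  # chunks + question become the prompt
    return generate(prompt)                  # the LLM writes the final answer

# Stand-ins so the sketch runs without a vector store or a model (assumptions).
def fake_retrieve(question, k):
    return ["Acme renewal closes in Q3.", "Marta approves discounts."][:k]

def fake_generate(prompt):
    return f"(model output for a {len(prompt)}-character prompt)"

print(rag_answer("Who approves discounts?", fake_retrieve, fake_generate))
```

Passing `retrieve` and `generate` in as functions keeps the pattern visible: swapping the retriever changes what the LLM sees, not how it is called.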
Retrieval
The act of pulling relevant context out of a store (vector, graph, keyword, SQL) before handing it to an LLM. The "R" in RAG. Mostly invisible to end users but determines whether the answer is right.
ex: when you ask ChatGPT about a PDF you uploaded, the retrieval step decides which slices of the PDF make it into the prompt.
Schema
The shape of your structured data — the tables/types/fields/relations you allow. For graphs, schema is the ontology. For Orectic, schema is inferred. For Penumbra, schema is authored.
ex: "a Renewal has start_date (date), customer (Customer), amount (money), decision (Decision)."
Semantic search
The act of searching by meaning rather than keywords. Implemented by embedding the query, then finding nearest neighbors in a vector store. The thing that lets "I want a refund" match "cancel my order."
ex: query "puppy" retrieves chunks containing "dog," "canine," "lab mix," none of which share the literal word.
Tools
Typed functions an agent can call — usually with structured input and output. Different from "any code the agent can write." A tool is a contract: a name, a JSON schema for arguments, a documented behavior.
ex: create_invoice(customer_id, amount, due_date) exposed to an agent so it can act on the AR system without hallucinating the API.
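The "contract" for `create_invoice` might look like the sketch below. The three argument names come from the example; the argument types, the description text, and the `missing_arguments` helper are assumptions, and the dictionary layout is one common way to describe a tool rather than any specific vendor's format.

```python
# Hypothetical tool definition: name, documented behavior, JSON Schema for arguments.
CREATE_INVOICE_TOOL = {
    "name": "create_invoice",
    "description": "Create an invoice in the AR system for one customer.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "amount": {"type": "number"},
            "due_date": {"type": "string", "format": "date"},
        },
        "required": ["customer_id", "amount", "due_date"],
    },
}

def missing_arguments(tool, arguments):
    """List required arguments the agent failed to supply."""
    return [name for name in tool["parameters"]["required"] if name not in arguments]

call = {"customer_id": "acme", "amount": 1200.0}
print(missing_arguments(CREATE_INVOICE_TOOL, call))
```

Checking the call against the schema before executing it is what stops a hallucinated argument list from reaching the AR system.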
Vector
A list of floating-point numbers. In AI retrieval, it's the embedding of a chunk or query. The dimensionality (length) is fixed per embedding model.
ex: OpenAI's text-embedding-3-small produces 1536-dim vectors by default. Cohere's embed-v3 produces 1024-dim vectors.
Vector store
A database optimized for storing and similarity-searching vectors. Inputs go in as (id, vector, optional metadata). Queries return the top-k nearest neighbors, ranked by a similarity measure such as cosine similarity or dot product.
ex: Pinecone, Weaviate, Qdrant, Chroma, pgvector. All do roughly the same thing with different operational trade-offs.
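The core contract these products share can be sketched as a toy in-memory store. It is illustrative only: `InMemoryVectorStore` is a hypothetical class, it scans every row instead of using an approximate index, and the 2-dimensional vectors are made up.

```python
import math

class InMemoryVectorStore:
    """Toy store: (id, vector, metadata) in, top-k by cosine similarity out."""

    def __init__(self):
        self._rows = {}

    def upsert(self, id_, vector, metadata=None):
        self._rows[id_] = (vector, metadata or {})

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    def query(self, vector, k=3):
        # Brute-force scan; real stores use approximate nearest-neighbor indexes.
        scored = [(self._cosine(vector, v), id_) for id_, (v, _) in self._rows.items()]
        scored.sort(reverse=True)
        return [(id_, score) for score, id_ in scored[:k]]

store = InMemoryVectorStore()
store.upsert("a", [1.0, 0.0], {"source": "doc-1"})
store.upsert("b", [0.0, 1.0])
store.upsert("c", [0.9, 0.1])

print(store.query([1.0, 0.0], k=2))
```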
Stack & conventions
Same single-file pattern as the rest of the hub. Glossary entries use a custom .term block — left-bordered, monospace term name in yellow, plain English definition, italicized example, see-also links in monospace.
Organized AI
Cloudflare Pages
wrangler 4.55
single-file HTML