Architectural Data vs Architectural Knowledge
Why this distinction matters
Enterprise architecture teams produce enormous volumes of information — application inventories, technology standards lists, capability maps, deployment diagrams, architecture decisions, integration catalogs, risk registers. Most of this lives in spreadsheets, wikis, modeling tools, and slide decks.
The industry routinely calls all of it "architecture data." But there's a fundamental difference between recording that Application X exists and understanding why it exists, what it serves, what would break if it changed, and what should replace it. The first is data. The second is knowledge.
This distinction isn't academic. It determines whether your architecture practice can answer questions or only produce reports. It determines whether AI agents can reason about your enterprise or only retrieve facts. It determines whether architecture governance is automated or manual.
Definitions
Architectural data
Architectural data is the collection of recorded facts about an enterprise's architecture. It answers what exists and what properties it has.
Examples:
- An application inventory listing 400 applications with their names, owners, lifecycle states, and technology stacks
- A CMDB containing configuration items, their relationships, and operational metadata
- A Backstage catalog with services, APIs, teams, and dependencies
- An ArchiMate model file containing elements and relationships
- A list of technology standards with approval status and sunset dates
- A cost allocation spreadsheet mapping applications to cost centers
Architectural data is factual, observable, and (in principle) verifiable against running systems. It can be wrong (CMDBs are notoriously stale), but its intent is to record what is.
Characteristics:
- Descriptive — states what exists
- Flat or weakly structured — rows in a table, nodes in a graph, elements in a model
- Tool-extractable — much of it can be harvested from running systems (discovery tools, APM, cloud APIs)
- Perishable — decays as the estate changes
- Context-free — a fact without interpretation ("Application X uses Java 11" says nothing about whether that's a problem)
Architectural knowledge
Architectural knowledge is the interpreted, contextualized understanding that gives architectural data meaning. It answers why things exist, what they mean in context, what should change, and what the consequences of change would be.
Examples:
- The rationale for choosing microservices over a monolith for the payment platform
- The understanding that Business Capability "Customer Onboarding" is served by three applications, two of which are redundant, and the third is the strategic target
- The recognition that decommissioning Platform Y would break Capability Z because of an undocumented integration through Service Q
- The governance principle that all customer-facing services must have sub-200ms P99 latency, and the reasoning behind that threshold
- The pattern language that guides architects toward event-driven integration for cross-domain workflows
- The assessment that Application X's technical debt makes it unsuitable for the AI-augmented target state
Architectural knowledge is interpretive, contextual, and often tacit. It lives in architects' heads, in decision records (when they exist), in the implicit structure of capability maps, and in the unwritten rules that govern technology selection.
Characteristics:
- Interpretive — assigns meaning to facts
- Relational — derives value from connections between concepts across layers and domains
- Contextual — the same fact means different things in different organizational contexts
- Durable — principles, patterns, and rationale outlive individual systems
- Actionable — directly supports decisions ("given this knowledge, we should do X")
The two layers
The distinction maps to two complementary layers that every architecture practice produces — whether it recognizes them as separate or not:
```mermaid
graph TB
    subgraph knowledge["Architectural Knowledge"]
        direction TB
        K1["Decisions & Rationale"]
        K2["Principles & Patterns"]
        K3["Assessments & Classifications"]
        K4["Impact Analysis"]
        K5["Target State Vision"]
    end

    subgraph data["Architectural Data"]
        direction TB
        D1["Application Inventory"]
        D2["Technology Catalog"]
        D3["Capability Map"]
        D4["Dependency Graph"]
        D5["Cost Allocation"]
    end

    K1 -->|"interprets"| D1
    K2 -->|"constrains"| D2
    K3 -->|"classifies"| D1
    K4 -->|"traverses"| D4
    K5 -->|"transforms"| D3

    style knowledge fill:#e8f4e8,stroke:#2d7a2d
    style data fill:#e8e8f4,stroke:#2d2d7a
```
The data layer records what exists. The knowledge layer interprets what it means and what to do about it. Neither is useful without the other — data without interpretation is an inventory; knowledge without grounding is opinion.
The DIKW lens
The Data–Information–Knowledge–Wisdom hierarchy provides a useful (if imperfect) frame:
| Level | Architecture example | What it adds |
|---|---|---|
| Data | "App-042 exists, runs on Java 11, deployed to AWS eu-west-1" | Raw facts |
| Information | "App-042 is a payment gateway owned by Team Payments, classified as Invest in the TIME framework, serving 3 business capabilities" | Structure and context |
| Knowledge | "App-042 is the strategic payment platform because it's the only one with PCI-DSS compliance, sub-100ms latency, and multi-currency support. The two legacy alternatives should be migrated away from within 18 months based on the 2025 portfolio review." | Interpretation, rationale, and actionability |
| Wisdom | "Payment platforms should be consolidated early in any acquisition integration because payment is a capability where duplication creates regulatory risk, not just cost" | Generalized principles derived from experience |
Most EA tools operate at the Data and Information levels. They store elements, relationships, and properties. The Knowledge and Wisdom levels typically remain in people's heads or in unstructured documents.
Where the gap shows up in practice
The "ask the architect" problem
When someone asks "can we decommission Platform Y?" — the data says Platform Y exists and has certain properties. The knowledge required to answer the question includes:
```mermaid
flowchart LR
    Q["Can we decommission<br/>Platform Y?"]

    subgraph knowledge_needed["Knowledge Required"]
        direction TB
        A["Capability dependencies<br/><small>What breaks?</small>"]
        B["Alternative providers<br/><small>Who else serves this?</small>"]
        C["Migration cost<br/><small>What's the effort?</small>"]
        D["Regulatory constraints<br/><small>Are we allowed?</small>"]
        E["Risk profile<br/><small>What could go wrong?</small>"]
    end

    subgraph data_available["Data Available"]
        direction TB
        F["Platform Y exists"]
        G["Uses Java 11"]
        H["Owned by Team X"]
        I["3 integrations"]
    end

    Q --> knowledge_needed
    data_available -.->|"insufficient"| Q

    style knowledge_needed fill:#e8f4e8,stroke:#2d7a2d
    style data_available fill:#e8e8f4,stroke:#2d2d7a
```
An architect who has worked with Platform Y for five years can answer this in a meeting. But that answer isn't recorded, isn't queryable, isn't available at 2am when an incident responder needs it, and leaves the organization when the architect does.
The AI grounding problem
AI agents can retrieve data. They can look up "what applications exist in domain X" if you give them access to an inventory. What they can't do without architectural knowledge is reason about consequences, evaluate compliance, or recommend actions:
```mermaid
flowchart LR
    subgraph with_data["AI + Data Only"]
        direction TB
        Q1["What apps exist?"] --> A1["List of 400 apps"]
        Q2["Who owns App X?"] --> A2["Team Payments"]
        Q3["What tech does it use?"] --> A3["Java 11, AWS"]
    end

    subgraph with_knowledge["AI + Knowledge Graph"]
        direction TB
        Q4["Should we decommission X?"] --> A4["No: single provider<br/>for 3 critical capabilities"]
        Q5["Does this design comply?"] --> A5["Violates Principle P-007:<br/>cross-domain must be async"]
        Q6["What's the migration path?"] --> A6["Option B preferred:<br/>lower risk, 6mo timeline"]
    end

    style with_data fill:#e8e8f4,stroke:#2d2d7a
    style with_knowledge fill:#e8f4e8,stroke:#2d7a2d
```
Forrester's context graph analysis makes the same point: it identifies two layers that must converge, the entity graph (data) and the decision trace (knowledge). An AI agent with only the entity graph is a search engine. An AI agent with both is an architecture advisor.
The governance automation problem
Governance based on architectural data alone can check: "does every application have an owner?" (property completeness) or "is this relationship type valid in ArchiMate?" (metamodel conformance).
Governance based on architectural knowledge can check: "does this proposed design align with our integration principles?" or "is this technology selection consistent with our target state?" or "does this migration plan account for the dependencies identified in the last portfolio review?"
The first kind of check maps directly onto SHACL validation against structural rules, which Linked.Archi already supports via validate.sh in CI/CD. The second requires encoded principles, patterns, and decision rationale: knowledge, not just data.
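The data-level checks are mechanical enough to sketch in a few lines. The Python below is a stand-in for SHACL shapes, not Linked.Archi's validator: the `Application` record, the element ids, and the three-rule `ALLOWED` table are hypothetical, and the table is deliberately tiny rather than ArchiMate's real relationship rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

@dataclass
class Application:
    id: str
    owner: Optional[str] = None

# Hypothetical metamodel fragment: allowed (source type, relationship, target type).
# Illustrative only; this is NOT the ArchiMate relationship table.
ALLOWED = {
    ("ApplicationComponent", "realizes", "ApplicationService"),
    ("ApplicationService", "serves", "BusinessProcess"),
    ("BusinessProcess", "realizes", "BusinessService"),
}

def missing_owners(apps: List[Application]) -> List[str]:
    """Property completeness: ids of applications with no recorded owner."""
    return [a.id for a in apps if not a.owner]

def invalid_relationships(rels: List[Triple], element_types: Dict[str, str]) -> List[Triple]:
    """Metamodel conformance: relationships whose type pattern is not allowed."""
    return [
        (s, r, t) for (s, r, t) in rels
        if (element_types[s], r, element_types[t]) not in ALLOWED
    ]

apps = [Application("app-042", "Team Payments"), Application("app-117")]
element_types = {
    "app-042": "ApplicationComponent",
    "svc-pay": "ApplicationService",
    "proc-checkout": "BusinessProcess",
}
rels = [
    ("app-042", "realizes", "svc-pay"),
    ("proc-checkout", "realizes", "app-042"),  # pattern not in ALLOWED
]

print(missing_owners(apps))                       # ['app-117']
print(invalid_relationships(rels, element_types)) # [('proc-checkout', 'realizes', 'app-042')]
```

Both checks pass or fail without knowing why app-042 exists or whether the design is sound, which is exactly the limit of data-level governance.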
What transforms data into knowledge
The gap between architectural data and architectural knowledge isn't bridged by collecting more data. It's bridged by adding specific kinds of semantic structure:
```mermaid
flowchart TD
    subgraph transforms["Five Transformations"]
        T1["1. Typed Relationships<br/><small>Generic links → semantic predicates</small>"]
        T2["2. Decision Rationale<br/><small>Facts → justified choices</small>"]
        T3["3. Classification + Evidence<br/><small>Labels → assessed positions</small>"]
        T4["4. Patterns & Principles<br/><small>Instances → generalized guidance</small>"]
        T5["5. Impact Semantics<br/><small>Dependencies → consequence models</small>"]
    end

    DATA["Architectural<br/>Data"] --> T1 & T2 & T3 & T4 & T5
    T1 & T2 & T3 & T4 & T5 --> KNOWLEDGE["Architectural<br/>Knowledge"]

    style DATA fill:#e8e8f4,stroke:#2d2d7a
    style KNOWLEDGE fill:#e8f4e8,stroke:#2d7a2d
    style transforms fill:#f9f9f9,stroke:#999
```
1. Typed relationships with semantics
Data: "App-042 is connected to Capability-017"
Knowledge: "App-042 realizes Business Service 'Payment Processing' which realizes Business Capability 'Customer Payments' — meaning App-042 is a direct provider of this capability, and removing it would eliminate the capability unless an alternative realization path exists."
The difference is typed, semantically meaningful relationships that support inference. In RDF/OWL terms: am:realizes has formal semantics that a reasoner can traverse, while "is connected to" does not.
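A minimal Python sketch of why the predicate matters, assuming (as the paragraph above does) that realization chains can be followed transitively. The triples and ids are hypothetical; the untyped list has the same topology with every predicate collapsed into `connectedTo`.

```python
from collections import defaultdict
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

TYPED: List[Triple] = [
    ("app-042", "realizes", "svc-payment-processing"),
    ("svc-payment-processing", "realizes", "cap-customer-payments"),
    ("app-042", "monitoredBy", "tool-apm"),
]
# Same edges, but every predicate replaced by a generic link.
UNTYPED: List[Triple] = [(s, "connectedTo", o) for (s, _, o) in TYPED]

def reachable(triples: List[Triple], start: str, predicate: str) -> Set[str]:
    """Nodes reachable from start by following only edges with this predicate."""
    edges = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            edges[s].append(o)
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Typed: only what App-042 provides, directly or through the service.
print(sorted(reachable(TYPED, "app-042", "realizes")))
# Untyped: the monitoring tool appears too, indistinguishable from a capability.
print(sorted(reachable(UNTYPED, "app-042", "connectedTo")))
```

With typed edges, "what would removing App-042 take away?" has a precise answer; with generic links, every neighbor looks equally relevant.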
2. Decision rationale
Data: "The payment platform uses event-driven architecture"
Knowledge: "The payment platform uses event-driven architecture because the 2025 architecture review identified that synchronous integration between payment and fulfillment created cascading failures during peak load. The decision was evaluated against three alternatives (synchronous REST, shared database, message queue) and event-driven was selected for resilience and independent scalability. The tradeoff accepted was eventual consistency in order status reporting."
The decision record transforms a fact into actionable knowledge. Future architects know not just what was chosen but why, what was rejected, and what constraints apply.
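One way to make that structure explicit is a record type whose fields mirror the example: context, options considered, the choice, the drivers, and the accepted tradeoff. The field names and the `ADR-example` id are illustrative assumptions, not a published ADR schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DecisionRecord:
    id: str
    context: str
    options: List[str]   # every option evaluated, including the chosen one
    chosen: str
    drivers: List[str]   # forces the choice optimizes for
    tradeoff: str        # what was knowingly given up
    affects: List[str] = field(default_factory=list)  # linked model elements

    def rejected(self) -> List[str]:
        return [o for o in self.options if o != self.chosen]

    def summary(self) -> str:
        return (f"{self.chosen} chosen for {' and '.join(self.drivers)}; "
                f"rejected {', '.join(self.rejected())}; "
                f"accepted {self.tradeoff}")

adr = DecisionRecord(
    id="ADR-example",
    context="Synchronous payment-fulfillment integration caused cascading failures at peak load",
    options=["event-driven", "synchronous REST", "shared database", "message queue"],
    chosen="event-driven",
    drivers=["resilience", "independent scalability"],
    tradeoff="eventual consistency in order status reporting",
    affects=["payment-platform", "fulfillment-service"],
)
print(adr.summary())
```

The `affects` field is what later lets the record be found from the elements it constrains, rather than sitting in a standalone document.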
3. Classification and assessment
Data: "App-042 exists"
Knowledge: "App-042 is classified as Invest in the TIME framework because it's the strategic target for payment processing, has the lowest technical debt score in its domain, and is the only platform with the architectural characteristics required for the AI-enabled target state."
Classification without rationale is data. Classification with rationale, evidence, and connection to strategy is knowledge.
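A minimal sketch of a classification that keeps its evidence, assuming two normalized scores and a single threshold (both hypothetical). The quadrant mapping follows one common reading of TIME: high business value with high technical fit is Invest, high value with low fit is Migrate, low value with high fit is Tolerate, and low on both is Eliminate.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

@dataclass
class Classification:
    app: str
    quadrant: str
    evidence: Dict[str, float]  # the scores the quadrant was derived from
    rationale: List[str]        # human-readable reasons, kept with the label

def classify_time(app: str, business_value: float, technical_fit: float,
                  rationale: Sequence[str] = (), threshold: float = 0.5) -> Classification:
    """Assign a TIME quadrant from scores in [0, 1] and keep the evidence."""
    key = (business_value >= threshold, technical_fit >= threshold)
    quadrant = {
        (True, True): "Invest",
        (True, False): "Migrate",
        (False, True): "Tolerate",
        (False, False): "Eliminate",
    }[key]
    return Classification(
        app=app,
        quadrant=quadrant,
        evidence={"business_value": business_value, "technical_fit": technical_fit},
        rationale=list(rationale),
    )

c = classify_time(
    "app-042", business_value=0.9, technical_fit=0.8,
    rationale=["strategic target for payment processing",
               "lowest technical debt score in its domain"],
)
print(c.quadrant, c.evidence)
```

The `quadrant` string alone is data; the `evidence` and `rationale` fields are what let a later reader, or an agent, challenge or re-derive it.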
4. Patterns and principles
Data: "Services A, B, and C all use event-driven integration"
Knowledge: "Cross-domain integration should use event-driven patterns because our architecture principles require loose coupling between bounded contexts. This principle exists because the 2024 incident analysis showed that 73% of cascading failures originated from synchronous cross-domain calls. Exceptions require an architecture decision record with explicit risk acceptance."
Patterns and principles encode generalized architectural knowledge — the "wisdom" layer that guides future decisions.
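The exception clause is what separates a principle from a blanket rule. A minimal sketch of checking Principle P-007 as stated in the example, in plain Python; the `Integration` record, domain names, and ADR ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class Integration:
    source_domain: str
    target_domain: str
    style: str                           # e.g. "event-driven", "sync-rest"
    exception_adr: Optional[str] = None  # ADR recording an explicit risk acceptance

def check_p007(i: Integration, accepted_adrs: Set[str]) -> Optional[str]:
    """Return a violation message, or None if the integration complies."""
    if i.source_domain == i.target_domain:
        return None  # the principle only constrains cross-domain integration
    if i.style == "event-driven":
        return None
    if i.exception_adr is not None and i.exception_adr in accepted_adrs:
        return None  # documented exception with explicit risk acceptance
    return f"P-007: {i.source_domain} -> {i.target_domain} uses {i.style} without an accepted ADR"

accepted = {"ADR-example-017"}
checks = [
    Integration("payments", "fulfillment", "event-driven"),
    Integration("payments", "ledger", "sync-rest", "ADR-example-017"),
    Integration("payments", "fulfillment", "sync-rest"),
]
for i in checks:
    print(check_p007(i, accepted))
```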
5. Impact and dependency semantics
Data: "App-042 depends on Service-X"
Knowledge: "App-042 has a critical runtime dependency on Service-X for payment authorization. If Service-X is unavailable for more than 30 seconds, App-042 enters degraded mode and queues transactions. If unavailable for more than 5 minutes, payment processing stops entirely. This dependency cannot be removed without re-architecting the authorization flow, estimated at 6 months effort."
The dependency fact becomes knowledge when annotated with criticality, failure behavior, and remediation context.
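Annotated this way, the dependency can answer questions a bare edge cannot. A minimal sketch encoding the thresholds from the example (more than 30 seconds: degraded; more than 5 minutes: stopped); the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RuntimeDependency:
    consumer: str
    provider: str
    purpose: str
    degraded_after_s: int  # outage longer than this: consumer degrades
    stopped_after_s: int   # outage longer than this: consumer stops
    remediation: str = ""

    def consumer_state(self, outage_s: float) -> str:
        """Expected consumer behavior for an outage of the given length."""
        if outage_s > self.stopped_after_s:
            return "stopped"
        if outage_s > self.degraded_after_s:
            return "degraded: queueing transactions"
        return "normal"

dep = RuntimeDependency(
    consumer="app-042",
    provider="service-x",
    purpose="payment authorization",
    degraded_after_s=30,
    stopped_after_s=5 * 60,
    remediation="re-architect the authorization flow (est. 6 months)",
)
for outage in (10, 45, 600):
    print(outage, dep.consumer_state(outage))
```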
The spectrum, not the binary
In practice, architectural data and architectural knowledge exist on a spectrum. A well-structured model with typed relationships and classification is more knowledge-like than a flat spreadsheet, even if it lacks explicit decision rationale. The spectrum looks like:
```mermaid
graph LR
    A["Raw Data<br/><small>Spreadsheets</small>"] --> B["Structured Data<br/><small>CMDB / Catalog</small>"]
    B --> C["Typed Model<br/><small>ArchiMate / UML</small>"]
    C --> D["Annotated Model<br/><small>+ Decisions<br/>+ Assessments</small>"]
    D --> E["Knowledge Graph<br/><small>Inference, Rationale,<br/>Governance Rules</small>"]

    style A fill:#f4e8e8,stroke:#7a2d2d
    style B fill:#f4ece8,stroke:#7a4a2d
    style C fill:#f4f4e8,stroke:#6a6a2d
    style D fill:#ecf4e8,stroke:#4a7a2d
    style E fill:#e8f4e8,stroke:#2d7a2d
```
Each step adds semantic richness:
- Spreadsheet → CMDB/Catalog: Adds structure, relationships, and lifecycle management
- CMDB → Typed model: Adds formal element types, relationship semantics, and metamodel constraints
- Typed model → Annotated model: Adds decisions, assessments, quality attributes, and governance context
- Annotated model → Knowledge graph: Adds inference, cross-domain connectivity, queryability, and machine-readability
Most organizations are somewhere in the middle. They have structured data (catalogs, models) but haven't formalized the knowledge layer (decisions, rationale, principles, impact semantics).
Why most EA repositories contain data, not knowledge
Several structural factors keep EA repositories at the data level:
Tool design
Most EA tools are optimized for drawing and storing models — elements, relationships, views. They provide excellent support for the "what exists" question. They provide weak support for "why it exists" (decision rationale), "what it means" (assessment and classification with evidence), and "what should change" (target state with migration rationale).
The tool shapes the practice. If the tool makes it easy to draw boxes and arrows but hard to record decisions, you get models without rationale.
Incentive structures
Architecture data is easy to audit: "do we have a model for every domain?" Architecture knowledge is hard to audit: "do our models contain sufficient rationale to support autonomous decision-making?" Organizations measure what's easy to measure, which incentivizes data collection over knowledge formalization.
The tacit knowledge problem
Much architectural knowledge is tacit — it exists in experienced architects' heads as pattern recognition, heuristics, and contextual judgment. Making this explicit requires deliberate effort: writing decision records, documenting principles with rationale, recording assessment criteria. This effort competes with delivery pressure.
Format limitations
Even when architects do record knowledge (in Confluence pages, ADR files, meeting notes), it's typically in unstructured natural language disconnected from the model. The decision exists, but it's not linked to the elements it affects. The principle exists, but it's not encoded as a constraint that can be validated. The assessment exists, but it's not queryable.
What a knowledge graph adds
A knowledge graph — specifically, an ontology-based architecture knowledge graph — addresses the structural barriers:
Formal semantics for relationships
Instead of generic "connects to" links, relationships have types with defined semantics: am:realizes, ad:addresses, arch:governedBy, fina:hasCostModel. These support inference: if A realizes B and B realizes C, then removing A impacts C. This is knowledge encoded in structure.
First-class decision records
Architecture decisions aren't separate documents — they're graph entities linked to the elements they affect:
```turtle
# ad: = architecture decisions vocabulary, ex: = instance namespace
# (prefix declarations omitted in this excerpt)
ex:decision-042 a ad:Decision ;
    ad:status ad:Accepted ;
    ad:rationale "Event-driven integration selected for cross-domain workflows..." ;
    ad:addresses ex:force-resilience, ex:force-scalability ;
    ad:relatedConcept ex:payment-platform, ex:fulfillment-service ;
    ad:consideredOption ex:option-sync-rest, ex:option-shared-db, ex:option-event-driven .
```
The decision is queryable, linked, and traversable. An AI agent — connected via the MCP server — can answer "why does the payment platform use event-driven architecture?" by traversing the graph.
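The traversal itself is simple. A minimal Python sketch that holds the Turtle above as plain (subject, predicate, object) tuples with prefixes dropped and walks from an element to the accepted decisions that mention it. It illustrates the graph walk only; it is not the MCP server's implementation, which queries the graph with SPARQL.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# The decision from the Turtle example, prefixes dropped.
GRAPH: List[Triple] = [
    ("decision-042", "type", "Decision"),
    ("decision-042", "status", "Accepted"),
    ("decision-042", "rationale",
     "Event-driven integration selected for cross-domain workflows..."),
    ("decision-042", "addresses", "force-resilience"),
    ("decision-042", "addresses", "force-scalability"),
    ("decision-042", "relatedConcept", "payment-platform"),
    ("decision-042", "relatedConcept", "fulfillment-service"),
    ("decision-042", "consideredOption", "option-sync-rest"),
    ("decision-042", "consideredOption", "option-shared-db"),
    ("decision-042", "consideredOption", "option-event-driven"),
]

def objects(subject: str, predicate: str) -> List[str]:
    return [o for (s, p, o) in GRAPH if s == subject and p == predicate]

def why(element: str) -> List[Dict[str, object]]:
    """Accepted decisions linked to an element, with rationale, forces, options."""
    answers = []
    for (d, p, o) in GRAPH:
        if p == "relatedConcept" and o == element and "Accepted" in objects(d, "status"):
            answers.append({
                "decision": d,
                "rationale": objects(d, "rationale"),
                "forces": objects(d, "addresses"),
                "options": objects(d, "consideredOption"),
            })
    return answers

for answer in why("payment-platform"):
    print(answer["decision"], answer["forces"])
```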
Encoded governance
Principles and constraints become SHACL shapes — executable knowledge:
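```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix am: <https://meta.linked.archi/archimate3/onto#> .
@prefix ex: <https://example.org/enterprise#> .  # hypothetical instance namespace

ex:CrossDomainIntegrationShape a sh:NodeShape ;
    sh:targetClass am:ApplicationCollaboration ;
    sh:property [
        sh:path am:usesPattern ;
        sh:hasValue ex:event-driven-pattern ;
        sh:message "Cross-domain integrations must use event-driven patterns (Principle P-007)" ;
        sh:severity sh:Warning
    ] .
```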
This transforms a principle from a PDF that nobody reads into a validation rule that runs in CI/CD.
Cross-layer queryability
The knowledge graph connects layers that tools keep separate:
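```sparql
# "What capabilities are at risk if we decommission Platform Y?"
PREFIX am: <https://meta.linked.archi/archimate3/onto#>
PREFIX ex: <https://example.org/enterprise#>  # hypothetical instance namespace

SELECT DISTINCT ?capability ?risk_level WHERE {
  # Capabilities that Platform Y realizes through one of its services
  ex:platform-y am:realizes ?service .
  ?service am:realizes ?capability .

  # Keep a capability only if no other element realizes it through any service
  FILTER NOT EXISTS {
    ?alternative am:realizes ?altService .
    ?altService am:realizes ?capability .
    FILTER (?alternative != ex:platform-y)
  }

  BIND("CRITICAL - single provider" AS ?risk_level)
}
```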
This query encodes knowledge: a capability served by a single provider is at critical risk if that provider is removed. The query itself is a formalization of architectural reasoning.
Practical implications
For architecture teams
- Audit your repository honestly. Is it a data store (inventory of what exists) or a knowledge base (interpretive layer that supports decisions)? Most are data stores with aspirations.
- Start with decisions. The highest-leverage knowledge formalization is recording architecture decisions with rationale, alternatives considered, and links to affected elements. This is the "decision trace" that Forrester identifies as the missing layer.
- Encode principles as constraints. Every architecture principle that can be expressed as a structural rule should become a SHACL shape. This transforms knowledge from documents into executable governance.
- Add "why" to classifications. TIME assessments, technology radar positions, and lifecycle states are data without rationale. Add the evidence and reasoning that supports each classification.
- Connect, don't duplicate. The knowledge isn't in any single model; it lives in the connections between models, decisions, assessments, and principles. A knowledge graph makes these connections explicit.
For tool selection
- Tools that store models but not decisions are data tools
- Tools that store decisions but don't link them to model elements are document tools
- Tools that link decisions to elements but can't validate principles are modeling tools
- Tools that do all of the above and support inference are knowledge tools
Most organizations need a combination. The question is whether the combination is integrated (shared semantic model) or siloed (separate tools with manual cross-referencing).
For AI enablement
AI agents need knowledge, not just data. Giving an AI agent access to an application inventory lets it answer "what applications exist in domain X?" Giving it access to a knowledge graph lets it answer "what should we do about the payment platform?" — because it can traverse from the platform through its capabilities, decisions, assessments, dependencies, and principles to construct a reasoned recommendation.
The Linked.Archi toolbox includes an MCP server that loads the RDF graph into an in-memory SPARQL engine (Oxigraph) and exposes it to AI agents. An architect can ask a natural language question; the agent translates to SPARQL, queries the graph, and returns a grounded answer — not hallucinated from training data, but derived from the actual architecture model.
The difference between a chatbot that retrieves facts and an AI architecture advisor that reasons about your enterprise is the difference between architectural data and architectural knowledge.
The Linked.Archi position
Linked.Archi is designed as a knowledge graph, not a data store. The architecture reflects this:
```mermaid
graph TB
    subgraph linked_archi["Linked.Archi: Knowledge Architecture"]
        direction TB

        subgraph knowledge_layer["Knowledge Layer"]
            AD["Architecture Decisions<br/><small>ad: rationale, forces, options</small>"]
            SHACL["SHACL Governance<br/><small>Principles as executable rules</small>"]
            INF["OWL Inference<br/><small>Derived facts from reasoning</small>"]
        end

        subgraph semantic_layer["Semantic Layer"]
            CORE["arch:core<br/><small>Typed elements, qualified relationships</small>"]
            CROSS["Cross-language connectivity<br/><small>ArchiMate + C4 + BPMN + UML + Backstage + EDGY + LeanIX + BMC</small>"]
            SPARQL["SPARQL Queryability<br/><small>Traversable by humans and AI</small>"]
        end

        subgraph data_layer["Data Layer"]
            MODELS["Imported Models<br/><small>ArchiMate XML, Backstage YAML, etc.</small>"]
            CATALOGS["Service Catalogs<br/><small>Inventories, CMDBs</small>"]
            RUNTIME["Runtime Data<br/><small>APM, cost, metrics</small>"]
        end
    end

    data_layer --> semantic_layer --> knowledge_layer

    style knowledge_layer fill:#e8f4e8,stroke:#2d7a2d
    style semantic_layer fill:#f4f4e8,stroke:#6a6a2d
    style data_layer fill:#e8e8f4,stroke:#2d2d7a
```
- arch:core (v0.3.2) provides the semantic foundation: typed elements, qualified relationships, viewpoints, and concerns, aligned with ISO/IEC/IEEE 42010. This is the structural layer that elevates data toward knowledge.
- Architecture decisions extension (ad:) provides first-class decision records linked to affected elements, forces, options, and outcomes. This is the "decision trace" layer that Forrester identifies as the missing piece.
- SHACL shapes encode governance principles as executable constraints: per-language validation (relationship validity, required properties) and organizational governance rules. They run via validate.sh in CI/CD.
- Cross-language connectivity: nine modeling languages (ArchiMate, BPMN, UML, C4, Backstage, EDGY, LeanIX, BMC, BPMN-Lite) and ten governance frameworks (TOGAF, DoDAF, UAF, ADMIT, TIME, Zachman, EA on a Page, ATAM, 4+1, Platform Design) all share arch:core, enabling cross-domain reasoning.
- SPARQL queryability makes the knowledge traversable, not just by humans reading diagrams but by AI agents constructing answers via the MCP server.
- Inference (OWL reasoning) derives new knowledge from existing facts: if A realizes B and B is classified as critical, then A is a critical dependency. The TIME framework uses OWL equivalent classes for automatic quadrant classification.
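A rule-style stand-in for that inference, in plain Python rather than OWL: it applies the single rule quoted in the last bullet (A realizes B, B is classified as critical, therefore A is a critical dependency) and returns the derived facts alongside the asserted ones. The predicate names and ids are hypothetical, and an OWL reasoner works from class and property axioms rather than hand-written rules like this one.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def derive_critical_dependencies(facts: Set[Triple]) -> Set[Triple]:
    """Asserted facts plus those derived by:
    (A realizes B) and (B classifiedAs critical) => (A classifiedAs criticalDependency)
    """
    critical = {s for (s, p, o) in facts if p == "classifiedAs" and o == "critical"}
    derived = {
        (a, "classifiedAs", "criticalDependency")
        for (a, p, b) in facts
        if p == "realizes" and b in critical
    }
    return facts | derived

asserted = {
    ("app-042", "realizes", "svc-payment-processing"),
    ("svc-payment-processing", "classifiedAs", "critical"),
    ("app-117", "realizes", "svc-reporting"),
}
inferred = derive_critical_dependencies(asserted) - asserted
print(inferred)  # {('app-042', 'classifiedAs', 'criticalDependency')}
```

Nobody asserted that app-042 is a critical dependency; the fact follows from the model, which is the sense in which inference adds knowledge rather than data.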
The ecosystem operates as three layers: meta.linked.archi (this repository — shared ontologies and vocabularies), linked.archi (platform toolbox — converters, generators, graph navigator, MCP server for AI agents), and pub.linked.archi (publications and case studies).
The ecosystem doesn't force you to start at the knowledge level. You can begin with data (import an application inventory) and progressively add knowledge (link decisions, encode principles, add assessments). But the architecture is designed for the knowledge end of the spectrum.
Summary
| Dimension | Architectural Data | Architectural Knowledge |
|---|---|---|
| Answers | What exists | Why it exists, what it means, what to do |
| Structure | Flat or weakly typed | Semantically rich, typed, inferrable |
| Source | Tools, discovery, manual entry | Architects' interpretation, decisions, experience |
| Durability | Perishable (decays with change) | Durable (principles and rationale outlive systems) |
| AI utility | Retrieval (search, lookup) | Reasoning (impact analysis, recommendations) |
| Governance | Completeness checks | Principle validation, decision consistency |
| Formalization | Spreadsheets, catalogs, models | Knowledge graphs, decision records, SHACL rules |
| Example | "App X uses Java 11" | "App X uses Java 11 because the 2023 platform decision mandated JVM-based runtimes for the payment domain due to library maturity for PCI-DSS compliance" |
The gap between data and knowledge is the gap between an architecture repository that produces reports and one that supports decisions. Closing it requires deliberate formalization: typed relationships, decision records, encoded principles, and cross-domain connectivity.
Most organizations have the data. Few have formalized the knowledge. The ones that do are the ones whose architecture practices scale — to more stakeholders, to AI agents, to governance automation, and to the complexity that modern enterprises demand.
References
Foundational concepts
- Nonaka, I. & Takeuchi, H. (1995). The Knowledge-Creating Company. Oxford University Press. — The tacit/explicit knowledge distinction applied to organizations
- Ackoff, R.L. (1989). "From Data to Wisdom." Journal of Applied Systems Analysis, 16, 3–9. — The DIKW hierarchy
- Kruchten, P., Lago, P., & van Vliet, H. (2006). "Building Up and Reasoning About Architectural Knowledge." QoSA 2006, LNCS 4214. — Foundational paper on architectural knowledge management
Architecture knowledge management
- Jansen, A. & Bosch, J. (2005). "Software Architecture as a Set of Architectural Design Decisions." WICSA 2005. — Decisions as first-class architectural knowledge
- Tang, A., Avgeriou, P., Jansen, A., Capilla, R., & Ali Babar, M. (2010). "A Comparative Study of Architecture Knowledge Management Tools." JSS, 83(3). — Survey of AKM approaches
- Capilla, R., Jansen, A., Tang, A., Avgeriou, P., & Babar, M.A. (2016). "10 Years of Software Architecture Knowledge Management." JSS, 116, 191–205. — Decade retrospective
Industry analysis
- Context Graphs Are a Convergence, Not an Invention — Forrester, Apr 2026. Entity graph + decision trace convergence.
- The Dominant AI-Enabled IT Management Platforms Lean Into Context Graphs — Forrester, 2026
- The New Architecture Mandate: Insights from Gartner IOC 2025 — EA Principals Journal
- 5 Things Enterprise Architects Are Doing Wrong — Timo Elliott (summarizing Gartner 2025)
- Siemens STAR — Enterprise Knowledge Graphs in Software Engineering — Springer, 2018
Linked.Archi documentation
- What is Linked.Archi? — Project overview, modules, and design principles
- Enterprise Ontologies and the AI Context Problem — The analyst convergence on enterprise ontologies
- Bridging Formal Architecture and Practical Adoption — Model once, view everywhere
- Architecture & Approach — Core thesis and technical architecture
- Toolbox Overview — Converters, generators, MCP server for AI agents