The question followed me back to GitHub Copilot — a tool I'd been using daily for weeks. And I realised I couldn't answer it there either. I'd been treating Copilot exactly like a very fast junior developer sitting next to me — useful for the repetitive parts, worth checking before committing anything important. I wasn't thinking about what it was doing underneath. I was thinking about whether the output looked right. Then it made that obvious.

I was working on a detailed integration design — drafting sample code to illustrate how two systems would talk to each other. Not production code, but close enough to it. I gave Copilot the context. What came back wasn't a completion of what I'd started. It was a choice. It had picked an integration pattern, made an assumption about error handling, and implemented both before I'd specified either.

The code was good. That was the unsettling part. It was more than correct: it was considered. It had read the context and made a design decision on my behalf. I was impressed and unsettled in equal measure. Impressed because it worked. Unsettled because I realised the same question from that demo applied here too: who designed this? In the demo, nobody could answer it. Here, the answer was: the model did. And I had no idea how.

That's when I stopped treating Copilot as autocomplete and started asking the question I should have asked much earlier: what is this thing actually doing underneath? Not the marketing version. The actual mechanism. Because if a tool can make a design decision on my behalf, I need to understand how it's making that decision — not just whether the output looks right.

AI Is Not a Database

The answer, when I went looking for it, was both simpler and more important than I expected. At its core, an LLM (the engine underneath Copilot, ChatGPT, Claude, and most of what we call "AI" today) is a prediction machine. It predicts the most likely next token based on patterns learned from vast amounts of training data. That's the entire mechanism: advanced pattern completion, operating at a scale and sophistication that produces outputs which feel like reasoning. It is not retrieving facts. It is not searching a database. It is not looking anything up. It is predicting what comes next based on what it has seen before.

This distinction sounds academic until you hit the consequences of ignoring it. When I gave Copilot that integration context, it didn't retrieve the correct pattern from a store of known integrations. It predicted the most statistically likely continuation of the code I'd started, given everything it had learned from its training data. The pattern it chose was the one that fit the context most closely, which, most of the time, is also a good pattern. But not always. And critically, it has no mechanism to know the difference.

This is why hallucinations aren't a bug. They're a structural property of how these systems work. A system that predicts the next likely token will occasionally predict something plausible that is also wrong, because plausibility and correctness are not the same thing, and the model has no ground truth to check against. When Copilot makes a design decision confidently and incorrectly, it isn't malfunctioning. It's doing exactly what it was built to do.

For an enterprise architect, this reframing changes everything. You stop asking "is the output correct?" as your primary question. You start asking "what was this model trained to be good at, and is this situation inside or outside that range?" Those are different questions. The second one is harder and more useful.
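To make the gap between plausible and correct concrete, here is a toy sketch of next-token prediction. It is a deliberate simplification: it counts which word follows which in a tiny invented corpus, whereas a real LLM uses a neural network over much longer contexts. The corpus, the function names, and the prompt are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Invented "training data" for illustration only.
CORPUS = [
    "retry the request on timeout",
    "retry the request on timeout",
    "retry the request on timeout",
    "retry the request on failure",
    "log the error on failure",
]


def train_bigrams(sentences):
    """Count, for every token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never followed."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def complete(counts, prompt, max_tokens=5):
    """Extend the prompt one predicted token at a time."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        next_token = predict_next(counts, tokens[-1])
        if next_token is None:
            break
        tokens.append(next_token)
    return " ".join(tokens)


model = train_bigrams(CORPUS)
# The corpus contains "log the error on failure", but "request" follows
# "the" more often than "error" does, so the frequent pattern wins.
print(complete(model, "log the"))
```

The toy model has seen the sentence that starts with "log the", yet it continues with the statistically dominant pattern instead. Nothing in the loop checks the result against a source of truth; it only ever asks what usually comes next.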

The Spectrum from Assistance to Agency

Once I understood what the model was doing, the next question was harder: where does Copilot sit in the broader landscape of what people are calling "AI"? Because not all AI is the same problem. There's a spectrum, and most enterprise conversations I've been in conflate the ends of it.

At one end: scripted workflows and RPA. Rule-based, deterministic, brittle at the edges. Automates the known. Breaks on anything it wasn't designed for.

One step up: AI-assisted workflows. Intelligent classification, document routing, smart tagging. This is where most enterprise "AI projects" actually live. The model makes a call (high confidence goes this way, low confidence goes to a human) but within a structured flow. Still largely predictable. Still largely governable with traditional tools.

Then there's what Copilot was doing in that moment: operating with enough context to make a choice that wasn't pre-specified. Not following a rule. Exercising something that looked like judgement. This is a different category.

And beyond that, which is where the enterprise AI conversation is rapidly heading, are agentic systems. Systems that don't just make a single judgement call in a structured flow. Systems that perceive their environment, build a plan, take a sequence of actions, and adapt when something changes. Systems with goals, not just instructions.

Counting up from scripted automation, I was using a Tier 3 tool (intelligent, impressive, capable of choices nobody specified) and I had been thinking about it with Tier 2 mental models. That gap is where most of the confusion in enterprise AI conversations lives. People are deploying tools that operate at one level of the spectrum and governing them as if they were at a different level. Sometimes that means over-governing something simple. More dangerously, it means under-governing something complex.

The integration design moment clarified this for me. Copilot wasn't just completing code. It was operating with enough autonomy that I needed to understand its decision-making, not just review its output. That's a different relationship with a tool than I'd had before.
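The AI-assisted tier described above, where "high confidence goes this way, low confidence goes to a human", is simple enough to sketch. This is a minimal illustration, not any product's API: the `Classification` type, the `route` function, and the 0.85 threshold are hypothetical, and it assumes the model exposes a confidence score between 0 and 1 that actually tracks how often it is right.

```python
from dataclasses import dataclass

# Hypothetical cut-off; a real one would be tuned per use case.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Classification:
    document_id: str
    label: str
    confidence: float  # assumed to be a score in [0.0, 1.0]


def route(result, threshold=CONFIDENCE_THRESHOLD):
    """Send confident calls down the automated path and the rest to a person."""
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    if result.confidence >= threshold:
        return f"auto:{result.label}"
    return "human_review"


print(route(Classification("doc-1", "invoice", 0.97)))
print(route(Classification("doc-2", "invoice", 0.60)))
```

The whole decision surface is one comparison inside a fixed flow, which is why this tier is governable with traditional tools. A tool that picks an integration pattern on its own offers no equivalent single number to gate on.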

What AI Literacy Actually Means for an Architect

I want to be direct about something: the literacy I'm describing is not a data scientist's literacy. You don't need to understand gradient descent or transformer architecture at an implementation level. That's not the job.

Dr. Raj Ramesh, whose framework is shaping this series, uses an analogy I keep coming back to: the tour guide in an international airport. You don't speak every language fluently. But you know enough to point travellers in the right direction, to recognise when something has changed, to translate between what someone needs and what's available. That's the level of literacy an architect needs.

Practically, for me, it meant three things.

Understanding what the model can and can't do: not in general, but in your specific context. OpenAI's models, which I was primarily working with before broadening out, are extraordinarily capable at code generation, pattern recognition, and synthesis. They are genuinely unreliable at anything requiring real-time information, precise numerical reasoning, or retrieval of specific facts. Knowing which side of that line your use case sits on is a design decision, not a technical detail.

Knowing where it fails non-obviously. The obvious failures are easy: the model says something wrong and you catch it. The non-obvious failures are the ones that look right. Confident, fluent, structurally sound, and subtly incorrect in a way that only surfaces downstream. In an integration design, that might mean a pattern that works in most cases but fails under a specific load condition the model's training data underrepresented. The skill is building the review process that catches what a scan of the output misses.

Being able to translate that into design decisions. Not just "AI is unreliable, add a human check." That's not architecture, that's hedging. The actual question is: where in this system does non-determinism create risk, what's the cost of a wrong output at each point, and what's the lightest-weight intervention that addresses that risk without killing the value of using AI in the first place? That third question is the one I'm still developing. It's harder than it looks.
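One way to make that question operational is to score each point where a model's output enters the design and pick the lightest check that covers the risk. The sketch below is only an assumption about how that could look: the intervention names, the 1-to-5 cost scale, and the risk thresholds are invented for illustration and would need calibrating against a real system.

```python
from dataclasses import dataclass


@dataclass
class DecisionPoint:
    name: str
    error_likelihood: float  # rough estimate in [0, 1] that the output is wrong
    error_cost: int          # relative cost of a wrong output, 1 (trivial) to 5 (severe)


def expected_risk(point):
    """Likelihood times cost: a crude but comparable risk score."""
    return point.error_likelihood * point.error_cost


def lightest_intervention(point):
    """Pick the cheapest check that matches the risk; thresholds are illustrative."""
    risk = expected_risk(point)
    if risk < 0.1:
        return "none"
    if risk < 0.5:
        return "automated_check"
    if risk < 1.5:
        return "sampled_review"
    return "human_approval"


points = [
    DecisionPoint("format a log message", 0.05, 1),
    DecisionPoint("draft sample code", 0.2, 1),
    DecisionPoint("map fields between systems", 0.2, 4),
    DecisionPoint("choose error-handling strategy", 0.4, 5),
]

for point in points:
    print(f"{point.name}: {lightest_intervention(point)}")
```

The point of the exercise is the shape of the output: most decision points get no human involvement at all, and the heavyweight check is reserved for the few places where a confident mistake is expensive.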

The Eager Intern

The mental model that shifted my relationship with these tools is one I've started calling the Eager Intern. Brilliant. Fast. Genuinely impressive range of knowledge. Works hard, responds immediately, never complains about scope. And occasionally, with complete confidence and fluency and no visible sign of uncertainty, completely wrong.

The right response to an eager intern isn't to distrust everything they produce. It's to understand the shape of their blind spots and build your working relationship around that. You don't ask them to make the final call on something where a confident mistake has a high cost. You do ask them to do the first pass on everything, because their first pass is better than nothing and often better than your first pass. You build in the check at the right moment, not at every moment, because checking everything defeats the purpose.

This is how I work with AI tools now. OpenAI for code generation and synthesis, Claude for longer reasoning tasks and writing, Copilot embedded in the flow. Each one has a shape to its reliability. The Eager Intern model stops me from either over-trusting or reflexively dismissing, both of which I've done, and both of which are expensive in their own way.

The GPS analogy is worth naming here. GPS didn't digitise paper maps. It didn't take the thing we were already doing and make it faster. It made an entirely different thing possible: real-time routing, live traffic, recalculation when you go wrong. The question for AI isn't "how do I add this to what I already do?" It's "what does this make possible that wasn't possible before?" I'm still working out my answer to that for architecture specifically. But the question has changed the way I approach every engagement.

What I Didn't Know I Didn't Know

Here's the thing about the Copilot moment. It wasn't that I learned something new that day. It's that I realised I had been fluent in the output without being literate in the mechanism. I'd been reviewing AI-generated code the way I review any code: does it work, is it clean, does it follow the pattern. I was evaluating the product without understanding the process. And for most tools, that's fine. You don't need to understand how a database engine works to design a schema. But AI is different, because the failure modes are different. A database fails in ways that are visible and traceable. An AI system fails in ways that can be invisible for months.

The literacy changes specific things. The questions I ask in a vendor meeting are different now: I'm asking about training data, about the confidence model, about what happens at the edges of the use case. The risks I flag in a design review are different: I'm looking for places where non-determinism meets high-cost decisions without adequate intervention. The conversations I can have with a data scientist are different: I can engage with the tradeoffs, not just the capabilities.

None of that required becoming a data scientist. It required asking one question (what is this thing actually doing?) and following it seriously for a few months. I'm still following it. And the further in I go, the more I realise that the literacy question is only the beginning. Once you understand what AI is, the harder problem appears: what does it mean to design a system around something whose behaviour you can't fully predict? How do you draw a diagram for a component that exercises judgement?

That question pulled me back to the framework I mentioned in the first post: Dr. Raj Ramesh's seven essential skills for enterprise architects. The first one he identifies is Technology Literacy. I thought I already had it. Turns out I had fluency. Those aren't the same thing.
