Data fundamentals · 6 min read

Why Spreadsheets Break at Scale

Spreadsheets are not the problem. The problem is that the business grew and the spreadsheet stayed the same size. At some point the thing that got you here starts costing you more than it saves.

Nobody sets out to run a business on spreadsheets. It just happens gradually, one file at a time, each one solving a specific problem that the official systems didn't quite handle. Then one day you look up and the business is operationally dependent on twelve spreadsheets owned by six different people, and everyone is slightly terrified of what would happen if any of those people left.

Spreadsheets are good tools. This is important to say upfront, because "you should stop using spreadsheets" is advice that tends to irritate people, mostly because it sounds like "you've been doing it wrong." You haven't. Spreadsheets do things that proper databases and business systems genuinely don't do well: fast iteration, flexible formatting, ad-hoc analysis, quick models that you need for one meeting and then never again. Excel built the careers of a generation of business people for good reasons.

The problem isn't spreadsheets. The problem is what happens when spreadsheets quietly become the system of record for things that need something more.

When spreadsheets become infrastructure

There's a specific moment when a spreadsheet crosses a line. It stops being a tool someone uses for analysis and starts being the thing that the business runs on. At that point, several things become true simultaneously.

More than one person needs to edit it. So it gets shared. Then someone edits the wrong version. Then there are two versions, and nobody is sure which one is current.

The data in it comes from other systems, entered manually. Which means there's a human being whose job involves copying numbers from one screen to another at some frequency. That person is also the single point of failure. If they're sick, on holiday, or just busy, the spreadsheet is stale and everyone is working from old numbers.

Other people's work depends on it. Decisions get made based on it. Reports pull from it. Then one formula somewhere in the middle breaks, silently, and the output is wrong for three weeks before anyone notices.

The formula problem

One of the most underappreciated risks in a business-critical spreadsheet is what happens to formulas over time.

A formula written six months ago encodes an assumption about how the data is structured. When the structure of the data changes (a new column gets added, a source system changes a field name, someone rearranges rows for readability) the formula may still run without producing an error. It just quietly produces the wrong answer.

Spreadsheets don't have tests. There's no automated check that says "this formula assumed the revenue column was column D, and now it's column F." The error is invisible unless someone has the original numbers memorised well enough to notice the drift.

In a proper database or data pipeline, most schema changes produce errors: a query that references a renamed or dropped column fails instead of returning a number. Loud, visible, fixable errors. In a spreadsheet, the same changes produce silent wrong answers. That's a meaningful difference when the numbers are informing decisions.
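To make the difference concrete, here is a minimal Python sketch. The export contents, column names, and both helper functions are hypothetical, invented for illustration. One function reads "column D" by position, the way a cell reference does. The other asks for the revenue column by name, the way a database query does.

```python
import csv
import io

# Hypothetical exports from a source system (illustrative data only).
# In the old layout, revenue is the fourth column (column D).
OLD_EXPORT = (
    "date,region,units,revenue\n"
    "2024-01-01,North,10,500\n"
    "2024-01-02,South,4,200\n"
)
# Two columns were added, so revenue has moved to column F.
NEW_EXPORT = (
    "date,region,units,discount,tax,revenue\n"
    "2024-01-01,North,10,25,100,500\n"
    "2024-01-02,South,4,10,40,200\n"
)
# The source system renamed the field.
RENAMED_EXPORT = (
    "date,region,units,net_revenue\n"
    "2024-01-01,North,10,500\n"
)


def total_by_position(text, index=3):
    """Spreadsheet-style: sum whatever happens to sit in column D."""
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the header row
    return sum(float(row[index]) for row in rows)


def total_by_name(text, column="revenue"):
    """Database-style: ask for the column by name, fail if it is missing."""
    reader = csv.DictReader(io.StringIO(text))
    if column not in (reader.fieldnames or []):
        raise KeyError(f"expected column {column!r}, found {reader.fieldnames}")
    return sum(float(row[column]) for row in reader)


print(total_by_position(OLD_EXPORT))  # 700.0, correct
print(total_by_position(NEW_EXPORT))  # 35.0, the discount total, with no error
print(total_by_name(NEW_EXPORT))      # 700.0, still correct after the move
```

Calling `total_by_name(RENAMED_EXPORT)` raises a `KeyError` that names the missing column, which is the loud failure. The positional version would keep returning a number no matter what ends up in column D.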

The ownership problem

Most critical business spreadsheets are owned, in practice, by one person. That person built them, understands their quirks, knows which columns not to touch, and has a mental model of how the whole thing fits together. Everyone else just uses the output.

When that person leaves, one of two things happens. Either someone new inherits the file and slowly learns its rules through trial and error (and error and error). Or the file gets replaced with a new one, built by the new person, with different logic, and the historical continuity is lost.

Neither is a good outcome for something the business depends on. And both are almost inevitable, because the knowledge required to maintain a complex spreadsheet is not stored in the spreadsheet itself. It's stored in one person's head.

What the threshold actually looks like

Not every spreadsheet needs to be replaced with something more robust. Plenty of them shouldn't be. The question is whether a given file has crossed the line from "useful tool" to "fragile infrastructure."

A few questions worth asking: Does anyone's daily work depend on this spreadsheet being current and correct? Would a three-week-old version cause a material problem? Does more than one person edit it? Is data entered into it manually from another system, regularly? Would it be hard to reconstruct if it were accidentally deleted or corrupted?

If the answer to most of those is yes, you have infrastructure dressed as a spreadsheet.
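If you have a lot of files to triage, the questions above can be turned into a small checklist. This is only a sketch: the field names are made up for illustration, and it reads "most" as a simple majority of the five answers, which is an assumption rather than a rule.

```python
from dataclasses import dataclass, fields


@dataclass
class SpreadsheetAudit:
    """One yes/no answer per question (hypothetical field names)."""
    daily_work_depends_on_it: bool
    stale_copy_causes_material_problem: bool
    multiple_editors: bool
    manual_entry_from_other_system: bool
    hard_to_reconstruct: bool


def is_infrastructure(audit):
    """True when a majority of answers are yes (assumed reading of 'most')."""
    answers = [getattr(audit, f.name) for f in fields(audit)]
    return sum(answers) > len(answers) / 2


# Hypothetical example: a shared forecast file that three people edit.
forecast = SpreadsheetAudit(
    daily_work_depends_on_it=True,
    stale_copy_causes_material_problem=True,
    multiple_editors=True,
    manual_entry_from_other_system=False,
    hard_to_reconstruct=False,
)
print(is_infrastructure(forecast))  # True
```

The score is a conversation starter, not a verdict. A file with only two yes answers can still be worth fixing if one of them is "hard to reconstruct."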

The fix isn't always a large technology project. Sometimes it's connecting the underlying data sources through a proper pipeline so the spreadsheet becomes a consumption layer, not a data store. The structure you actually want is: data lives in systems that own it, flows automatically into a clean data layer, and the spreadsheet or reporting tool queries that layer rather than storing the data itself.

That way the spreadsheet can still exist, for the flexibility and familiarity that make it useful. But the data underneath it is reliable, versioned, and not owned by one person's local file.
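As a rough illustration of that structure, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the clean data layer. The `orders` table, its columns, and both function names are assumptions made for the example. In a real setup the source rows would come from the system that owns them, not from a hard-coded list.

```python
import csv
import io
import sqlite3


def load_orders(conn, source_rows):
    """Pipeline step: copy rows from the owning system into the data layer."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id TEXT PRIMARY KEY, region TEXT NOT NULL, revenue REAL NOT NULL)"
    )
    # Keyed on order_id, so re-running the load replaces rows, never duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO orders VALUES (:order_id, :region, :revenue)",
        source_rows,
    )
    conn.commit()


def export_revenue_by_region(conn):
    """Consumption step: query the layer and hand the result over as CSV."""
    cursor = conn.execute(
        "SELECT region, SUM(revenue) AS revenue "
        "FROM orders GROUP BY region ORDER BY region"
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # header from query
    writer.writerows(cursor)
    return out.getvalue()


# Hypothetical source data, for illustration only.
conn = sqlite3.connect(":memory:")
load_orders(conn, [
    {"order_id": "A1", "region": "North", "revenue": 500.0},
    {"order_id": "A2", "region": "South", "revenue": 200.0},
    {"order_id": "A3", "region": "North", "revenue": 250.0},
])
print(export_revenue_by_region(conn))
```

The CSV that comes out can be opened in any spreadsheet, but nobody types numbers into it. If the export is lost, it can be regenerated from the data layer with one function call.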

If you're not sure where your spreadsheets sit on this spectrum, a data health audit usually surfaces the answer very quickly.

Work with us

If this sounds familiar, start with the 7-day Mini Proof-of-Work. We’ll test one narrow use case on real data and show you what a full build would involve.
