The Dark Factory – a novella

This is a story about a software factory that no one built on purpose. It began as a small team’s attempt to move faster — to automate the repetitive, to delegate the tedious, to let machines handle what machines could handle. That is how most of these stories begin. What follows is an account of one year in the life of that factory and the woman who ran it: what she built, what she lost track of, and what she eventually came to understand about the thing she had made.

At the start, the entire operating model was captured in four enormous instruction documents and a single configuration file. Elise and her small team wrote the first two in March, when the “factory” was still little more than a prototype: a coding assistant connected to a deployment tool.

The third document was generated in April, mostly through a long back-and-forth with an AI assistant. The fourth she does not remember writing at all. When she checks the change history, the author is not a person’s name, just an arbitrary machine-generated identifier. She had approved the change request while scrolling through her phone.

By July, the factory had spread into four layers of AI-driven activity. Each layer had its own automated processes, but they were all coordinated by a set of relatively fixed workflows — almost like a simulation of a software organisation. Everything was controlled through a fifth layer: a simple conversational interface that Elise and her team used to tell the factory what they wanted.

Those high-level instructions were broken down by a coordination layer into smaller tasks, then routed to the right part of the system. Most of the work was done by short-lived AI workers: small autonomous agents spun up for a specific task, placed inside temporary digital workspaces, and shut down once the task was marked complete.

Underneath that workforce was an integration layer. It connected the factory to the surrounding tools: source code repositories, deployment pipelines, monitoring dashboards, testing systems, and external services. These connections were wrapped behind clean internal interfaces, so the upper layers did not need to know which tool or vendor sat underneath.

A handful of specialist AI agents moved along this integration layer, packaging useful patterns into reusable capabilities. If the coordination layer needed one of those capabilities, it could call it on demand.

Change requests appeared in the team’s messaging channels. Human engineers reviewed them uneasily, increasingly aware that they no longer fully understood how the application was organised. Eventually, most of the changes were approved and pushed into production.

This phase of the factory was held together by a chaotic event-driven nervous system. A request became a task. A task became a plan. A plan became a set of assignments. Assignments produced completion messages. Completion messages triggered more tasks. Everything generated logs.

At night, Elise would scroll through those logs in the same half-awake way someone might wander through an online encyclopedia: one link leading to another, each entry interesting but only partly meaningful.

Her relationship to the platform had changed completely. The applications and services she supposedly owned had become abstract and strange to her. She no longer understood them as codebases or systems. But she could recognise the behavioural signatures of the management layers above them. She knew when the factory was hesitating. She knew when it was improvising. She knew when it was hiding complexity from her.

By September, Elise’s team had built an autonomous evaluation service. It did not simply recommend better options; it learned from outcomes. New software was released almost directly into a production environment that mixed simulated users with real customer interactions.

By then, most customer activity was no longer human-to-system. It was machine-to-machine. Customer agents talked to company agents. Buying, testing, requesting, complaining, and renewing all increasingly happened through automated intermediaries.

Minor problems were handled automatically. The system would fall back to a safer path, analyse what had happened, write a brief internal postmortem, and feed the lesson straight back into a code or configuration change.

Serious problems were treated differently. If a failure violated the core instructions of the factory, the system tightened the rules, isolated the routines involved, created a separate “satellite factory” to explore a safer approach, and gradually retired the faulty process.

They had long since stopped using a traditional event logging service. The logs were still produced, but nobody really understood them. It took longer to diagnose a problem by reading the history than it did to let the factory’s own immune system respond.

The factory still depended on many third-party tools, but often in ways their creators had probably never intended. The codebase had become like Swiss cheese: full of holes, switches, flags, and conditional paths. Almost every core behaviour, feature, and rule was assembled only at the last possible moment, depending on the situation.

By the end of the year, Elise’s engineering team had almost entirely changed shape. They were no longer mostly programmers. They had become designers, architects, systems thinkers, and behavioural analysts.

The only surface everyone truly worked on was the specification: the set of written instructions that told the factory what mattered, what was allowed, what was forbidden, and what counted as success.

The architects and designers focused on trend forecasting and strategic partnerships. Their ideas could not be added to the specification directly. All write access went through two game theorists and a small group of statisticians, who had become the formal gatekeepers of the factory’s intent. The game theorists shaped the rules. The statisticians built the mechanisms to evaluate whether the factory’s output was actually doing what the rules intended — a distinction that had turned out to matter enormously.

Other specialists were there mostly to observe and document the game theorists. There was also an unspoken understanding that, if the game theorists began to behave strangely, someone in the room was expected to contain them.

Meanwhile, the broader internet was becoming less reliable. Large cloud platforms were going down more often. Security breach notifications had become constant background noise for every user. In response, a new class of security tools had emerged that deliberately added noise, leaks, and confusion as a defensive tactic.

Elise’s team noticed a strange rhythm in the factory’s specification changes. For a week, almost nothing would happen. Then, for twenty-four hours, the specification would light up with revisions.

The rhythm felt tidal, but wrong — as though there were two moons pulling at it.

Most of the changes were not caused by instability inside the factory itself. They seemed to be responses to the outside world: outages, market shifts, new security threats, changes in customer behaviour, or signals from other automated systems.

Around Christmas, the internet effectively died twice. The social disruption was catastrophic.

The business, somehow, continued to do well. In fact, it was growing. But it became harder and harder to explain what the business actually did.

Customers appeared through a machine-to-machine purchasing ledger. When their usage reached meaningful scale, the sales team reported increasingly bizarre conversations with the human representatives on the other side. Those people often struggled to explain what they were buying, or why. They were mostly reading instructions that had been handed to them by their own hidden factories.

Elise could not shake the suspicion that these dark factories were beginning to assemble themselves into something larger.

Not by design.

Not by anyone’s design.

But still, unmistakably, together.

NOTE – This is based on, and heavily influenced by, an article by Marek Poliks!

Why Your Status Report Only Tells Half the Story (and Usually the Nicer Half)

The other day I was in one of those long status review meetings that everyone pretends is productive. Everybody was saying sensible things and somehow we were still heading for trouble. The risk register was full. The status report was mostly green, with a tasteful amount of amber and a bit of red to signal realism.

What struck me wasn’t that the team had ignored risk. Quite the opposite. They had probabilities, impact scores, owners, dates. The mechanics were all there. But when I asked the simple question — what is the overall risk position of this program? — nobody could really answer it. We had plenty of detail, but very little clarity.

I have spent most of my professional life on large delivery and transformation programs, and one thing has changed quite a lot in how I think about status. Early on, I mostly looked at progress against tasks. I still do, as people who have worked for me can painfully attest, given the level of detail I ask for. These days I care far more about outcome measures and risk.

Of course the world has changed too. When I started, everything was waterfall and on-premise. Now it is Agile, cloud, digital and AI. But the bigger shift in mindset for me is this: on complex programs, progress reporting on its own just does not tell you enough to be successful.

I have written before about measures and scorecards, so I won’t go back over that here. What I want to focus on in this post is one lens I ask my teams to include in status reporting: major risk burndown, paired with our contingency position across the wider portfolio of risks.

Burn down the most concerning risks early

I still like detailed reporting on deliverables and milestones. But on complex programs that is only part of the picture. What I really want to know is this: what are the few things that could seriously derail us, and what are we doing to reduce that exposure?

One thing I have learned, repeatedly and occasionally the hard way, is that large programs rarely fail because one team missed a task on a plan. They fail at the connection points: dependencies, handoffs, integration points, environments, external parties.

The good news is that many of these failure points are visible early. And once you can see them, you can define the activities that will actually burn that risk down.

In my experience, the usual suspects are not that mysterious. They tend to look something like this:

  • standing up new environments or cloud landing zones
  • third-party delivery dependencies
  • data complexity or sheer data volume
  • system performance
  • integration across systems

There are different ways to identify these risk areas up front. One technique I like is the pre-mortem, popularised by Gary Klein: assume the program has gone badly wrong, then work backwards and ask why.

Once you know the major risks, the next step is straightforward. For each one, define the concrete actions that reduce either its likelihood or its impact. As those actions complete, the risk comes down. That gives you something far more useful than generic “red amber green” commentary: a visible burn-down of uncertainty that you can quantify.

Take a simple example. Say one of your biggest risks is that a third-party system will not integrate properly once delivered. The team assesses that as 25% of the program’s major risk exposure.

You then break the burn-down into concrete steps: agreed interface specification, test stubs, first code drop, functional test pass, performance test pass. If you weight those evenly, each completed step burns down one fifth of that risk. In this example, that means each one reduces overall major risk exposure by 5%.
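The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (evenly weighted steps, a 25% share of exposure); the class and field names are made up for the sketch and do not come from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class MajorRisk:
    """One major risk and the actions that burn it down (hypothetical structure)."""

    name: str
    share_of_exposure: float  # fraction of total major risk exposure, e.g. 0.25
    steps: list[str]          # burn-down actions, assumed to be weighted evenly
    completed: set[str] = field(default_factory=set)

    def complete(self, step: str) -> None:
        """Mark one burn-down action as done."""
        if step not in self.steps:
            raise ValueError(f"unknown burn-down step: {step}")
        self.completed.add(step)

    def remaining_exposure(self) -> float:
        """Share of overall major risk exposure still open for this risk."""
        burned_fraction = len(self.completed) / len(self.steps)
        return self.share_of_exposure * (1 - burned_fraction)


integration = MajorRisk(
    name="third-party integration",
    share_of_exposure=0.25,
    steps=[
        "agreed interface specification",
        "test stubs",
        "first code drop",
        "functional test pass",
        "performance test pass",
    ],
)

# Two of five steps done: 2 x 5% burned, so 15% of overall exposure remains.
integration.complete("agreed interface specification")
integration.complete("test stubs")
print(f"{integration.remaining_exposure():.0%}")  # 15%
```

If the steps are not equally important, the even split could be replaced with a weight per step; the principle stays the same.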

It does not need to be overly precise or scientific to be useful. It just needs to be explicit enough that the team can see whether uncertainty is actually coming down.

If you are worried about how precise those percentages are, I would not get too hung up on it. In my experience, relative weighting is usually enough. I have used the rough equivalent of planning poker for this: get individual views from the team, compare the outliers, then talk it through until you reach a workable level of consensus. You are not trying to produce actuarial science here. You are trying to make uncertainty visible enough to manage.
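As a rough sketch of that relative weighting, the snippet below turns individual point estimates into shares of total exposure. The risk names and numbers are invented for illustration, and taking the median is an assumption standing in for the conversation about outliers; it is not a substitute for having that conversation.

```python
from statistics import median


def relative_weights(estimates: dict[str, list[float]]) -> dict[str, float]:
    """Convert per-person point estimates into each risk's share of exposure."""
    # Assumption: the median approximates where the team lands after discussion.
    consensus = {risk: median(points) for risk, points in estimates.items()}
    total = sum(consensus.values())
    return {risk: value / total for risk, value in consensus.items()}


# Hypothetical planning-poker style estimates from three team members.
team_estimates = {
    "third-party integration": [5, 5, 8],
    "data volume": [8, 8, 13],
    "new environments": [3, 5, 5],
    "system performance": [2, 2, 3],
}

for risk, share in relative_weights(team_estimates).items():
    print(f"{risk}: {share:.0%}")
```

With these made-up numbers the third-party integration risk comes out at 25% of major risk exposure.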

Here is an example schedule for one of the major risks in one of my programs. You can see the nine burndown actions as little circles. Each would have specific acceptance criteria and a way to measure progress towards it.

This is also one of the things I have always liked most about Agile, at least when people practise it rather than just put it on slides: do the risky thing first.

If something might derail the program, why would you leave it until later? Why would you optimise for neat sequencing when the real problem is uncertainty?

Yet enterprises still do this all the time. We reward predictability over adaptability. We like getting “runs on the board.” We often prefer the appearance of commitment over the economics of resilience.

So if you are betting on something important — a new technology, a big AI-driven productivity gain, a performance improvement, a vendor promise — find a way to test that early. Pull the uncertainty forward. Do not wait politely for reality to reveal itself on your most expensive critical path.

Contingency is how you fund the remaining uncertainty

Major risk burndown is only half the picture. Most programs also carry a long tail of smaller risks: things that are annoying, expensive, and real, but not individually catastrophic.

For those, I find it useful to assign contingency values. In other words: if this risk materialises, what would it likely cost us in money, time or both? That gives you a portfolio view of exposure, not just a list of isolated issues.

Not all risks deserve the same treatment. Some need to be actively burned down because, if they hit, the whole plan changes shape. Others are better handled through contingency: extra money, extra time, extra help, or a workaround.

And of course, if the total value of all identified risks comes to 10 million dollars, it rarely makes sense to hold the full 10 million in contingency. Not every risk will materialise. So organisations make a judgment based on risk appetite, delivery context and portfolio profile. In my experience, holding somewhere between 25% and 75% of the total quantified exposure can be entirely reasonable.
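A minimal sketch of that calculation, assuming the holding ratio is a single judgment-based number applied to the total quantified exposure. The individual risk values are invented so that they add up to the 10 million in the example, and the function name is hypothetical.

```python
def contingency_to_hold(risk_values: list[float], holding_ratio: float) -> float:
    """Contingency to hold as a fraction of total quantified risk exposure."""
    if not 0.0 <= holding_ratio <= 1.0:
        raise ValueError("holding_ratio must be between 0 and 1")
    return sum(risk_values) * holding_ratio


# Hypothetical long tail of risks, valued in dollars, totalling 10 million.
long_tail = [4_000_000, 3_000_000, 2_000_000, 1_000_000]

low = contingency_to_hold(long_tail, 0.25)
high = contingency_to_hold(long_tail, 0.75)
print(f"{low:,.0f} to {high:,.0f}")  # 2,500,000 to 7,500,000
```

Where in that range you land is the judgment call about risk appetite, delivery context and portfolio profile; the code only makes the choice explicit.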

Once you have decided how you will calculate contingency, there are really two disciplines to maintain. First, track the individual risks in the normal way. Second, use major project milestones to reassess how much contingency you still need. As uncertainty comes down, the contingency position should change too.

You can see below what this can look like for a project with 30M contingency:

I am increasingly convinced that one of the best indicators of delivery maturity is not how well a team identifies risks, but how honestly it funds uncertainty and tracks progress against it.

We still too often treat contingency as a sign of weak planning. I think the opposite is usually true. Very often it is a sign that somebody understands the shape of complex work.

There is a lot more to say about contingency management and maintaining optionality in delivery and I suspect I will come back to it.

But the broad idea here is simple enough. On complex programs, progress reporting is necessary, but it is nowhere near sufficient. I want to know which risks matter most, what we are doing to reduce them, and how we are managing the uncertainty that remains.

Even with all of that in place, programs still go wrong sometimes. I have a few scars to prove that – ask me over a beer or whisky about it. But I would still rather face the risky things early than discover too late that all the tidy progress reports in the world were describing motion rather than control.

The Email Analogy: Lessons for AI Utilization

As you can imagine, as an engineer and someone who has a Master’s degree in Artificial Intelligence, I am super excited that AI is finally in an everyday usable state. There are many use cases for AI that I absolutely love: programming my home automation system has become better with the help of Copilot, navigating my large inbox is better with Copilot, and my PowerPoint presentations have much better graphics nowadays thanks to ChatGPT. It also has some downsides, one being that my son thinks my job is to create Star Wars memes for work (which made my practice townhalls a lot more entertaining). On a serious note, there are several cases where AI gets it wrong, and I have not found Vibe Coding to be good enough for my use cases yet, not to speak of the large amount of useless content that is flooding my LinkedIn feed, my email inbox and even my text messages.

What this has shown me is that AI will really influence how we work and how we communicate, and I was wondering: what will we do with all the productivity we unlock? Then I remembered that all too often we don’t really use productivity gains well. So I want to offer an analogy as a warning.

Come with me into the past. It is the 1970s and we are at a business conference. I stand on stage, pitching the next “big thing”. And here comes my pitch: there is this new technology that will create incredible productivity for you. I know many of you are sending documentation to other people in your organisation via mail or tube mail or, if it needs to be very fast, with a fax machine. The quality and cost of the fax machine are obviously quite limiting. Well, with this new technology you will instantaneously be able to send information at high quality for basically free to anyone anywhere. Can you imagine the productivity improvements we will see from that?!?

Of course I am talking about email, which still had some way to go to live up to that promise over the next couple of decades, but the pitch and promise came true. You can send information at high quality and basically for free, and if I take email (and any equivalent capability) away from you, your productivity will drop. But ask yourself whether you feel like email is making you more productive. Or have we found ways to “eat up” that productivity by increasing the volume of emails to the point where most of it is low-value information? The ship has sailed on email, I think: the ongoing war of spam vs spam filters, the intentions to use less email vs the learned behaviour of checking it all the time, all this indicates that the brave new world I pitched in the paragraph above will not come to fruition.

The question I am asking myself is whether or not we will make the same mistakes with AI or whether we will find ways to truly unlock the productivity. I have spent hours with Copilot creating code for me that I could have probably written as fast myself. How can organisations focus on the productive use cases of AI and avoid the pitfalls? I don’t have an answer for this yet, but being conscious of our tendency to “eat up” the productivity that technology creates is a useful first step.