I’ve been thinking a lot about risk and redundancy this week (in the engineering rather than the employment sense of the word—that is, using more components or processes than are strictly necessary, in order to provide a contingency against the non-performance of one or more of them).
This started with the collapse of Silicon Valley Bank at the end of last week: I’m on a couple of WhatsApp groups that were alight with founders, investors and executives discussing the implications. Some had everything in SVB and were facing the prospect of being unable to meet payroll and other expenses. Others, including me, were spared that direct threat but held everything with a single, different bank, and were left contemplating what we would do if it were our institution on the block.

There were some outliers: an entrepreneur who maintains multiple business accounts with different banks in case access to one is lost; an experienced CFO who advocated systematically sweeping cash balances in excess of the insured limit over to other accounts, so that all stay within the FSCS, FDIC or similar limits according to where you bank. (As I run a small business, exceeding the FSCS limit would be a nice problem to have.)

But people lose access to banking for all sorts of reasons, not just institutional collapse, and not all of them are covered by insurance. While the probability of losing access might be small, the impact would be extremely serious. It set me thinking about how much redundancy needs to be built into any business process to guard against these kinds of low-probability events and single points of failure. Once I began considering this, I kept seeing it in different areas.
For example, one of my clients was supposed to be launching a product this week using a third-party platform, but we’ve been seeing an error message since Monday and the platform’s support team has been unresponsive. Thankfully, we found a workaround using a different service, at the expense of some duplication of effort. But it was a near miss: had it been a single point of failure, it would have compromised other plans.
On a more everyday level, the external hard drive I use with Time Machine has given up the ghost. On the walk to the pub with a friend and neighbour who is a hardware engineer, the conversation turned to whether it really needed replacing. My business files are stored locally and in a cloud service, so it would take two points of failure to create a problem: the local device and the cloud service itself. Was a local backup really necessary? My friend’s very sensible perspective was to weigh the cost of an additional level of redundancy against the cost and impact of a low-probability but high-impact event, which struck me as a good framing. In the example above, the cost of a couple of hours of duplicated work was far less than that of a compromised product launch. In the case of backups, the cost of a new home office NAS was less than a couple of hours of billable work, set against hours or days of lost time in the event of two serious failures.
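That pub framing can be sketched as simple arithmetic. A minimal illustration, with every number invented for the sake of the example (the failure probabilities, the impact figure and the NAS price are all assumptions, not figures from my actual setup):

```python
# A rough sketch of the cost-vs-impact framing from the pub conversation.
# All numbers below are hypothetical, chosen purely for illustration.

def expected_loss(p_event: float, impact: float) -> float:
    """Expected annual loss from a low-probability event."""
    return p_event * impact

# Scenario: both the local copy and the cloud service fail in the same year.
p_local = 0.05               # assumed chance of local storage failure per year
p_cloud = 0.01               # assumed chance of losing cloud access per year
p_both = p_local * p_cloud   # treating the two failures as independent

impact = 5000.0              # assumed cost of days of lost work
redundancy_cost = 300.0      # assumed price of a home office NAS

loss = expected_loss(p_both, impact)
print(f"Expected annual loss without the NAS: £{loss:.2f}")
print(f"Impact if it happens: £{impact:.2f} vs redundancy cost £{redundancy_cost:.2f}")
```

On pure expected value, cheap redundancy against a very unlikely event can look like a bad deal, which is exactly why the framing compares the redundancy cost against the *impact* as well: a few hundred pounds is a small premium against a loss an order of magnitude larger, even if the probability is tiny.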
Reviewing how I do business and where I could build more redundancy into those systems took an hour or two of my week, but I’ve been able to take specific steps across finances, information storage and a couple of other areas to improve my confidence in them. Less satisfactorily, it highlighted one major SaaS-based product that doesn’t have any native customer-facing backup function at all, which is a worry given it is an important part of my setup. But it has prompted me to start looking at options to reduce that risk.