Years of gory struggle to robustify software, countless philosophical discussions, and meditations have led to what I strongly believe is the essence of robustness.
First and foremost, to be robust is to be certain: certain that things don't break, that nothing is divided by 0, that integers do not overflow, that methods are not called on null pointers, that loads are balanced, that pipes are backpressured, that data does not race, that threads aren't poisoned, and that errors have mitigation strategies.
Robustness is gained by trading something of yours in exchange for certainty. You can pay with your time, your options, or your agents. You can wait for a fallible operation to keep retrying until it succeeds; that guarantees success by paying with your time. You can pay by sacrificing options, such as how type narrowing works in flow-typed programming languages. Or you can create and maintain an agent as a layer of protection that handles errors so your primary function works well: agents such as automated tests or background services.
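As a deliberately simplified sketch of the first trade, paying with time, here is a synchronous retry loop; the names and the flaky operation are illustrative, not from any particular library:

```typescript
// Trading time for certainty: keep retrying a fallible operation
// until it succeeds. A real version would add delays and backoff;
// this sketch keeps only the essence of the trade.
function retryUntilSuccess<T>(op: () => T): T {
  for (;;) {
    try {
      return op(); // success: certainty achieved
    } catch {
      // failure: pay with another iteration of our time
    }
  }
}

// Usage: a stand-in operation that fails twice, then succeeds.
let attempts = 0;
const flaky = (): string => {
  attempts += 1;
  if (attempts < 3) throw new Error("transient failure");
  return "ok";
};

const result = retryUntilSuccess(flaky); // succeeds on the third attempt
console.log(result, attempts);
```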
And the most important certainty is the certainty of being correct. To be certain of being correct, you must first know.
It does not bubble up
It is an unfortunate fact that, by nature, robustness does not bubble up. A software system is generally built in layers that fulfill different purposes. The layer above depends on the layer below. A layer needs the layer beneath it to be robust before it can be robust itself. However robust an app is, it cannot stand on its own without a robust OS. The same goes for a robust OS on non-robust hardware. And yet a robust foundation does not guarantee a robust upper layer.
Constant effort is required to robustify increasing layers of software.
Changing people's hearts > Mechanical hardening
Discussions about robustness are often about hardening techniques, best practices, and methods of identification. That is not the most effective part of the discussion, though.
Remember that constant effort is required to achieve robustness. The most important thing is to persuade people to make things robust. This happens before the technical part. In this stage, you need to recruit as large an army as possible to be on your side when it comes to making things robust.
Clearing the Fog of War
To be robust is to first know. The opposite is unknownness. There are many things you don't know that could make your software crash. I, for example, often forget about integer overflow. An unnecessarily large allocation inside a loop can waste a quadratic amount of memory and end in an out-of-memory crash. Beginners and experts alike fall for these all the time. These are the unknowns, the unpredictables.
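The allocation-in-a-loop trap is easy to show concretely. Below is a hypothetical sketch: the first function copies the entire accumulated array on every iteration, quietly doing quadratic work and allocation, while the second does the same job linearly:

```typescript
// Hidden quadratic waste: `concat` allocates a fresh copy of the
// whole accumulated array on every iteration of the loop.
function collectQuadratic(chunks: number[][]): number[] {
  let all: number[] = [];
  for (const chunk of chunks) {
    all = all.concat(chunk); // copies everything collected so far
  }
  return all;
}

// The fix costs nothing: append in place, no per-iteration copies.
function collectLinear(chunks: number[][]): number[] {
  const all: number[] = [];
  for (const chunk of chunks) {
    for (const x of chunk) {
      all.push(x);
    }
  }
  return all;
}
```

Both produce the same result; only the memory and time profile differs, which is exactly why the trap is invisible until the input grows.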
But most often these unknowns lie in the externals, things that we don't control. Externals can come from the address-space boundary (e.g. I/O or IPC), from the code-control boundary (e.g. a third-party library), or from the scenario boundary (e.g. the possibility of being DDoSed, or force majeure).
Trial-and-error experiments uncover these unknowns. They map where the mines are in the minefield so others can avoid them. With good enough tools (e.g. a programming language, framework, or platform), you can build a safe mechanism that retains the identification of errors for others to use. The more traps uncovered, the better.
The inevitably annoying aspect of this fog-of-war problem is that we don't know what we don't know. We may invite an expert to scan the field because they are more familiar with that particular area, but then again it is less a matter of expertise and more one of discipline. Again, changing people's hearts and minds toward robustness might be a more valuable task for experts than deploying them directly at the center of the problem.
Negative Effort - Deidentification
Deliberately obscuring information is not abstraction. If you say otherwise, you have been misled and mistaught. Abstraction is the creation of an abstract concept to represent and condense more concrete ones. In an abstraction, concrete information is not completely unmade.
Deliberately obscuring information, de-identifying it, is a reduction of robustness.
Example: Thrown Exception, Deidentification
Take the `throw` keyword in several programming languages, for instance. I would argue that `throw` is another billion-dollar mistake. Why? Because it de-identifies where and what errors can arise. Some languages have checked exceptions; the others are not so resourceful.
How does it de-identify? Take a `try` block with 10 lines of code. At a glance, without scrutinizing each line, there are 10 places where an error could be thrown. This is a personal experience, but I have found myself anxious when using languages with the try-catch mechanism.
Throwing an exception is a neat way to write the green path, sure, but only if the red paths are negligible. Software that relies heavily on exit-on-error and restarts, for example, is where red paths are negligible.
Software that needs its red paths handled gracefully doesn't have that luxury. This is software that needs fallback mechanisms on errors, or that must take different but no less important branches on errors. This is the kind of software that leans, requirement-wise, more toward a system than an app.
Of course, this is not a mere "exception bad" kind of rant. We have solutions and alternatives, although they are not as popular. In C++ there are variant types, for a start, to "communicate" errors. They can be utilized by keeping the discipline of catching exceptions as early as possible on every occasion, essentially turning unknown errors into known ones. In TypeScript there is fp-ts, which can be used the same way. These are libraries, though, and won't replace thrown exceptions 100%. The rest depends on the language maintainers: making it an official replacement, a hard recommendation to migrate, and an incremental migration path. One of my favorites is Rust's way of modeling errors, but then again it is a relatively new language with more freedom in making official choices.
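The discipline above can be sketched with a minimal, hand-rolled Result type, in the spirit of Rust's `Result` or fp-ts's `Either` but not their actual APIs; the exception is caught at the earliest boundary and turned into a value, so every caller sees the red path in the type signature:

```typescript
// A minimal Result type: the error is identified, not thrown away.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Catch the failure at the boundary and return it as data.
// Every caller is now forced to acknowledge the red path.
function parsePort(raw: string): Result<number, string> {
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    return { ok: false, error: `invalid port: ${raw}` };
  }
  return { ok: true, value: n };
}

// Usage: both paths are visible and explicit at the call site.
const r = parsePort("8080");
const port = r.ok ? r.value : 80; // explicit fallback on the red path
```

The design choice is the whole point: the unknown ("something in this block may throw") becomes a known ("this call returns either a port or an error string").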
Aptitude in Robustness
Aptitude in robustness can be developed, and it can also be found already present in people. The core of the aptitude lies in the skill of identifying unknowns. If, while working on a project, someone can point out unknown and possible risks in a way that surprises people, take that as a sign that they might be good at dealing with robustness, even if the aptitude is not observable at a glance and not directly effective.
Risk discovery might sound pessimistic or threatening to our mortal ears. The more effective the discovery, the more intimidating the path forward. Encourage the discoverer to write findings down in a more objective manner. The recipient, meanwhile, needs to keep an "I can do everything" attitude despite the newly uncovered landmines. Communicating risks in a quantified manner helps a lot, as it gives more insight into how severe each risk could be for both the project and its executors. Better still, encourage these gifted risk discoverers to propose solutions alongside the problems, turning risk discoverers into problem solvers.
To develop aptitude in improving robustness, one needs to be explorative and thorough. Ask why and how perpetually. Scrutinize why and how things are until their essence is known. Dig deep often, and you will find that digging deep gets easier over time. Digging deep is a skill of its own.
Exploring existing technology down to its core not only improves your skill at robustifying things, it also grants you knowledge of the old.
Acknowledge the value of understanding things even when you don't control them, even when they are things of the past. Only then will you find yourself standing on the shoulders of giants.