2022-06-03

Bare minimum of an easy-to-manage system

In active software development, every day is a fight between delivering value and preventing the spaghettification of both the software and the business. Especially when requirements are not set in stone, the software must change over time. Changing software is easy; keeping it sane is not. To keep software sane, one needs to hold all of it in one's head to know the impact of a change before making it. The bigger the software, the harder it is to think about it holistically. So, at some point, growing software is split into many components that interact with each other. In other words, a system.

Name - what you intend something to be

There is a popular saying in software development: "naming is hard". This is true because giving a thing a name is declaring what you intend it to be. Sometimes a name is picked at random and you have to give it meaning afterwards, but usually, once a name sticks, it sticks.

A name refers to a thing. Referring to a thing keeps it close to one's mind. The phrase "Google it" is now synonymous with "search the net for it". It is remarkable that, although Google has a huge range of other products, it keeps its name unambiguous.

Speaking of ambiguity, the name of a system or a component must not be ambiguous. A name is almost like a spell that holds an abstract thing in existence. An ambiguous name simply does not work properly; it destabilizes both the existence and the definition of the entity it describes.

On one occasion, the adjacent teams I work with used the term "download" to refer to both the operation (the streaming of bytes into a machine) and the result of that operation. Internally, my team used separate names for these two things, but we found them hard to use with other teams. This caused a lot of misunderstanding and miscommunication, and that part of the software's development was stunted as a result. Over time, I realized that exposing those entities as business objects and educating the adjacent teams about them is more beneficial in the long run than using one short word to describe complex things.

That one experience taught me to prefer verbose and accurate names over short and easy ones. You simply can't expect a short, ambiguous name to contain the existence of a big thing.
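Applied to the "download" story, a minimal sketch might give the operation and its result separate, verbose names. The names `DownloadOperation` and `DownloadedArtifact` are hypothetical (they are not the names my team used), and the list of byte chunks stands in for a real network stream.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DownloadedArtifact:
    """The result: bytes that already sit on the machine."""
    source_url: str
    content: bytes


class DownloadOperation:
    """The operation: the act of streaming bytes into the machine."""

    def __init__(self, source_url: str, chunks: Iterable[bytes]) -> None:
        # `chunks` stands in for a real network stream (hypothetical).
        self.source_url = source_url
        self._chunks = chunks
        self.bytes_received = 0

    def run(self) -> DownloadedArtifact:
        buffer = bytearray()
        for chunk in self._chunks:
            buffer.extend(chunk)
            self.bytes_received += len(chunk)
        return DownloadedArtifact(self.source_url, bytes(buffer))


operation = DownloadOperation("https://example.com/file", [b"hel", b"lo"])
artifact = operation.run()
print(operation.bytes_received, artifact.content)  # 5 b'hello'
```

With two names, "the download failed" can no longer mean two different things: either the operation failed, or the artifact is bad.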

So what about a name like "Google"? Well, a lot of work went into putting the meaning of that name into people's heads: years of making internet search satisfying for many people and keeping search close to its entry point. That's how much it takes to shape a brand.

Who Are The Actors

The next thing to know is who is actively involved in the story. Let's call them actors: those who act. A bank has customers who store their money in it, a government that regulates it, and a lot of internal actors. The bank itself receives money and submits to regulations. Actors interact.

A system is also an actor. A system can have a component that is itself a system, and therefore also an actor. It is not rare to see subgroups in an organization with conflicting interests acting against each other. These are separate actors.

To move things strategically, or in other words to "get things done", one usually looks at the actors involved and places an actor among them, or in other words, "dominates the market".

Spec and Impl

When you want to use or change a system, you need to understand it. You'll need to be able to refer to the actors involved. Then you need to know, first, "what they are and what they do", and second, "how they do it". In software, these two are called the specification and the implementation.

When you put two cats together, they either fight, become curious about each other, or one dominates the other. What happens after you put two cats together is predictable because you know what cats do: you know their specification. That's pretty much how you should see actors. If you put two actors together, you'll have some intuition about how they could relate and interact. That is why knowing the specifications of the actors is so important. In a way, it is similar to how you can guess a business flow from the database table schemas, but not the other way around.
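In Python, a `typing.Protocol` can play the role of a specification while concrete classes are implementations. In this sketch, `KeyValueStore`, `InMemoryStore`, `LoggingStore`, and `remember` are all hypothetical; the point is that you can predict what `remember` returns by reading the spec alone, without knowing which implementation is plugged in.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Specification: what a store is and what it does."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Implementation: how it does it, with a plain dict."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class LoggingStore:
    """Another implementation: same spec, different internals."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}
        self.log: list[str] = []

    def get(self, key: str) -> Optional[str]:
        self.log.append(f"get {key}")
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self.log.append(f"put {key}")
        self._data[key] = value


def remember(store: KeyValueStore, key: str, value: str) -> Optional[str]:
    # The spec alone tells us the outcome: after put, get returns the value.
    store.put(key, value)
    return store.get(key)
```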

Actors vs Data

In software, code can represent both things that act and things that don't. The former are actors, whereas the latter are what I often simply call data. It is crucial to differentiate actors from data. I emphasize this especially to those who design software.

Data is a lot simpler, so I recommend minimizing the number of actors by identifying which of them can be written as data instead. Software is just bytes executed by a computer. An actor is only a slice of these bytes. The root of an actor's agency lies in the computer's execution. Constructs like classes, events, channels, threads, and processes are just abstractions that let actors exist and be identified.

In some programming languages, data and actors are created with the same syntax. This confuses programmers, who can inadvertently turn data into actors and make things complicated. I'm talking about the "class" keyword.
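A minimal sketch of the difference in Python, where both are declared with `class`: the frozen dataclass `Deposit` is pure data, while the hypothetical `Ledger` is an actor because it owns mutable state and changes it only by processing messages from its mailbox. The bank-deposit framing is an illustrative assumption.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class Deposit:
    """Data: an immutable fact. It never acts on its own."""
    account: str
    amount: int


class Ledger:
    """Actor: owns state and changes it only by processing its mailbox."""

    def __init__(self) -> None:
        self.mailbox: "Queue[Deposit]" = Queue()
        self._balances: dict[str, int] = {}

    def send(self, message: Deposit) -> None:
        self.mailbox.put(message)

    def process_pending(self) -> None:
        # One pass over the mailbox; a long-lived actor would repeat this in a loop.
        while not self.mailbox.empty():
            message = self.mailbox.get()
            self._balances[message.account] = (
                self._balances.get(message.account, 0) + message.amount
            )

    def balance(self, account: str) -> int:
        return self._balances.get(account, 0)


ledger = Ledger()
ledger.send(Deposit("alice", 10))
ledger.send(Deposit("alice", 5))
print(ledger.balance("alice"))  # 0: messages are queued, not yet processed
ledger.process_pending()
print(ledger.balance("alice"))  # 15
```

Only `Ledger` needs a lifecycle, ordering guarantees, and monitoring; `Deposit` can be copied, logged, or serialized freely.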

However, don't treat "data is a lot simpler" dogmatically. While actors are harder to manage, in some cases it is imperative to make actors the main subject of the software. In online games, a single player is actually represented by multiple cloned actors, one on each of the other players' machines, which synchronize at intervals. Talking about serializing and cloning actors-

Simple and Robust Lifecycle

A complex system can actually be a simple combination of a bunch of simple but different components. Let us examine two examples below:

A web server consists of a component that listens on a port and polls for requests, and a component that handles the actual request. These are two different components that do not care what their counterpart is doing, beyond agreeing that the poller sends some bytes to the handler.
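A minimal sketch of those two components without real sockets: the hypothetical `Poller` iterates over an in-memory list that stands in for a listening port and forwards raw bytes to whatever handler it is given. The only contract between them is the `Handler` type, bytes in and bytes out.

```python
from typing import Callable, Iterable, List

# The whole contract between the two components: bytes in, bytes out.
Handler = Callable[[bytes], bytes]


class Poller:
    """Receives raw requests (here from a stand-in source) and forwards bytes."""

    def __init__(self, source: Iterable[bytes], handler: Handler) -> None:
        self._source = source
        self._handler = handler

    def serve(self) -> List[bytes]:
        responses = []
        for raw_request in self._source:
            responses.append(self._handler(raw_request))
        return responses


def echo_upper_handler(raw_request: bytes) -> bytes:
    """Handles a request without caring how it was received."""
    return raw_request.upper()


poller = Poller([b"get /a", b"get /b"], echo_upper_handler)
print(poller.serve())  # [b'GET /A', b'GET /B']
```

Either side can be swapped (a real socket loop, a different handler) without touching the other, as long as the bytes-in, bytes-out agreement holds.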

A multitasking OS has a context switcher whose task is just to save the state of one process or thread and restore the state of another. "Just" is an oversimplification, but the point is that the context switcher does one thing and doesn't care about anything else.
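As a toy illustration only, Python generators can stand in for threads: each suspended generator keeps its own state, and the hypothetical `round_robin` switcher does nothing but pick which saved task to resume next. This is cooperative scheduling, not the preemptive switching a real OS performs, so it shows the separation of concerns rather than how a kernel works.

```python
from collections import deque
from typing import Deque, Generator, List

Task = Generator[str, None, None]


def worker(name: str, steps: int) -> Task:
    # Local variables (name, i) are this task's state; the generator
    # keeps them while the task is suspended at `yield`.
    for i in range(steps):
        yield f"{name}:{i}"


def round_robin(tasks: List[Task]) -> List[str]:
    """Resume one task, let it yield, save it back in the queue, repeat."""
    ready: Deque[Task] = deque(tasks)
    trace: List[str] = []
    while ready:
        task = ready.popleft()        # restore: pick the next saved task
        try:
            trace.append(next(task))  # run until the task yields
        except StopIteration:
            continue                  # finished tasks are dropped
        ready.append(task)            # save: put it back for later
    return trace


print(round_robin([worker("A", 2), worker("B", 3)]))
# ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```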

If you look carefully at both examples, both have actors working on different layers. The interface between the layers is designed in a high-cohesion, low-coupling manner. This is a very important principle in system design.

One actor, one layer, one loop

In an oversimplified form of the entity component system (ECS), the main game loop runs all systems. The main game loop is one layer. Below it is another layer where each system runs over its own entities. From a game developer's perspective, writing a game script feels a lot like piggybacking on the main loop all the time. This is actually the game script interacting with the main loop, actor to actor.
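A deliberately oversimplified sketch of those two layers: the hypothetical `run_frame` is the main-loop layer that calls every system once per frame, and each system runs its own loop over its entities. Storing components in plain dicts keyed by entity id is an assumption for brevity, not how production ECS engines lay out data.

```python
from typing import Callable, Dict, List

# Components: one table per component type, keyed by entity id.
positions: Dict[int, float] = {1: 0.0, 2: 10.0}
velocities: Dict[int, float] = {1: 1.0, 2: -2.0}

System = Callable[[float], None]


def movement_system(dt: float) -> None:
    # Lower layer: this system loops over its own entities.
    for entity, velocity in velocities.items():
        positions[entity] += velocity * dt


def clamp_system(dt: float) -> None:
    # Another system with its own loop; keeps every position at or above zero.
    for entity in positions:
        positions[entity] = max(0.0, positions[entity])


def run_frame(systems: List[System], dt: float) -> None:
    # Main loop layer: runs every system once per frame, in order.
    for system in systems:
        system(dt)


for _ in range(3):
    run_frame([movement_system, clamp_system], dt=1.0)
print(positions)  # {1: 3.0, 2: 4.0}
```

A game script would be one more callable in the `systems` list, which is why writing one feels like piggybacking on the main loop.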

This is a reasonable example of having one loop per layer. Whenever resources permit, use this principle to simplify an actor. The next trick is to design the procedure inside the loop so that it forms a converging tree. A control flow shaped like a tree (instead of a mesh) reduces complexity a lot. The convergence, usually in the form of all branches returning the same data structure at the same point, helps the actor's implementer track which paths are fully implemented and which are not. This makes the lifecycle a lot more robust.
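A minimal sketch of one loop with a converging tree inside, assuming a hypothetical connection actor: every leaf of the `if` tree in `step` returns the same `StepResult` type, and the loop consumes that result at exactly one point. A state that no branch handles falls through to an explicit result instead of silently doing nothing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StepResult:
    next_state: str
    output: str


def step(state: str, event: str) -> StepResult:
    # Tree, not mesh: each branch ends in a leaf that returns a StepResult.
    if state == "idle":
        if event == "connect":
            return StepResult("connected", "opening")
        return StepResult("idle", "ignored")
    elif state == "connected":
        if event == "send":
            return StepResult("connected", "sent")
        if event == "close":
            return StepResult("idle", "closed")
        return StepResult("connected", "ignored")
    # Any state not handled above converges here, visibly.
    return StepResult(state, f"unhandled state {state!r}")


def run(events: List[str]) -> List[str]:
    state, outputs = "idle", []
    for event in events:             # the actor's single loop
        result = step(state, event)  # the single point of convergence
        state = result.next_state
        outputs.append(result.output)
    return outputs


print(run(["send", "connect", "send", "close"]))
# ['ignored', 'opening', 'sent', 'closed']
```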

Robust

Whether or not it is possible to implement one loop and a converging tree for an actor, the important thing is that the actor has a robust lifecycle. That means reducing undefined or unobservable behaviors to zero. How can you be safe when you fully depend on a system that's not robust?
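One way to push undefined behavior toward zero, sketched with a hypothetical `Lifecycle` class: every allowed transition is written down, anything else raises immediately, and each transition is recorded so the actor's behavior stays observable. The three state names are illustrative assumptions.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"


# Every allowed transition is listed; nothing else is permitted.
ALLOWED: Dict[State, Set[State]] = {
    State.CREATED: {State.RUNNING, State.STOPPED},
    State.RUNNING: {State.STOPPED},
    State.STOPPED: set(),
}


class Lifecycle:
    def __init__(self) -> None:
        self.state = State.CREATED
        self.history: List[Tuple[State, State]] = []  # observable record

    def transition(self, target: State) -> None:
        if target not in ALLOWED[self.state]:
            # Undefined behavior becomes an explicit, loud error.
            raise ValueError(
                f"illegal transition {self.state.name} -> {target.name}"
            )
        self.history.append((self.state, target))
        self.state = target
```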

Maintenance Hatch, Emergency Pipe, and Killswitch

Now that you have a system with actors that you can name, understand, and trust, you have a system that runs well. But that is not all it takes to have a manageable system. Not all actors are created equal. Some actors are buried deep in the system, hard to observe, and hard to touch. Those are usually the hardest to change.

Take, for example, an application that updates itself. A self-update works by checking whether a newer version exists, fetching it, and replacing the current version with the new one. How do you test the self-update component of the newest version of the application? There is no newer version for it to update to, so there is no way, unless you cheat. You need a maintenance hatch.

These deep actors need a maintenance hatch so that the maintainer can monitor what is happening inside without having to pierce the thick outer layer that covers them. If an actor has circuits operated by outer-layer actors, the maintainer must be able to override the control and push the buttons manually. A maintenance hatch complements the good qualities of actors mentioned earlier. If the maintainer can reach the actor but knows neither what it does nor what will happen when they press a button, I guarantee that maintenance won't be fun.
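A minimal sketch of a self-updater with a maintenance hatch; every name here is hypothetical and "installing" is reduced to bumping a field. The version source is injected, `status()` lets the maintainer look inside without going through the outer layers, and `force_check()` lets them push the button manually against a fake newer version, which is the "cheat" needed to exercise the newest release's update path.

```python
from typing import Callable, Dict, Optional, Tuple

Version = Tuple[int, int, int]


class SelfUpdater:
    """Hypothetical self-updater; installing is reduced to bumping a field."""

    def __init__(self, current: Version,
                 latest_version_source: Callable[[], Version]) -> None:
        self.current = current
        # Injected so production can query a release server while tests
        # or maintainers substitute something else (assumed design).
        self._latest_version_source = latest_version_source
        self.last_seen: Optional[Version] = None

    def _apply(self, latest: Version) -> bool:
        self.last_seen = latest
        if latest > self.current:
            self.current = latest  # stands in for download + replace
            return True
        return False

    def check_and_update(self) -> bool:
        """Normal path, usually triggered by outer-layer actors."""
        return self._apply(self._latest_version_source())

    # Maintenance hatch: observe and override without the outer layers.
    def status(self) -> Dict[str, Optional[Version]]:
        return {"current": self.current, "last_seen": self.last_seen}

    def force_check(self, fake_latest: Optional[Version] = None) -> bool:
        """Push the button manually, optionally against a fake newer version."""
        latest = (fake_latest if fake_latest is not None
                  else self._latest_version_source())
        return self._apply(latest)


# The newest release (2.0.0) sees nothing newer, so its update path never runs...
updater = SelfUpdater((2, 0, 0), lambda: (2, 0, 0))
print(updater.check_and_update())  # False
# ...unless the maintainer cheats through the hatch.
print(updater.force_check(fake_latest=(2, 0, 1)))  # True
print(updater.status())  # {'current': (2, 0, 1), 'last_seen': (2, 0, 1)}
```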

A system designer must also think carefully about how to design the coupling between actors, especially the critical ones. Changing the circuitry of a critical actor might produce unfortunate effects, so it is wise to prepare emergency pipes before making the change. Prepare fallbacks and strategies to counter inevitable breaking changes. Make redundant backups.
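A minimal sketch of an emergency pipe, assuming hypothetical `new_pricing` and `old_pricing` functions: the caller tries the freshly changed critical path first and, if it raises, routes through the prepared fallback and records why. Catching a broad `Exception` is a simplification here; a real system would catch only the failures it expects.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T],
                  incidents: List[str]) -> T:
    """Run the critical path; on failure, route through the emergency pipe."""
    try:
        return primary()
    except Exception as error:  # simplification: catch what you actually expect
        incidents.append(f"primary failed: {error}")
        return fallback()


def new_pricing() -> int:
    # Stands in for a freshly changed critical actor that breaks.
    raise RuntimeError("schema mismatch")


def old_pricing() -> int:
    # The previous, known-good behavior kept around as a fallback.
    return 100


incidents: List[str] = []
print(with_fallback(new_pricing, old_pricing, incidents))  # 100
print(incidents)  # ['primary failed: schema mismatch']
```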

The last one is a kill switch. However sure you are of a decision, it is better to have a way to cancel it, especially if the decision involves an intricate system. When a mistake is made, you'll be glad to have a kill switch rather than having to undo the consequences, which might cost exponentially more than not making the decision in the first place. The more dangerous the system, the more important it is to have a kill switch.
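A minimal sketch of a kill switch using an in-process `threading.Event` as the flag; `KillSwitch` and `guarded` are hypothetical names. The risky new behavior runs only while the switch is on, and flipping it off restores the old path on the very next call without undoing anything. In practice the flag would likely live somewhere that can change without a redeploy; that part is assumed and not shown.

```python
import threading
from typing import Callable


class KillSwitch:
    """A thread-safe on/off flag guarding a risky decision."""

    def __init__(self, enabled: bool = True) -> None:
        self._enabled = threading.Event()
        if enabled:
            self._enabled.set()

    def kill(self) -> None:
        self._enabled.clear()

    def is_enabled(self) -> bool:
        return self._enabled.is_set()


def guarded(switch: KillSwitch, new_path: Callable[[], str],
            old_path: Callable[[], str]) -> str:
    # Checked on every call, so killing the decision takes effect at once.
    return new_path() if switch.is_enabled() else old_path()


switch = KillSwitch()
print(guarded(switch, lambda: "new", lambda: "old"))  # new
switch.kill()
print(guarded(switch, lambda: "new", lambda: "old"))  # old
```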

Again, if the kill switch is not robust, all you can do is pray.