DevOps & Responsibility

Responsibility is greater than Accountability

Posted by Cads on 19th Apr 2018

It’s not uncommon to say that Agile has let us down. Uncle Bob has been talking about this for at least 2 years (https://www.youtube.com/watch?v=ecIWPzGEbFc). Andy Hunt has been saying it for more than 5 years (the end of agile), and he’s one of the original authors of the Agile Manifesto. But there’s a clear feeling that the Agile Manifesto isn’t unreasonable, that Agile is an improvement on what went before (Waterfall). So what’s the reasoning? Why does it look like I am jumping on a conversation that has already started and has been done to death?

Partly because the fundamental flaw has passed to the new successor, Continuous Delivery — of which those of us in the SRE/DevOps world are in great part exponents. “What is this ‘fundamental flaw’?”, I hear you ask. I believe that it is one of semantics.

Some history: Agile is/was a way to bring the Development and the Business practices closer together, a way to solve the problems of the widely adopted Waterfall model. Quick note here: the first formal description of Waterfall was by Winston W. Royce, and it was presented as an example of a flawed, non-working model! But anyway. Agile was supposed to describe a development model where requirements and implementations evolved by collaboration. This collaboration was supposed to be across cross-functional teams and the customer/end-user. (Sounds very DevOps, don’t it!)

What this practically translated to was a set of practices where a product owner (customer representative) was put in charge of how software development is actually accomplished. The business doesn’t understand what developers do, though. And, key point, it shouldn’t! It’s not their job or remit or training to understand it.

Agile encourages adaptive planning, evolutionary development, early delivery, and continuous improvement, and it encourages rapid and flexible response to change (Wikipedia). This is in direct competition with the deadlines that are (arbitrarily) set by business, and so the business gets its hands on the planning, feedback and improvement phases. Because of the lack of understanding of the development process, we enter the familiar world of WaterScrumFall (Jez Humble), with slow or non-existent feedback loops and too much value placed on feature development (ooh shiny!) rather than iterative improvement.

But this seems wrong. The Business understands and is on board with Agile. And anyway, what has this got to do with Continuous Delivery? The business should be aware of the feedback and the scope etc. (Ah! But is Sales?) and Continuous Delivery doesn’t have this problem. Lean, Kanban etc. all espouse the delivery of code as soon as possible in as continuous a manner as possible. Problem solved, right?

Well, not really. At the heart of all the Software Development models, dating back to Waterfall as well (V Model, Boehm’s Spiral, RUP or UP), is a conflict with team organization. This organization usually splits into “Planners”, “Builders”, “Testers” and “Maintainers”. Irrespective of the actual methodology used, most companies fall into this pattern. And why not? It’s a great way to figure out who sits where, who has seniority, what the pay scales are, where the risk sits and (here’s the rub) who is ACCOUNTABLE for which parts of the model.

And that’s the fundamental issue. The word “accountable”. We can all agree, I am fairly confident, that we need to know who does what. When a system goes down, it’s nice to know (sorry, vitally important to know) who to page. Rob Ewaschuk’s Philosophy on Alerting Google doc became de rigueur and is now a chapter in the SRE book, Site Reliability Engineering: How Google Runs Production Systems.

This requirement is not a bad requirement. It’s a GOOD requirement. It’s probably the thing that keeps a lot of us sane — a concrete post in the sand that we can nail our colors to and point out as “The Right Thing To Do™”. But I posit that this word, this desire for “accountability”, is contrary to how we believe software should work, and leads us to some poor practical implementations.

When we say “You are accountable for that”, what we are inherently saying is that we want someone to blame. In the first definition of accountable on Google, the overwhelming connotation is one of negativity and blame. There are 4 synonyms listed, 3 of which have negative connotations. In Webster, it’s a lot more neutral, but still the example is given as “held her accountable for the damage”. When was the last time that you heard someone say “Oh, I am accountable for that amazing cake”? Or “Yes, that vacation time that you just got? I’m accountable for that!”

It is truly important that we have organizational units that mirror the functions that they best carry out. It doesn’t make sense to have a team of plumbers do the wiring in a house for example. It does, however, make very good sense to have a person who understands plumbing work with the electricians so that wires don’t get routed where water can pour onto them and cause problems.

So how do we go about getting a set of org units that can do the work, and when things go wrong, respond? It’s time to look at the big bad RACI chart. The RACI model is Responsible, Accountable, Consulted, Informed. Now, this suggests a level of hierarchy that I think we have started to see is unwieldy in large part. BUT, that doesn’t mean that there isn’t something to learn. In RACI, Responsible means “Those who do the work to achieve the task”. Well looky what we have here.

What else does Responsible mean? Google says “having an obligation to do something”, “being the primary cause of something and so able to be blamed or credited for it” (emphasis mine), and “involving important duties, independent decision making or control over others”.

Webster uses “a : able to answer for one’s conduct and obligations : trustworthy b : able to choose for oneself between right and wrong” within the definitions. And I ask you, which would you rather have: “a responsible president” or “an accountable president”?

So how does Responsibility manifest itself in practice?

Firstly when you start using the term “responsible” around code, you start talking about “ownership” and the idea that the developer thinks about how the code is written, deployed, used, maintained.

Secondly, you apply a set of connotations that are inherently positive. When was the last time you heard “Ooh, that’s an accountable eagle scout”? What we hear is “Oh, my sister’s kids? They’re really responsible!”

Thirdly, people understand that it is their responsibility to look after the code, to look after the service that they are developing.

Fourthly, and probably not lastly, when we are responsible for systems or code, we make sure that we hand this off properly; we are responsible for the documentation, the training etc. as well as the writing of the service.

In practical terms, a responsible engineer does their part well, and the culture of responsibility enables them to look beyond their silo or wall to make sure that the code is working and is deployable and maintainable.

Continuous Delivery doesn’t do that inherently. The edge between the organizational unit and the development culture falls down here. In most organizations, the team is satisfied once the feature is delivered. They no longer pay attention.

A Responsible team knows that it has external focus. It knows and cares for and maintains its services. Responsible teams design for the future. They design for resilience, flexibility and robustness. Accountable teams design hard limits and self-protection.

“You made a bad call and so I rejected it” said no responsible team ever.

“I received this call, tried to do this and logged the error. Here it is.” That’s a responsible service.
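To make that last quote a little more concrete, here is a minimal sketch of what such a service might look like in Python. Everything in it is an assumption made for illustration: the `orders` logger name, the `parse_quantity` helper, and the shape of the reply are all hypothetical, not taken from any real system. The only point is the behaviour: the service attempts the work, logs what it received and what went wrong, and hands that same story back to the caller.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical service name


def parse_quantity(payload: dict) -> int:
    """Hypothetical piece of work the service does for its caller."""
    return int(payload["quantity"])


def handle_call(payload: dict) -> dict:
    """Attempt the call; on failure, log it and tell the caller what happened."""
    try:
        quantity = parse_quantity(payload)
    except (KeyError, TypeError, ValueError) as exc:
        # Record what was received and what was attempted, so whoever
        # picks this up later can see the whole story.
        logger.error("could not parse quantity from %r: %r", payload, exc)
        return {
            "ok": False,
            "received": payload,
            "attempted": "parse_quantity",
            "error": repr(exc),
        }
    return {"ok": True, "quantity": quantity}
```

Calling `handle_call({"qty": 3})` does not simply bounce the request. The reply carries the payload that was received, the step that was attempted, and the error that was logged, which is the “here it is” part of the quote.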

At the heart of this is that even in a blameless culture, “accountability” is the realm of failure. Responsibility is a way of moving toward a culture of success where success is rewarded inherently, and blame is not allocated. Silos are broken down because we are all responsible for X service on Y servers.

The truth of the matter is that we are accountable for our failures, but responsible for our successes.