NoOps and the Need for Critical Alerting


NoOps eschews critical alerting at its own peril

Many start-ups embrace serverless architectures, such as those offered on AWS, believing they will be able to adopt NoOps and avoid the need for critical alerting and ITOps. The reasoning goes that NoOps means no worries about servers because everything is in the cloud, and if there are no worries about servers, there is no need to worry about critical alerting. The reality is slightly different. No matter how minimized Ops becomes, there will always be a need for strong incident management applications. NoOps simply pushes monitoring further from an Ops-only role toward an important role for everyone on the development team.

What is NoOps and why is there so much criticism?

NoOps defines an IT environment that is so automated and abstracted from the underlying infrastructure that there is no need for a dedicated team to manage Ops in-house. The two main drivers behind NoOps are increasing IT automation and cloud computing. Even among NoOps critics, NoOps is still seen as a very agile way to approach software development and deployment. IBM, for example, considers it “the next level of DevOps.”

Much of the criticism seems to focus on a scenario where firms try to adopt zero Ops involvement. While this might be the path of some start-ups, most would argue that this extreme is short-sighted.

The main argument is that you are forgoing the expertise of Ops and rewarding programmers whose main objective is to push code through as quickly as possible. Critics also point out that “great operations engineers are best equipped to help incorporate operational excellence into all practices.”

Well-known Ops evangelist Charity Majors takes the zero-Ops mindset to task when she writes:

I’ve lived the other side of this fairytale. I’ve seen what happens when application developers think they don’t have to care about the skills associated with operations engineering. When they forget that no matter how pretty the abstractions are, you’re still dealing with dusty old concepts like “persistent state” and “queries” and “unavailability” and so forth, or when they literally just think they can throw money at a service to make it go faster because that’s totally how services work.

So, start-ups, if you are thinking about going with zero Ops, maybe take some time to read the rest of this article.

Why start-ups love NoOps and serverless

At the 2015 Jenkins User Conference in London, the CTO of Choose Digital, Mario Cruz, noted:

“in the past couple of years [we have seen] where a six-man team can take on an enterprise today. This is because combining cloud and DevOps speeds up the time it takes for developers to act on ideas, while the legacy, on-premise investments of enterprises makes this harder for them to do. While [this] means developers are solely accountable for any mistakes made, the weight of this responsibility tends to ensure the work they do is of a very high standard.”

This ability to embrace lean start-up principles and deploy quickly is highly appealing to the small teams Mario Cruz refers to. However, if you read carefully, you’ll also notice that the Dev team he lauds is not one from the traditional DevOps mold. Instead, he is arguing for a Dev team that worries about effective deployment, security, monitoring and QA.

As such, it seems that a more sensible approach to NoOps is one where Ops responsibilities are integrated into the overall workload. In the NoOps environment, the development team may grab code from its GitHub repository and send it to a platform as a service such as Heroku, or run it on an Amazon EC2 instance. In this case, the Ops team doesn’t stay up at night worrying about servers. However, they have the very real concern of working with Dev and QA teams to improve security, usability, disaster recovery and other necessary operational components.

Just as important, and not to be underestimated, critical alerting needs to become everyone’s job on the NoOps team.

Why NoOps still needs critical alerting

“Thinking about operational quality in terms of ‘a thing some other team is responsible for’ is just generally not associated with great outcomes. It leads to software engineers who are less proficient or connected to their outcomes, ops teams who get burned out, and an overall lower quality of software and services that get shipped to customers.”

As such, Dev teams who think that they can avoid critical alerting concerns by adopting a NoOps lifestyle ought to brace themselves for a rude awakening. Instead, Devs need to realize that being on-call needs to be everyone’s job. As the redefined Dev team comes to include newly baptized Ops and QA, being on-call becomes a shared responsibility.

However, critical alerting doesn’t need to be the painful experience of years past. With OnPage’s notification platform, Devs can put intelligent alerting in place. OnPage enables teams to:

Create groups and escalation policies to ensure alerts get attended to promptly
Enable real-time communication
Use ‘High-Priority’ alerts for critical events
Alert for up to 8 hours
Send attachments with messages to give recipients more information and context
Facilitate global coverage so your start-up team can be alerted no matter where on earth they are located
Create audit trails to see if and when messages are responded to
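
To make the escalation policy and audit trail items above more concrete, here is a minimal Python sketch of how such a flow might work. It is an illustration only and does not use OnPage's API: the names EscalationPolicy, Alert, escalate and acknowledges are hypothetical, the responder names are made up, and acknowledgement is simulated with a callback rather than a real paging channel.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class EscalationPolicy:
    """Hypothetical policy: responders are paged in order, each with a fixed window."""
    responders: List[str]
    ack_window_minutes: int = 5


@dataclass
class Alert:
    """Hypothetical alert carrying its own audit trail of (minute, event) entries."""
    message: str
    high_priority: bool = False
    audit_trail: List[Tuple[int, str]] = field(default_factory=list)


def escalate(
    alert: Alert,
    policy: EscalationPolicy,
    acknowledges: Callable[[str], bool],
) -> Optional[str]:
    """Page each responder in order until one acknowledges.

    `acknowledges(name)` is a stand-in for a real paging channel (assumption).
    Returns the name of the responder who acknowledged, or None if nobody did.
    """
    minute = 0
    for name in policy.responders:
        alert.audit_trail.append((minute, f"paged {name}"))
        if acknowledges(name):
            alert.audit_trail.append((minute, f"acknowledged by {name}"))
            return name
        # The acknowledgement window elapsed without a response: move on.
        minute += policy.ack_window_minutes
        alert.audit_trail.append((minute, f"no response from {name}, escalating"))
    return None


# Example: the first responder misses the page, the second picks it up.
policy = EscalationPolicy(responders=["dev-on-call", "qa-on-call", "team-lead"])
alert = Alert("checkout API returning 5xx", high_priority=True)
owner = escalate(alert, policy, acknowledges=lambda name: name == "qa-on-call")

for minute, event in alert.audit_trail:
    print(f"t+{minute}m: {event}")
```

The point of the sketch is that on-call duty is shared: the policy lists Dev, QA and a team lead in one rotation, and the audit trail records when each person was paged and who responded.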

Conclusion

Serverless has the potential to transform how software is deployed. However, critical alerts are not going away anytime soon. Being on-call is as inherent to development as virtualization or logs. The good news is that it doesn’t have to be painful.

To learn more about intelligent critical alerting, contact us.