Day[8/100] #100DaysOfCloud – Jonnychipz – DevOps Practicing Kaizen, the Art of Continuous Improvement

Kaizen - Wikipedia

The bank holiday weekend is a distant memory and the kids are on a countdown back to school. Between work and deliveries of school coats, new school shirts and trousers, I’m in the midst of customer calls, barking dogs and kids wanting my attention so they can pry the last few pound coins out of my wallet and go gallivanting around the streets with their friends! I’m sure this is a scene most of us are more than familiar with by now!

The furious muting of the Teams call to shout at the dog whilst trying to keep on top of important work-related calls! The joys of home working, as sponsored by COVID!

So yeah, this was how my day began… Mindful that I’m edging closer to the end of what has been a fantastic first read of DevOps for Dummies by Emily Freeman, I couldn’t wait to pick it up again at an opportune moment.

Section 4 of the book is really the heart of what it means to have a DevOps mindset, focussing on the art of practicing continuous improvement – Kaizen. I was first exposed to these methodologies around 20 years ago when I worked for a local manufacturing business, and I have been quite adept at integrating these learnings and experiences into my daily working life throughout the years, but now I’m going to see how this works in the DevOps world!


DevOps for Dummies (Emily Freeman) – Section 4 – Chapters 16-18.

What have I learned?

Chapter 16 – Embracing Failure Successfully

This chapter emphasises that failure is, simply put, not something that should be ‘avoided at all costs’. We know that we learn best from failures, well, as long as they don’t kill us or put our company out of business in the meantime.

That’s what this chapter is all about: we know failure will happen at some stage, so it’s about being prepared to fail, but fail fast, embrace failure and ensure lessons are taken from it.

Some of the key points I have taken from this chapter are:

  • Failing Fast in tech
    • Fail Safely – the opposite is fail-unsafe which is not good, it could also read fail-bad or even fail-fired!
    • Contain the failure – Expect failure and handle it accordingly, be ready for it.
    • Accept Human Error – and keep it blameless!
  • Failing Well
    • Ensure growth mindset, not a fixed mindset, expect failure, learn from it, grow!
    • Create the freedom to fail – ensure failure is expected and handled appropriately
    • Encourage experimentation
    • Balance challenging work with fulfilling achievements
    • Reward smart risk taking
    • Build a soft landing – make sure less experienced engineers are supported and encouraged
    • Perfect the art of done! – Avoid analysis paralysis and get the first draft done. Designing for perfection won’t happen, so get something done and iterate from there!
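
To make ‘fail safely’ and ‘contain the failure’ a bit more concrete, here is a minimal Python sketch of my own (it is not taken from the book). The helper `call_with_fallback` and the failing `flaky_recommendations` function are hypothetical names made up for illustration: the idea is simply that the failure is expected, recorded for a later blameless review, and replaced with a safe default rather than taking everything else down with it.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(primary: Callable[[], T], fallback: T, log: list[str]) -> T:
    """Run primary(); if it raises, record the error and return the fallback."""
    try:
        return primary()
    except Exception as exc:  # deliberately broad: this is the containment boundary
        # Keep a record so the failure can be reviewed later, without blame.
        log.append(f"{type(exc).__name__}: {exc}")
        return fallback


def flaky_recommendations() -> list[str]:
    """Hypothetical dependency that fails fast instead of hanging."""
    raise TimeoutError("recommendation service did not respond")


incident_log: list[str] = []
items = call_with_fallback(flaky_recommendations, fallback=["bestsellers"], log=incident_log)
print(items)
print(incident_log)
```

The broad `except` is a deliberate choice for this sketch; in real code you would probably catch only the errors you expect at that boundary.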

Chapter 17 – Preparing for Incidents

This chapter discusses the preparation a DevOps mindset should encourage in incident handling: minimising human error as much as possible, improving on call response, managing incidents as they happen, and taking measurements to enable continued success.

  • Combat human error with automation
    • NoOps is a niche in this area but never the end state for every scenario
    • Cognitive ergonomics
    • Organisational ergonomics
    • Focus on systems and automate realistically
    • Use automation tools to avoid code integration problems
    • Handle deployments and infrastructure appropriately and in a standard way
    • Limit overengineering
  • Humanise on call rotation
    • When on call duties become inhumane, bad things happen!
    • Humane on call expectations – Document code better, create incident runbooks, empower individuals to ask questions and to take risks
  • Managing incidents
    • Make consistency a goal
    • Adopt standardised processes
    • Establish a realistic budget
    • Make it easy to respond to incidents
    • Responding to an unplanned disruption; Assess, Triage, ensure engineering is available, resolve, review
  • Empirically Measuring Progress
    • Understand what you are measuring and why; sometimes metrics can be deceiving and meaningless
    • Mean time to Repair (MTTR)
    • Mean time between failures (MTBF)
    • Cost Per Incident (CPI) – Can rack up quickly! Can act as a measure to justify more effort on incident response.
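
As a rough illustration of the three metrics above, here is a small Python sketch. The incident records, dates and costs are all made up, and the `Incident` class and function names are my own assumptions rather than anything from the book. Definitions of MTBF also vary; this version assumes total uptime in the period divided by the number of failures.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime
    resolved: datetime
    cost: float  # hypothetical cost of the incident, in pounds


def total_downtime(incidents: list[Incident]) -> timedelta:
    return sum((i.resolved - i.started for i in incidents), timedelta())


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to repair: total repair time / number of incidents."""
    return total_downtime(incidents) / len(incidents)


def mtbf(incidents: list[Incident], period_start: datetime, period_end: datetime) -> timedelta:
    """Mean time between failures: uptime in the period / number of failures."""
    uptime = (period_end - period_start) - total_downtime(incidents)
    return uptime / len(incidents)


def cost_per_incident(incidents: list[Incident]) -> float:
    return sum(i.cost for i in incidents) / len(incidents)


# Made-up data: three incidents in a 30-day month.
incidents = [
    Incident(datetime(2020, 9, 3, 10), datetime(2020, 9, 3, 11), 500.0),
    Incident(datetime(2020, 9, 14, 22), datetime(2020, 9, 15, 1), 1500.0),
    Incident(datetime(2020, 9, 27, 8), datetime(2020, 9, 27, 10), 1000.0),
]
start, end = datetime(2020, 9, 1), datetime(2020, 10, 1)

mttr_hours = mttr(incidents).total_seconds() / 3600
mtbf_hours = mtbf(incidents, start, end).total_seconds() / 3600
cpi = cost_per_incident(incidents)
print(mttr_hours, mtbf_hours, cpi)  # 2.0 238.0 1000.0
```

Even with toy numbers it shows why you need to understand what you are measuring: one long overnight incident moves MTTR and CPI far more than it moves MTBF.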

Chapter 18 – Conducting Post Incident Reviews

  • Going Beyond Root Cause Analysis
  • Stepping through an incident
    • Discovery
    • Response
    • Restoration
    • Reflection
    • Preparation
  • Succeeding at post incident reviews
    • Schedule it immediately
    • Include Everyone – can be vital learning opportunity for wider teams
    • Keep it blameless
    • Review the timeline
    • Ask the tough questions!
    • Acknowledge hindsight bias
    • Take notes
    • Make a plan

Again, some great reading material, and I hope my notes above have helped anyone reading this. I think if you look back at my last few days of understanding the DevOps process, the notes do hold true. I would absolutely recommend reading the detail of the book, or at least having a search for what some of the terms may reference. Or even reach out to me; I’m more than happy to have a chat with anyone looking to understand the same things as me!

Moving onto the next sections, Section 5 – Tooling your DevOps Practice and Section 6 – The Part of Tens make up the last 60 or so pages of the book. I will see how I get on. I may try and look to complete these two sections together.

Other Thoughts

AZ-104 Prereqs for Azure Admins

In amongst the great learning I’m undertaking around DevOps, I am also loving the new modules on Microsoft Learn and have been dipping in and out of those over the last few days. I noticed that there was AZ-104 Admin Prereqs content, some of which I hadn’t seen before, so I have taken it upon myself to brush up on the differences since AZ-103 and have worked through a few of the modules.

You can find them here:

I’m just glancing over at my stack of new books and am being seduced by my ‘Docker Deep Dive’ and my ‘Fluent Python’ books, and I’m really keen to get started with them! I’m not sure I’ll do quite as comprehensive an analysis of the study points as I have with the DevOps path.

I’m very aware that I am still in the initial 10 days of this 100DaysOfCloud Challenge and it’s really about finding my feet and setting out my stall.

I would love to hear other people’s views on where I am heading. Have you been there? Should I watch out for anything?

Thanks for reading, all!


100DaysOfCloud Overview

My Main ReadMe Page is all set up with a bit about me!

The guys at 100DaysofCloud have set up the GitHub repo to be cloned and also have a great repo containing ideas and areas to collaborate on:

My Github Journey tracker can be found here:

Please Watch/Star my repo and feel free to comment on or contribute to anything I push! I really look forward to hearing from anyone who is going to jump on the journey around the same time as me! Let’s see where I get to in 100 days!

I would encourage others to jump on this journey. I’m not sure that I will be able to commit every day for 100 days, but as long as I can complete 100 days that will be great!
