Category Archives: Job

Is lean ITSM a myth?

This posting is ~7 years old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

When I talk with companies about IT processes and IT service management (ITSM), ITIL seems to be the de facto standard for ITSM. Implementing ITSM without using ITIL seems to be impossible. I have many customers that have implemented ITIL-based ITSM processes, and most of them had enormous trouble during the implementation and/or operation.

Lean ITSM and ITIL?

Companies mainly have three problems during the implementation and/or operation of ITIL processes:

  • slow processes
  • complex processes
  • error-prone processes

ITIL doesn’t show you how to design a process. ITIL is a collection of best practices. Usually, you have someone who helps you to design and implement the processes and functions. If you don’t have an experienced consultant, you might get processes that lead you in the wrong direction: big, fat, complex, ugly, error-prone processes.

You don’t have to implement all processes. It’s sufficient to implement some of the processes of the ITIL Service Operation phase to slow down your business. I used the term “agility” in one of my last blog posts (Industrialize your IT – after you have done your homework) to describe the ability of an organization to act flexibly, proactively, adaptably and with initiative in times of change and uncertainty.

Agility needs lean processes

If your business has to be agile, your IT has to be agile. Agility needs lean processes. What are the characteristics of a lean process?

  • It delivers value

For sure, every process should deliver value. The value should, however, be determined by the customer. Only if the customer would pay for it does it have true value.

  • It respects and involves the people who run the process

I don’t know how often I have seen teams that had to communicate over a ticket system, just because “it’s the process”. That’s not what I would call a process that respects and involves people. It’s a waste of time and know-how.

  • It’s streamlined and free of waste

The Toyota Production System (TPS) knows three types of waste:

  • Mura (waste due to variation)
  • Muri (waste due to overburdening)
  • Muda (transportation, waiting, overproduction, defects, inventory, movement, extra processing)

To get a streamlined and waste-free process, you have to examine your current processes for potential waste. Lean Management and TPS offer different methods and instruments to streamline processes and to avoid waste.

Lean IT or ITIL? Or lean ITIL?

Some companies think that Kanban is all you need for Lean IT. No, that’s not all you need. Kanban is only an instrument to implement the pull principle. Lean IT is much more. But you don’t have to throw away your ITIL know-how.

It can be useful to review your current ITIL processes for potential waste. Many Lean Management methods and instruments can be used in ITIL processes and functions. With a little skill, you can streamline your processes and get much leaner ITIL processes.

This is nothing new. ITIL and Six Sigma are often used together. In this case, Six Sigma is used to optimize the quality and the output of ITIL processes. So why not use Lean Management methods and instruments to put ITIL processes on a diet?

Industrialize your IT – after you have done your homework

This posting is ~7 years old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

Today a tweet from Keith Townsend (@CTOAdvisor) has caught my attention:

Keith wrote a nice blog post and I really recommend reading it. His point is that automation enables business agility.

The point of automation is to enable business agility. Business agility isn’t achieved by automating inefficient processes. The start of an IT automation project begins by examining existing processes and eliminating inefficiency.

Agility is the ability of an organization to act flexibly, proactively, adaptably and with initiative in times of change and uncertainty. Can IT automation enable business agility? No, I don’t think that IT automation can enable business agility.

It’s all about processes

Business processes are the key to business agility. Keith wrote that

Business processes are very organic systems that grow and evolve as a company’s business and culture matures.

and I’m totally fine with this statement. Processes tend to optimize themselves, if you let them. In cases where consultants are hired to improve faulty processes, the processes are often not the problem. The people who run them are often the problem.

Before you think about IT automation, you should optimize your processes. Keith has a similar view:

Automation comes after you’ve established a repeatable process that has few if any points of contention.

Lean processes are the key

A business process has to deliver value. That’s quite clear. But a process has to deliver value and it has to be lean. Otherwise the business would lose its agility.

What are the characteristics of a lean process?

  • It delivers value
  • It respects and involves the people who run the process
  • It’s streamlined and free of waste

Streamlining processes and avoiding waste are the anchors for IT automation! But you should only automate what you have fully understood. Keith mentioned a nice example:

Money and time are better spent implementing an IP management system and continuing the manual processes. The problem wasn’t self-provisioning of VM’s but waiting for IP address assignment.

This example clearly shows that it’s mandatory to check each and every process for potential waste and pain points. Only after this step will you be able to design a new, lean process.

IT automation can’t enable business agility. But lean business processes can. IT automation can help you to streamline your business processes.

Times are changing

Finally, I would like to refer to an important statement in Keith’s blog post:

All of this goes back to my drumbeat that IT infrastructure practitioners acquire business alongside their technical skill set.

100% agree! The time of the people who sit in dark basements, with their “There are 10 types of people” shirts and the opinion that IT is the navel of the world, is coming to an end.

Solving problems: A structured approach

This posting is ~7 years old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

What is a problem? A problem is an obstacle that has to be surmounted. Solving a problem is connected with obstacles. Or, more generally: Problem solving is a process to get from an unsatisfactory to a satisfactory situation.

Most of us get paid for solving problems. It’s irrelevant whether you are paid for solving technical problems (e.g. “My computer doesn’t work”), or whether you are paid to create solutions for customers (e.g. design infrastructure for a Citrix XenApp farm). In the end, you solve a problem.

Every problem has characteristics that can be used to describe it.

  • Solubility

Not every problem is solvable. Think about “squaring the circle”. But often a problem seems to be unsolvable because it’s not well defined. If the initial situation, the obstacle and the target situation are not clearly formulated, you won’t be able to solve the problem.

  • Decomposability

If you can decompose a problem into multiple subproblems, it is a hierarchical problem. Otherwise, it’s an elemental problem.

  • Effort

The effort to solve a problem varies. A problem may be theoretically solvable, but require such a high effort that it is practically unsolvable.

  • Subjectivity

Even if a problem is well defined, different people perceive its complexity differently.

How to start?

First of all

  • Understand and define the problem

This is the most important part. Before you try to solve a problem, make sure that you have really understood the problem. Then you should define the problem. Only a clearly defined problem can be solved. And it’s much easier to solve a clearly defined problem than a vague one. If it’s a complex problem, then you should try to

  • Simplify or decompose the problem

A simplified problem can help you stay focused. If you can’t simplify a problem, you can try to divide it into subproblems. With a clearly defined, simplified/structured problem, you can start to

  • Find the root cause

Collecting information is the key. Collect information about what happened before, during, and after the problem occurred. Identifying the root cause of a problem can be a time-consuming task. But let me say this clearly: Information is the key. Information that helps to find the root cause is not limited to observations (e.g. logs, error messages etc.). You can also use the results of systematic tests. Collect as much data as you can.

Sometimes it can be useful to create a hypothesis.

Scientists generally base scientific hypotheses on previous observations that cannot satisfactorily be explained with the available scientific theories.

If you see that system A is affected, and system B should be affected too but is not, it might be time to create a hypothesis. With a hypothesis in mind, you can try to prove it. Test the hypothesis by performing tests and collecting data. This strategy is called “hypothesis testing”.

At some point, you should have identified the root cause. With the now known root cause, you can

  • Create solutions and select the best one

Sometimes it’s easy. But sometimes it’s not that easy. A trade-off analysis can help to identify the best of multiple solutions.
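One simple form of trade-off analysis is a weighted scoring matrix. The following is only a minimal sketch: the criteria, weights, candidate solutions and scores are all made-up assumptions for illustration, and a score of 5 always means “most favourable” (so a high effort score means low effort).

```python
# Hypothetical criteria and weights (assumed to sum to 1.0).
weights = {"effort": 0.3, "risk": 0.3, "benefit": 0.4}

# Hypothetical candidate solutions, scored from 1 (worst) to 5 (best).
solutions = {
    "disable feature": {"effort": 5, "risk": 4, "benefit": 2},
    "patch firmware": {"effort": 3, "risk": 3, "benefit": 4},
    "replace component": {"effort": 1, "risk": 2, "benefit": 5},
}


def weighted_score(scores, weights):
    """Sum of score * weight over all criteria."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


# Rank the candidates from highest to lowest weighted score.
ranking = sorted(
    solutions,
    key=lambda name: weighted_score(solutions[name], weights),
    reverse=True,
)
best = ranking[0]

for name in ranking:
    print(f"{name}: {weighted_score(solutions[name], weights):.1f}")
```

The result is only as good as the weights. It is worth agreeing on them before scoring the candidates, otherwise the matrix merely confirms the solution you already preferred.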

  • Create an action plan

Even if you only have to disable a specific feature, it’s a good idea to formulate an action plan. Even if it consists of only three lines… You should state clearly

  • WHAT you do
  • WHY you have to do it, and
  • HOW you plan to check it

With these steps, you should be well prepared. It doesn’t matter what kind of problem you are trying to solve: The process is basically the same.

Other problem solving methods

Over the years many problem solving methods have been developed. Kepner-Tregoe is one of them. Other well known methods are:

  • A3 Problem Solving
  • PDCA
  • Eight Disciplines (8D) Problem Solving
  • Failure mode and effects analysis (FMEA)

A3 Problem Solving was developed at Toyota for the Toyota Production System (TPS). It’s a frequently used method in Lean Manufacturing. A3 helps to solve problems by prescribing a structure (WHAT IS and WHAT IS NOT the problem, describe the problem, root cause, solution etc.). This structure is placed on an A3 sheet of paper (that’s why it’s called A3). The process is based on the principles of Deming’s PDCA cycle.

PDCA, or Plan-Do-Check-Act (sometimes called the Shewhart cycle), was made popular by Dr. W. Edwards Deming. Plan-Do-Check-Act refers to the four phases of this cycle.

  • Plan: Plan the change
  • Do: Implement the change
  • Check: Check the success of the implemented change
  • Act: Take action based on the results of “Check”

Eight Disciplines (8D) Problem Solving was developed by the Ford Motor Company. The D0 phase is the starting point for the 8D process, but it’s not counted.

  • D0: Plan for solving the problem and determine the prerequisites
  • D1: Establish a team of people with the required skills and knowledge
  • D2: Describe the problem
  • D3: Define and implement containment actions
  • D4: Determine and verify the root causes
  • D5: Plan permanent corrective actions for the observed problem
  • D6: Implement the best permanent corrective actions
  • D7: Modify management systems to prevent a recurrence
  • D8: Congratulate your team!

Failure mode and effects analysis (FMEA) is a highly structured, systematic approach to failure analysis. There are different types of FMEA:

  • Functional
  • Design
  • Process

FMEA is based on inductive reasoning (forward logic). It follows a highly structured process, which can be represented as follows.

  • Structural analysis: A system is divided into its components
  • Functional analysis: Identify the function of each component
  • Failure analysis: Identify the possible failures for each component
  • Calculate the risk: Risk Priority Number = occurrence ranking x detection ranking x highest severity ranking
  • Optimize: Optimize the component to mitigate the risk
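The risk calculation can be sketched in a few lines of Python. This is only an illustration of the formula above: the components, failure modes and rankings are invented, and the 1 to 10 ranking scale is an assumption (it is a common convention, but your FMEA template may use another one).

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    description: str
    occurrence: int  # assumed scale: 1 (rare) .. 10 (frequent)
    detection: int   # assumed scale: 1 (always detected) .. 10 (hard to detect)
    severity: int    # highest severity of all effects, 1 (negligible) .. 10 (critical)

    def rpn(self) -> int:
        """Risk Priority Number = occurrence x detection x severity."""
        return self.occurrence * self.detection * self.severity


# Hypothetical failure modes of a storage infrastructure.
modes = [
    FailureMode("storage controller", "firmware bug corrupts cache", 2, 8, 9),
    FailureMode("disk", "single disk failure", 6, 2, 3),
    FailureMode("SAN switch", "port flapping", 4, 5, 6),
]

# The failure modes with the highest RPN are optimized first.
by_risk = sorted(modes, key=FailureMode.rpn, reverse=True)

for mode in by_risk:
    print(f"{mode.rpn():4d}  {mode.component}: {mode.description}")
```

Note how the rare but hard-to-detect controller bug ends up ahead of the frequent but harmless disk failure. This is the kind of prioritization the RPN is meant to make visible.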

No matter what, stay organized

The key to successfully solving problems is to stay organized. Solving problems isn’t magic. It is a very structured process that gets better with increasing experience. Try to create your own structured method. Or use one of the mentioned problem solving methods. But in general:

  • Always try to describe a problem
  • Try to simplify or break it into smaller problems
  • Search for and verify the root cause
  • Develop a solution

Kanboard – Kanban made simple

This posting is ~7 years old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

I really like the idea behind Kanban. I wrote about it in 2014 (Organize your work with Kanban), and I even wrote my bachelor thesis about it (Industrialisierung der ITIL Service Operation Phase unter Verwendung von Lean Management // Industrialization of the ITIL Service Operation phase with Lean Management).

The word “kanban” comes from Japanese and can be translated as “signboard”, “card” or “billboard”. Kanban is a scheduling system and helps to implement the pull principle in a lean manufacturing system. The methods and instruments of lean management are widely used, not only in industrial manufacturing. In agile software development in particular, Kanban has become notably widespread.

What is Kanban?

Kanban is used in the Toyota Production System (TPS), which was developed by Sakichi Toyoda, Kiichiro Toyoda and Taiichi Ohno. Taiichi Ohno is often referred to as the “father” of TPS. TPS was developed to achieve just-in-time production, to avoid overburden (muri) and inconsistency (mura), and to eliminate waste (muda).

Taiichi Ohno defined seven types of waste:

1. Delay, waiting or time spent in a queue with no value being added
2. Producing more than you need
3. Over processing or undertaking non-value added activity
4. Transportation
5. Unnecessary movement or motion
6. Inventory
7. Defects

To this end, TPS is built on two main pillars.

  • Just-in-time (JIT) – Making only what is needed, only when it is needed, and only in the amount that is needed
  • Jidoka – Automation with a human touch

TPS knows different methods to avoid the seven types of waste. One of these methods is the pull principle. Kanban is an instrument to implement the pull principle in an industrial production process.

Lean Manufacturing is often used as a synonym for TPS. TPS and Lean Manufacturing are similar, but not the same. Lean Management is a more generic approach. Lean Manufacturing was defined by Womack / Jones / Roos after they studied the Japanese automotive industry.

How can I use Kanban to improve my work?

In 2014, I wrote a blog post on how to organize your work with Kanban (Organize your work with Kanban). Kanban is used not only in industrial manufacturing, but also in areas like software development or agile project management.

Kanboard – Simple and open source visual task board

Kanboard is not for everybody, it’s made for people who want to manage their projects efficiently and simply.

I stumbled upon Kanboard some weeks ago. Kanboard is open-source project management software based on the Kanban methodology. It’s developed by Frédéric Guillot.

Kanboard is simple – easy installation, no fancy GUI. Focused on simplicity and minimalism.

It uses visualization and the Kanban methodology to give you an easy overview of projects and tasks. You can use subtasks, attachments and comments to break down complex tasks. You can use Markdown syntax to format comments. A central dashboard gives you a centralised view of all projects, the number of tasks etc. Kanboard also offers Gantt charts to visualize the timeline of your projects.


Patrick Terlisten/ Creative Commons CC0

Most projects are done in teams, so it should be no surprise that you can work in teams with Kanboard. Even international teams are no problem, because Kanboard is available in 26 languages. You can create local users or you can use external authentication sources, like an LDAP directory (Microsoft Active Directory). In addition, you can use Google, GitHub or GitLab as an authentication source. If you wish to use something else, Kanboard offers a custom authentication system that uses flexible reverse proxy authentication. You can use customizable user roles to implement different project roles and some kind of role-based access control. If you need a bit more security, you can implement 2-factor authentication.

Kanboard offers some nice integrations/plugins. You can use APIs or webhooks to interact with other systems, you can subscribe to calendars using RSS, or you can use SMTP to create new tasks in Kanboard. Furthermore, Kanboard is able to notify you using HipChat, Slack, Rocket.Chat, Mattermost and Jabber.

Automation is mandatory. You can use automation to change nearly everything automatically, for example the assignee of a task, the color of a card, categories etc.

Controlling is also mandatory. Because of this, Kanboard offers a nice time tracking feature to keep track of the time spent on tasks and subtasks. The embedded analytics and reports help you to analyze and improve your work. They offer simple flow diagrams or burndown charts.

How to install Kanboard

The installation is really easy. All you need is a web server (Linux, Windows, FreeBSD etc.) with PHP support. For small deployments, you don’t need a database server. In this case, Kanboard will use SQLite. Download the software, upload it to your web server and extract it. Make sure that the “data” folder is writeable for the user that is running the web server. Open a browser and enter the URL (it depends on your deployment, whether it is a subfolder or whether you are using virtual servers). The first login is username “admin” and password “admin” (don’t forget to change it!). That’s it.

Make sure that you read the documentation, especially if you want to use integrations / plugins. In addition, you should read the documentation to familiarize yourself with the functions of Kanboard.


Kanboard is a simple and lightweight way to use the Kanban methodology for agile project management. I like the simple installation and the really lightweight and responsive user interface. I really recommend giving it a try, and not only if you are searching for an agile project management tool. You can treat everything as a project.

Complexity knows only one direction: Getting more complex

This posting is ~7 years old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

Complexity, in general usage, tends to be used to characterize something with many parts in intricate arrangement.


Following this definition, and assuming that “many” means N ≥ 2, all systems with two or more components are complex. But that would be an exaggeration, right?

Why is information technology complex?

Most systems in information technology (IT) are complex. Almost everything we work with consists of two or more components, regardless of whether it is hardware or software. But it’s a question of perspective. If you look at a system from a higher level, you will only be able to identify some of the larger components. If you look closer, you will be able to identify more, smaller components. Every system consists of hardware and software. Hardware is nothing without software. Think of a storage system, with all those disks, controllers, disk enclosures, firmware etc. Or think bigger: A complete infrastructure based on a VMware vSphere cluster with multiple servers, network switches, SAN switches, storage systems, synchronous mirroring between data centers etc. The system can be split into its components and sub-components. And each component and sub-component is more or less important for the operational function of the whole system. Adding more features makes a system even more complex. With each added or modified feature, the probability rises that something breaks or doesn’t work as expected.

IT systems and infrastructure tend to act like a nonlinear system.

Matter of opinion

In fact, there is no unique definition of complexity. What complexity means is in the eye of the beholder. If you understand each component of a complex system very well, the whole system isn’t complex for you. But the same system would be a “complexity hell” for someone else.

I’ve talked to many customers in the last months. Most of them don’t try to understand each aspect of their infrastructure. All they want, and all they need, to know is how to keep it running. And most of them have accepted that they need external help (like me), even for “simple tasks”, e.g. adding a vSphere ESXi host, creating new LUNs, adding VLANs, recovering a database etc. This also includes planned site failovers, even if the failover is done with a few clicks. It’s the order of the necessary tasks and the knowledge about dependencies that prevent customers from doing this on their own.

Interestingly, this is not only a problem of smaller companies or smaller IT teams. I also observed this in bigger companies and IT teams. The only difference is that you have more specialized staff in bigger organisations. One-track specialists. Knowledge is distributed over more individuals. Each and everything has to be discussed in bigger teams, meetings etc., because each individual knows only a small part of a bigger picture.

A typical reaction is to outsource lesser-known, or “complex”, systems. Out of sight, out of mind. I hear you say “Migrate to public cloud / IaaS / PaaS etc”. But does this make it simpler? No. Complexity is only shifted. Automation can reduce complexity! No, automation can hide complexity. You should only automate what you have fully understood. A nice GUI can reduce complexity! No, a nice GUI can hide complexity. So there is no way out?


One possible way out of the “complexity hell” is to try to understand most (not all… that’s nearly impossible) of the components, and how they are interacting with each other. This seems to be the best way, right? No, it’s not the best way. To achieve this, you would need to invest much more time in building up knowledge. Sure, that might be a way for someone who has time, or someone who is being paid for his knowledge. But this seems to be not the best way for most customers.

Another possible way out of the “complexity hell” is to try to reduce components. Keep it simple. Focus on a lean design. Focus on the problem. Focus on the requirements. Lean thinking and a scientific approach during the solution design can help to build less complex systems. A customer doesn’t need synchronously mirrored storage or replication just because he has two datacenters. Sometimes, distributing primary storage and backup across different datacenters is sufficient. Stop using iSCSI or Fibre Channel for three or four node VMware or Hyper-V clusters. Focus on SAS for storage interconnect. Skip 24×7 support contracts if the customer only works from 9 to 5. Design the backup concept from the recovery perspective.

You get no prize for the prettiest solution. IT isn’t a beauty contest. Build simple and robust solutions. Complexity can’t be reduced with specific products. It’s a question of the design.

Selected as PernixPro

This posting is ~8 years old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

Yesterday, at 02:13am (CET), I got an awesome e-mail:

Dear Patrick,

I am pleased to welcome you to the PernixPro program!

I’m very happy to be part of this program!

PernixData | PernixPro

This program is similar to the VMware vExpert or Microsoft MVP program. It’s designed to spread the magic of PernixData FVP. I am totally convinced of PernixData FVP. Because of this, I’m very pleased to be part of the program. Thank you for the recognition!

If you want to know more about the PernixPro program, make sure that you take a look at the corresponding website.


Organize your work with Kanban

This posting is ~9 years old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

Everyone has their own technique to organize work. As you may know, I’m a big fan of Lean. And you may also know that Lean is a philosophy based on creating value for customers and eliminating waste of resources in production processes. Taiichi Ōno, the father of the Toyota Production System (TPS), defined seven forms of waste. Womack and Jones developed Lean Production, which is based on TPS, and highlighted five principles to achieve lean production.

  • Value
  • Value stream
  • Flow
  • Pull
  • Perfection

There is one principle in particular that is used to schedule work: the pull principle, and Kanban is a method to realize it. Ōno stated that Kanban has to follow strict rules, and two rules are very important: Downstream work stages rely on the work of upstream work stages. The amount of the requested demand is indicated by a signal card. The upstream work stages produce only in the quantity demanded, and only if the demand has been requested by downstream work stages. Downstream work stages “pull” work from upstream work stages and the demand is delivered just-in-time. This sounds reasonable if it’s a production process. But how can this help me to organize my work?

Use Kanban to organize work

Kanban literally means “signboard” or “billboard”. The board is used to visualize the workflow. The board is divided into sections, to which Kanban cards will be attached. A Kanban card signals that something must be done, and it can be moved through the different stages. This is a simple three-section board for visualizing to-dos.



Now you can add Kanban cards to the board.



When you switch from “To Do” to “Doing”, simply move the Kanban card of the task to the “Doing” section. When you have finished the task, move the Kanban card to “Done”.



To create a flow and minimize task switches, it’s important to add a Work-in-Progress (WiP) limit. This is indicated by the [2] behind “Doing”. This means that at most two tasks can be in the “Doing” section. Why is a WiP limit important? The WiP limit limits the number of tasks in a section. New tasks are pulled when there is free capacity, e.g. when another task is finished. This limits the number of switches between different tasks. You can focus on a limited number of tasks and, in the end, you will be able to increase the throughput. And due to the limited task switches, the quality will also increase.



Because there are two Kanban cards in my “Doing” section, the section is coloured. If I moved a third Kanban card to it, the section would be coloured red. As you can see: You have to follow the rules so that Kanban works. Feel free to create multiple sections. I use Kanban for my personal to-dos and for my work. The visualization helps to get a quick overview. I also use mindmaps, because they are also a good instrument to visualize complex things.
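The pull principle and the WiP limit can also be expressed in a few lines of code. This is only a sketch with hypothetical class and task names, not how Trello or any other tool works internally. Unlike the board above, which merely turns red, the sketch enforces the limit strictly and refuses the third card.

```python
class WipLimitReached(Exception):
    """Raised when a card is pulled into a section that is already full."""


class KanbanBoard:
    def __init__(self, sections):
        # sections maps a section name to its WiP limit (None = unlimited).
        self.limits = dict(sections)
        self.cards = {name: [] for name in sections}

    def add(self, card, section="To Do"):
        self._check_capacity(section)
        self.cards[section].append(card)

    def move(self, card, source, target):
        # A card may only be pulled if the target section has free capacity.
        self._check_capacity(target)
        self.cards[source].remove(card)
        self.cards[target].append(card)

    def _check_capacity(self, section):
        limit = self.limits[section]
        if limit is not None and len(self.cards[section]) >= limit:
            raise WipLimitReached(f"{section!r} is limited to {limit} cards")


board = KanbanBoard({"To Do": None, "Doing": 2, "Done": None})
for task in ("write report", "patch hosts", "review backup"):
    board.add(task)

board.move("write report", "To Do", "Doing")
board.move("patch hosts", "To Do", "Doing")

try:
    board.move("review backup", "To Do", "Doing")  # third card: over the limit
    blocked = False
except WipLimitReached:
    blocked = True

# Finishing a task frees capacity, so the next task can be pulled.
board.move("write report", "Doing", "Done")
board.move("review backup", "To Do", "Doing")
```

The third task can only enter “Doing” after another task has moved to “Done”. That is the pull principle on a very small scale.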


I use Trello for my personal Kanban. You can use it with a web browser, on Windows 8, iPad, iPhone, Android and Kindle Fire. I use the Chrome app along with the Kanban WiP for Trello extension. I also have the app on my iPhone and iPad. I really like Trello. It’s lightweight and customizable. You can add dates, labels and attachments to Kanban cards. You can also add comments, checklists and a description to them. I really recommend simply trying it! Kanban is not as hard as it seems. Use it. Strive for perfection.

Problem analysis with Kepner-Tregoe

This posting is ~9 years old. You should keep this in mind. IT is a short-lived business. This information might be outdated.

When you deal with problems in IT, you often deal with problems where the root cause is unknown. To solve such problems, you have to use a systematic method. Only a systematic method leads to a fast, effective and efficient solution. One of the most commonly observed methods in my career is based on approximation. We all know it as “trial and error”. Someone keeps trying until the problem is solved. Often this method makes things worse than they were before, and it often leads to wrong conclusions and, furthermore, wrong results. If someone draws a wrong connection at the beginning of the analysis, this leads down a totally wrong path. I would like to illustrate this with an example:

John Doe tried to monitor VMware ESXi hosts with HP Systems Insight Manager (SIM). The ESXi hosts were running on different HP ProLiant models. John noticed that some of the ESXi hosts showed more information than other hosts. After a very quick Google search he concluded that this was related to iLO 4 Agentless Monitoring, because those hosts that showed all information were ProLiant Gen8 models.

As you can imagine, this was miles off the mark. The solution was simple: The Gen8 models were installed with ESXi images from HP, which include the necessary agents. This example shows another very ugly behavior: Googling around, in the hope of finding a problem description that sounds similar. This is often done by entering an error message into Google, selecting a search result and trying the proposed solution. And quite often the article is not even read, simply scrolled down to the solution. But it’s entirely possible that the same error message has different causes, which need different solutions.

What could be a systematic method to solve problems? I’d like to introduce you to Kepner-Tregoe (KT). KT stands for two things: A consulting company founded by Charles Kepner and Benjamin Tregoe, and a method. KT is mentioned by ITIL as a component of Problem Management in the Service Operation phase. You can use KT for problem solving, decision making or potential problem analysis. I will focus on the situation analysis and problem analysis. The situation analysis is common to problem solving, decision making and potential problem analysis.

The Kepner-Tregoe method

The KT method is based on a rational process and it’s divided into four different processes:

  • situation analysis
  • problem analysis
  • decision analysis
  • potential problem analysis

Behind each process is a question you should ask.

The situation analysis

During the situation analysis the question is “What’s going on?”. At this point, the problem analysis hasn’t started. Before you can analyse the problem, you have to clarify the situation, outline concerns and set priorities. Ask yourself about the current and future impact, how much time you have to find a solution, and at which point a solution could become impossible (limitations because of time, budget etc.).

The problem analysis

The problem analysis consists of five consecutive steps:

  1. Define the problem
  2. Describe the problem
  3. Create hypotheses about the cause
  4. Test the hypotheses
  5. Verify the root cause

Use the 5 Ws to define the problem. Only a problem description that includes the 5 Ws is capable of fully describing a problem. Such a description will help you, and your colleagues, to understand the problem.

  • Who is affected by the problem?
  • Why is it important to solve the problem?
  • What are the symptoms?
  • When does the problem occur?
  • Where does the problem occur?

If you created you problem description with the 5 Ws, you can concretize the answers with “IS” and “COULD BE but IS NOT” aspects. Let’s pick up the example from above:

Who is affected by the problem? HP ProLiant G6 and G7 models running VMware ESXi.

An HP ProLiant G7 model with the VMware ESXi image “IS” affected. An HP ProLiant G7 or Gen8 model with an HP custom image for ESXi “COULD BE but IS NOT” affected.

As you can see, this dramatically reduces the number of possible causes, especially when you add the problem description and the symptoms. But it also shows something else: you have to take a detailed look at the affected components/ systems, and you have to take care not to miss any deviations between them (in the example, all hosts were running ESXi 5.1, but some hosts were running a VMware image and others an HP custom ESXi image). You should also identify which changes were made in the past. This may be answered by the “When?” question (When does the problem occur? After demoting one of the four Active Directory Domain Controllers).

Now it’s time to create hypotheses about the possible cause. Depending on the problem description, the past changes and the “IS” and “COULD BE but IS NOT” aspects of the problem, it should be possible to create one or more hypotheses.

Once you have one or more hypotheses, you have to test each of them against the “IS” and “COULD BE but IS NOT” aspects. The question is: Can the hypothesis explain both the “IS” and the “COULD BE but IS NOT” aspects? The hypothesis that explains them best is the most probable one.
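The test against the “IS” and “COULD BE but IS NOT” aspects can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class names, the attribute keys and the two hypotheses are made up for the sketch and are not the real root cause of the example. Each hypothesis predicts whether a system should be affected, and the score counts how many observations it explains.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Observation:
    label: str
    attributes: Dict[str, str]
    affected: bool  # True = "IS", False = "COULD BE but IS NOT"


@dataclass
class Hypothesis:
    name: str
    # Returns True if the hypothesis expects a system with these
    # attributes to show the problem.
    predicts_affected: Callable[[Dict[str, str]], bool]


def explained_count(hypothesis: Hypothesis, observations: List[Observation]) -> int:
    # An observation is explained if prediction and reality agree.
    return sum(
        1 for obs in observations
        if hypothesis.predicts_affected(obs.attributes) == obs.affected
    )


def rank_hypotheses(hypotheses: List[Hypothesis],
                    observations: List[Observation]) -> List[Tuple[str, int]]:
    scored = [(h.name, explained_count(h, observations)) for h in hypotheses]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Observations taken from the IS / COULD BE but IS NOT statement above.
OBSERVATIONS = [
    Observation("G7, VMware image", {"model": "G7", "image": "vmware"}, True),
    Observation("G7, HP custom image", {"model": "G7", "image": "hp-custom"}, False),
    Observation("Gen8, HP custom image", {"model": "Gen8", "image": "hp-custom"}, False),
]

# Hypothetical hypotheses, for illustration only.
HYPOTHESES = [
    Hypothesis("hardware generation", lambda a: a["model"] == "G7"),
    Hypothesis("ESXi image", lambda a: a["image"] == "vmware"),
]

for name, score in rank_hypotheses(HYPOTHESES, OBSERVATIONS):
    print(f"{name}: explains {score} of {len(OBSERVATIONS)} observations")
```

In this sketch the “hardware generation” hypothesis fails on the G7 host with the HP custom image, which is exactly the kind of “COULD BE but IS NOT” fact that rules a hypothesis out. The best score still has to be verified, it is only the most probable candidate.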

Verifying the root cause is the last and trickiest part. You have to verify your assumptions and reflect on how you came to the conclusion about what the root cause is. If you are sure that you have identified the root cause, you can develop and implement a solution. After the implementation, you have to verify the result. Is the problem solved? Yes? Fine! If not, you have to feed this result back into the test of the other hypotheses.


Kepner-Tregoe is a totally rational method. At the beginning it’s hard to reflect instead of making quick assumptions. It’s something you have to train. I guarantee that you will get better with each problem you solve. KT problem analysis was used during the Apollo 13 mission. And what should I say? It worked! So give it a try.

EDIT: Kepner-Tregoe informed me via Twitter that there are two groups on LinkedIn where you can get more information and talk to other KT practitioners.

IT industrialization: From manufactory to factory

If you want to see highly automated and efficient production processes, you have to leave information technology (IT). Look at the automotive industry, or generally at any industry with a high share of industrial production. What is the output of IT? It is an IT service. What is a service? A combination of knowledge, technology and processes that is convenient for the customer to use. The following is the definition of an IT service used in the IT Infrastructure Library (ITIL) v3:

A Service provided to one or more customers, by an IT service provider. An IT Service is based on the use of information technology and supports the customer’s business process. An IT Service is made up from a combination of people, processes and technology and should be defined in a service level agreement.

You can take this definition and transfer it to the automotive industry. You will notice that a service-oriented IT and an automotive factory don’t differ much once you abstract them.

To transform your IT into a service-oriented IT, you have to adopt IT service management (ITSM). You have to implement processes that transform people, knowledge and technology into a service the customer can consume. The server you set up for a new application is only a means to an end. Your customer, regardless of whether it’s an internal or external customer, doesn’t care if the server is from IBM or Hewlett-Packard. He doesn’t care if you are running VMware ESXi or Microsoft Hyper-V. He doesn’t care if you are running Apache or Microsoft IIS. He needs an additional web server because the other web servers in the cluster are under high load, the website is too slow, and potential customers are not willing to wait and move on to a competitor’s website.

Please note that when I talk about customers, I mean internal and external customers. IT always serves a customer, regardless of whether it’s an internal or external customer.

The Manufactory

I know companies that act more like a manufactory than a service-oriented IT. Each server or VM is installed by hand. Additional terminal servers are installed by hand. To avoid differences between servers, the administrators have to follow step-by-step instructions (347 pages, every step documented by screenshots). And even with step-by-step instructions, each server ends up a bit different. Virtualization? Uhhh, witchcraft. And if it is used, resources are reserved and the configuration is more static than elastic. Don’t trust the hypervisor! If you ask the customer “Why don’t you automate?” you will get an answer like “We don’t have the time and/ or knowledge to do this.” or “We’re too small.”. I saw several companies that used a very simple form of monitoring: the phone. If it rings and the customer complains about an error, something has broken. These companies don’t think in services, they think in products. Business requirements often end in a statement like “We can not fulfill this requirement today. We need x months and x to fulfill these requirements.” Each solution designed to fulfill a business requirement is something between a masterpiece and a quick ‘n dirty solution. The wheel is often reinvented several times. In most cases, the IT in these companies sees itself as the center of the universe. And trust me: This is nothing that I only saw in small companies…

The service-oriented IT

Doing ITSM is a good thing. Most companies I know that do ITSM use ITIL for it. When I talk with other IT pros about ITIL, I usually get one of two reactions:

  • hatred
  • frustration

Exceptions prove the rule... ITIL describes common practice and doesn’t describe what a process has to look like. Someone told them what the process has to look like, and they adopted it. I know a lot of companies that adopted ITIL processes with no, or only a few, customizations. Service-oriented IT divisions have understood the need for processes and for a strategic approach to managing IT. They have understood that they deal with customers and that the customer pays the bill. This usually means that they try to reduce costs and improve quality. So it’s not unusual that they try to automate most of their environment or that they use economies of scale. Sharing infrastructure between customers leads to lower fixed costs. They use tools to proactively monitor their environments. You can’t control what you can’t measure. And measurement is essential for dealing with service level agreements and for improving quality.

But there is a dark side… My personal worst example was a company in which a server deployment took between three and six weeks, from the arrival of the hardware until the handover to operations. The change approval and authorization process was so ineffective that business requirements were sometimes obsolete before the change was approved. This bad example shows the main problem of ITIL/ ITSM: the processes. ITIL/ ITSM isn’t bad, but sometimes it’s badly implemented. It’s mainly the excessive bureaucracy that deters many IT staff.
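One way to act on “you can’t control what you can’t measure” is to track how long a deployment really takes, from hardware arrival to handover. The following is a minimal sketch in Python. The server names, the dates and the target of five days are invented for illustration and are not data from the company mentioned above.

```python
from datetime import date
from statistics import median

# Hypothetical deployments: (server, hardware arrival, handover to operations)
DEPLOYMENTS = [
    ("srv-a", date(2014, 3, 3), date(2014, 3, 24)),
    ("srv-b", date(2014, 3, 10), date(2014, 4, 21)),
    ("srv-c", date(2014, 4, 1), date(2014, 4, 4)),
]

TARGET_DAYS = 5  # assumed service level target, pick your own


def lead_time_days(arrival: date, handover: date) -> int:
    """Calendar days between hardware arrival and handover."""
    return (handover - arrival).days


lead_times = {
    name: lead_time_days(arrival, handover)
    for name, arrival, handover in DEPLOYMENTS
}
# Deployments that took longer than the agreed target.
breaches = [name for name, days in lead_times.items() if days > TARGET_DAYS]

print("Lead times (days):", lead_times)
print("Median lead time:", median(lead_times.values()))
print("Over target:", breaches)
```

Even such a simple number makes it visible when a process needs weeks for something the business expects in days.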

The factory

Let us pick up the automotive example again. Suppose that the transmission supplier changes. If you are the boss of the automotive company, what would you tell your production manager if he tells you “Sorry, but the transmission supplier has changed and we have to stop the production for four weeks. We need to search for a new supplier for the clutch and we have to modify some parts of the engine.” You would kick him out of your office, right? Me too… But that’s reality in IT. Changes in business requirements often cause changes in IT. Yesterday I found this interesting statement:

Today’s businesses cannot afford to deal with traditionally disruptive tasks, such as rolling or forklift upgrades. Systems and workflows must be always online and available.

I found it in the article “Completely Dismissing the FUD against Nutanix!” on Andre Leibovici’s blog. Nutanix is not the topic of this article, but Andre’s statement confirms my “empirical experience”. The next step for a service-oriented IT is to streamline processes and to modularize the service portfolio. This is sometimes called “IT Industrialization”. It describes the adoption of the methods and instruments of industrial production in IT. There are five essential characteristics of IT Industrialization:

  1. Streamline your processes. If you use ITIL for ITSM, then analyse your processes and try to reduce the waste. If you use another ITSM method, do the same. Reduce waste of resources and increase the quality. Strive for perfection. Take a look at Lean Management, TQM, Six Sigma. Streamlining processes will help you to react faster to changing business requirements.
  2. Products not projects. You have to focus more on products than on projects. An IT service is a product that leaves an IT factory. Focus on that. If a customer wants a webserver, he wants a webserver. Not a project to install a webserver.
  3. Standardization and automation. Don’t reinvent the wheel again and again. Try to achieve the highest possible degree of standardization. Don’t ask your customer how much memory or how many vCPUs he wants. Let the customer choose whether he needs a big, mid or small VM. The number of manual operations should be as low as possible. This leads to automation.
  4. Use a standardized sales channel, e.g. self-provisioning.
  5. Reduce the production depth. Make or buy. If someone produces something with a better price/ quality ratio, don’t hesitate to buy his services.
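The “big, mid or small VM” idea from point 3 can be sketched as a tiny service catalog. This is only an illustration: the vCPU and memory values are placeholders I made up, and the names `VmSize`, `CATALOG` and `resolve` are not taken from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    vcpus: int
    memory_gb: int


# Assumed standard sizes; replace the numbers with your own standards.
CATALOG = {
    "small": VmSize(vcpus=1, memory_gb=2),
    "mid": VmSize(vcpus=2, memory_gb=8),
    "big": VmSize(vcpus=4, memory_gb=16),
}


def resolve(size: str) -> VmSize:
    """Translate a customer's choice into a standardized VM definition."""
    try:
        return CATALOG[size.lower()]
    except KeyError:
        choices = ", ".join(sorted(CATALOG))
        raise ValueError(f"Unknown size {size!r}. Choose one of: {choices}") from None


order = resolve("mid")
print(f"Ordering a VM with {order.vcpus} vCPUs and {order.memory_gb} GB memory")
```

The customer only picks a name from the catalog. Everything behind that name is standardized, which is what makes the deployment easy to automate later.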

Maybe you already realized that some of my enumerations sound like “cloud”. The National Institute of Standards and Technology (NIST) lists five essential characteristics of cloud computing:

  1. On demand self-service: A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.
  2. Broad network access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms.
  3. Resource pooling: The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.
  4. Rapid elasticity: Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand.
  5. Measured service: Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service.

Where is the connection?

On your way from the manufactory to the factory, elements of cloud computing can help you to industrialize your IT. IT Industrialization is not a question of scale. Even if you are a small IT division, you can profit from IT Industrialization, just on a smaller scale. You don’t need VMware vCloud Automation Center to set up self-service provisioning of VMs. You can use VMware vCenter Orchestrator, Microsoft System Center or Puppet. Think about the five characteristics of IT Industrialization and try to adopt them at the level you need. Streamline processes and modularize your service portfolio. Standardize and automate where you can. Try to reduce the waste of resources. And never forget who’s paying the bill…

Feel free to leave a comment or pick up this article and express your own view on IT Industrialization.

vExpert 2014 benefits

In addition to the benefits that VMware grants to vExperts, a couple of vendors also grant benefits to vExperts. This includes free licenses, subscriptions or other offers. This is only a loose compilation of vExpert benefits.

Vendor: Benefit
SolarWinds: Virtualization Manager NFR
Veeam: Backup & Replication NFR
Pluralsight: Annual Plus Subscription
Tintri: Polo Shirt
Devolutions: Remote Desktop Manager 1y NFR
Login VSI: VIP program
Hewlett-Packard: StoreVirtual VSA NFR
Darren Woollard: Sticker & URL Shortener
DataCore: SANsymphony-V NFR
Proximal Data: AutoCache NFR
VSS Labs: vCert Manager NFR
Unitrends: Enterprise Backup for VMware or Hyper-V NFR
Symantec: Backup Exec V-Ray Edition NFR
Royal TS: Royal TS/X NFR

I also recommend checking the following blog post:

I will add further offers to this list. Feel free to leave a comment and to point me to similar blog posts.