How To Measure IT Service Quality
Two tenets of the IT Infrastructure Library (ITIL®) are that Business needs drive IT operations and that User perception is the true measure of IT Service Quality. However, most IT organizations report on technical metrics like jitter and loss, which, while necessary, do not show how IT Services meet User needs and offer little evidence of IT Service Quality.

Measuring User perception forms an intuitive bridge for effective communications; reporting on technical metrics is the root of Business/IT communications failures. So, how do you bridge this Business/IT communications gap? It turns out that just four service attributes shape User perception: Availability, Capacity, Security, and Data Integrity.

Taken together, degradation in these attributes measures IT Service Quality. ITIL also defines an IT Service Quality metric for this degradation -- Impacted User Minutes (IUM).

What follows is a discussion of how to use IUM to measure IT Service Quality, bridge the Business/IT communications gap, and value IT Services at the same time.

Business Perspective of IT Service Quality

Business does not use technology for the sake of technology; rather business leverages technology to empower Users to accomplish their job or mission. Users rely upon IT Services to accomplish some goal, task, function or mission. It makes sense then that examining a User’s ability to accomplish their job or mission is the most pure and representative measure of IT Service quality from a business perspective.

Consider how much more valuable your IT reporting would be to the Business if it spoke to readers in terms they understood. Further, consider how much your reports would be worth to readers who could make decisions quickly based upon the data in them. IUM delivers this value.

A Model for IT Service Quality

The needs of a User to accomplish their primary job function using IT Services serve as a model for understanding IT Service Quality. The best analogy is to imagine that you are a User trying to get a job done while relying upon IT Services. What would you think about?

“Can I get to where what I need to do my job resides?” (Availability)
“Can I get enough of what I need in time to do my job?” (Capacity)
“Can I get access to what I need to do my job when needed?” (Security)
“Is what I access correct?” (Data Integrity)

There are just four aspects to the model: Availability, Capacity, Security, and Data Integrity. Within IT, we have direct control of Availability, Capacity, and Security. However, we do not fully control Data Integrity. For example, we can use data input forms to ensure that data fits acceptable patterns, and algorithms to ensure that data does not change en route, but bad data that fits an acceptable pattern is outside of IT’s control. Therefore, we need to focus on the three elements that IT does control: Availability, Capacity, and Security.

The IUM assumes that requirements for Availability, Capacity, and Security are in place and documented as Service Level Targets (SLTs) within Service Level Agreements (SLAs). Also required is instrumentation to measure compliance with SLTs. Then, the concept of the IUM becomes clear. Each minute during the defined period of availability in which observed Availability, Capacity or Security does not comply with stated SLTs is an IUM. For example, if the SLTs in the SLA include:

“Monday through Friday, from 8:00am to 5:00pm” (Availability)
“Lookup requests will return customer records within 5 seconds” (Capacity)
“For users with appropriate account authority” (Security)
Example 1. Service Level Agreement Excerpt

If designed in cooperation with the Customer, the above SLTs allow the User to accomplish their job as required. Now imagine a condition that causes degradation in any of the three SLTs -- for example, a network failure that causes a slowdown in returning records. The IT Service degrades, and the IUM correctly shows the IT Service is not sufficient for Users to accomplish their jobs or missions to the degree required by the business and agreed with IT.
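The counting rule described above can be sketched in code. This is only an illustration, assuming one monitoring sample per User per minute; the `MinuteSample` record, its field names, and the `count_ium` helper are hypothetical and do not come from ITIL or any monitoring product. The thresholds are copied from the example SLA excerpt.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Iterable

# SLT values copied from the example SLA excerpt.
SERVICE_START = time(8, 0)    # 8:00am
SERVICE_END = time(17, 0)     # 5:00pm
MAX_LOOKUP_SECONDS = 5.0      # Capacity SLT: records returned within 5 seconds


@dataclass
class MinuteSample:
    """One observation per User per minute (hypothetical structure)."""
    timestamp: datetime
    reachable: bool           # Availability: could the User reach the service?
    lookup_seconds: float     # Capacity: observed time to return a record
    access_granted: bool      # Security: was an authorized User let in?


def in_service_window(ts: datetime) -> bool:
    """True for Monday through Friday, 8:00am up to 5:00pm."""
    return ts.weekday() < 5 and SERVICE_START <= ts.time() < SERVICE_END


def is_impacted(sample: MinuteSample) -> bool:
    """A minute is impacted if any one of the three SLTs is missed."""
    return (
        not sample.reachable
        or sample.lookup_seconds > MAX_LOOKUP_SECONDS
        or not sample.access_granted
    )


def count_ium(samples: Iterable[MinuteSample]) -> int:
    """Count impacted minutes that fall inside the agreed service window."""
    return sum(
        1 for s in samples if in_service_window(s.timestamp) and is_impacted(s)
    )
```

Minutes outside the agreed window are ignored on purpose, since the SLA makes no promise about them.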

IUM and Cost of Downtime

The benefits of IUM are twofold. First, IUM lets you clearly understand the perception of IT Service Quality. Applying IUM to an individual shows that User’s perception of the IT Service. Tabulating IUM for a department shows the department’s perception of IT Service Quality, and so on up the organization. Second, IUM lends itself very nicely to another sought-after metric -- the Cost of Downtime (CoD). At key levels of work, one can assign a value to an IUM and develop CoD. CoD is simply a value per IUM.

CoD also requires collaboration and agreement with the Business and Customers, but the concept is very simple. For example, the sales department takes 100 orders per hour, on average. There are six Users (sales people) and each order has an average value of $149. Thus, simple math shows the CoD as:

100 (orders per hour) / 6 (Users) = approximately 16 orders per User, per hour
60 minutes / 16 (orders per User, per hour) = approximately 4 minutes per order
Thus, 4 minutes of downtime is worth about $149.00 per User
$149.00 / 4 (minutes) = $37.25 per Impacted User Minute (IUM)
Example 2. IUM Business Transaction Cost Breakdown
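The same arithmetic can be written as a short sketch. The function names here are illustrative assumptions. The first function follows the rounded steps of Example 2. The second skips the intermediate rounding, which gives a somewhat higher figure for the same inputs, so it is worth agreeing with the Business on which convention to use.

```python
def cod_per_ium_rounded(orders_per_hour: int, users: int, order_value: float) -> float:
    """Follow the rounded steps of Example 2 to get a dollar value per IUM."""
    orders_per_user_hour = orders_per_hour // users        # 100 // 6 = 16
    minutes_per_order = round(60 / orders_per_user_hour)   # 60 / 16 = 3.75, rounded to 4
    return order_value / minutes_per_order                 # 149 / 4 = 37.25


def cod_per_ium_exact(orders_per_hour: int, users: int, order_value: float) -> float:
    """Same idea without intermediate rounding: revenue per User per minute."""
    return orders_per_hour * order_value / users / 60


print(cod_per_ium_rounded(100, 6, 149.00))          # 37.25
print(round(cod_per_ium_exact(100, 6, 149.00), 2))  # 41.39
```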

If the entire sales department goes down for 8 minutes, the impact on the company is:

8 (minutes down) X 6 (Users) X $37.25 (dollar value per IUM) = $1,788.00
Example 3. IUM and CoD Calculation
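As a minimal sketch, the outage calculation is the number of impacted User minutes multiplied by the agreed value per IUM. The helper names are illustrative assumptions, and the $37.25 figure is the value derived in Example 2.

```python
def impacted_user_minutes(minutes_down: int, users_impacted: int) -> int:
    """Each affected User contributes one IUM per minute of the outage."""
    return minutes_down * users_impacted


def outage_cost(minutes_down: int, users_impacted: int, value_per_ium: float) -> float:
    """Cost of Downtime for one outage: IUM multiplied by the value per IUM."""
    return impacted_user_minutes(minutes_down, users_impacted) * value_per_ium


# The sales department example: 8 minutes down, 6 Users, $37.25 per IUM.
print(impacted_user_minutes(8, 6))   # 48
print(outage_cost(8, 6, 37.25))      # 1788.0
```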

Such a concept easily approximates any cash, business, mission or production loss: customers lost, revenue lost, planes not launched, magazines not printed, motorcycles not manufactured, soldiers not fed, etc. Anything that relies upon IT service quality fits into the IUM theme. You can even expand the equation to take into account salary, benefits or other financial aspects of the business transaction for more actuarial accounting.

Up, Down, and Degraded

You should use IUM (and CoD) in addition to Availability reporting. Availability shows "up" or "down": an IT Service is "up" when the User can perform meaningful work, and "down" when the User is unable to perform any meaningful work. IUM pertains to "degraded" IT service quality. You must define the extent of the degradation with the Business, agree on the range of SLTs from "up" through "degraded" to "down", and document these ranges in the SLA.

You must be clear here, as there is a difference between a User who is totally unable to work and a User whose work is impaired. You could assign a range of IUM values reflecting IT service quality deterioration. For example, showing only Capacity SLTs for clarity and brevity, the following table assigns values based on ranges of the Capacity SLT from the above example SLA excerpt:

Time To Return Customer Record | CoD per IUM | State (Up, Down, Degraded)
1 to 4 seconds                 | N/A         | Up
5 to 29 seconds                | $9.31       | Degraded
30 to 179 seconds              | $18.62      | Degraded
180 seconds or higher          | $37.25      | Down
Table 1. Expanded IUM/CoD Implementation
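One way to apply Table 1 is a simple lookup, sketched below. The `classify` and `degraded_cost` names are hypothetical. Two assumptions are made where the table is silent: any response under 5 seconds counts as "Up", and the table's "N/A" is treated as a cost of zero so that values can be summed.

```python
from typing import Iterable, List, Tuple

# (upper bound in seconds, CoD per IUM in dollars, state), copied from Table 1.
# Rows are checked in order; the first row whose upper bound is greater than
# the observed time applies.
TIERS: List[Tuple[float, float, str]] = [
    (5, 0.00, "Up"),            # 1 to 4 seconds; "N/A" treated as 0.00 (assumption)
    (30, 9.31, "Degraded"),     # 5 to 29 seconds
    (180, 18.62, "Degraded"),   # 30 to 179 seconds
]
DOWN: Tuple[float, str] = (37.25, "Down")   # 180 seconds or higher


def classify(lookup_seconds: float) -> Tuple[float, str]:
    """Return (CoD per IUM, state) for one observed lookup time."""
    for upper_bound, cost, state in TIERS:
        if lookup_seconds < upper_bound:
            return cost, state
    return DOWN


def degraded_cost(per_minute_lookup_seconds: Iterable[float]) -> float:
    """Sum the CoD over per-minute observations for a single User."""
    return sum(classify(seconds)[0] for seconds in per_minute_lookup_seconds)


print(classify(12))                                   # (9.31, 'Degraded')
print(round(degraded_cost([3, 12, 45, 200]), 2))      # 65.18
```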

Benefits Of IUM Reporting

The primary benefit of implementing and reporting on IUM is closer alignment with the Business, and a real understanding of IT Service Quality -- one that the Business knows, understands, and trusts.

There is still a need for precise measurements of highly technical attributes like jitter and loss. In fact, these types of measurements may underpin IUM (and thus CoD). However, reporting to the Business, Customers and IT Management should not be technical. The purpose of reporting is to improve decision making, and the IUM is one of the most powerful methods yet devised to illustrate IT service quality.

Consider including IUM, and perhaps CoD, in your Customer, Executive, and Senior IT reporting. Your reporting will be more "transparent" and less technical. You will find it much easier to communicate when you and your readers share common ground, and you might just finally get that pet project of yours funded at the same time!
