Vol.  2.41

OCTOBER 18, 2006

"7 Secrets to Successful Service Level Management"




 


DITY Weekly Reader
The workable, practical guide to Do IT Yourself

 

Manually creating after-the-fact monthly reports is not performing Service Level Management.  SLM must show both current and past status as well as predict future problems, and this requires automation and daily or even real-time analysis of data.

Hank Marquis, 2006, CTO

By Hank Marquis

Most of us know Service Level Management (SLM) is a process described in the IT Infrastructure Library® (ITIL®).  Most of us know SLM encompasses more than writing Service Level Agreements (SLA) and Service Catalogs as well.

 

The ITIL defines Service Level Management as “Ensuring that agreed IT services are delivered when and where they are supposed to be delivered.”

 

IT services are collections of Configuration Items (CIs) considered as an end-to-end system that delivers utility to a user of the system.  Some example services include: SAP, CRM, Email, Stock trading, etc.

 

Many think SLM is simply reporting on how IT did during the last month.  But SLM is much more than that.  SLM can and should drive the entire IT organization, and performing SLM correctly can also result in significant operational improvements and cost reductions.

 

Here are just 7 ways effective SLM can improve IT service quality, achieve business/IT alignment, increase IT efficiency, and reduce the costs of IT:

  1. Obtain refunds, rebates, credits through vendor management

  2. Gain higher discounts through consolidation identified by vendor management

  3. Reduce costs through idle capacity re-allocation and proactive performance and capacity forecasting

  4. Avoid redundant and/or unneeded infrastructure investments

  5. Save money on maintenance renewals with improved asset management

  6. Improve organizational responsiveness

  7. Increase end-user productivity

Obtaining these benefits requires work and some tricks of the trade.  Below, I share 7 secrets to SLM success.

Secret #1: Service level reporting is not service level management

Many IT organizations misunderstand the role of SLAs and focus them on technical parameters of a single technology instead of an end-to-end business description.  They monitor SLAs independently of one another and usually via manual integration of results. 

 

Today, most organizations implement service level reporting, not SLM.  The problems with service level reporting include:

  • It's old news

  • Manual process with 10s to 100s of people

  • Reams of paper, error-prone and subjective

 

Service level reporting is a manual process that typically describes what happened during the last period.  This is an important distinction: SLM seeks to ensure uninterrupted quality of service, while service level reporting simply reports what happened.

 

For example, service level reporting might say “we went down for 8 hours on Thursday the 20th.”  This adds little value to the business.  Don't you think they already know they could not operate for 8 hours that day?

 

Effective SLM would proactively observe SLA compliance, strive to predict a coming outage, and take action to prevent the issue before the outage occurs.  ITIL refers to this as a Service Improvement Program (SIP).

Secret #2: Effective SLM requires clear, service-oriented SLAs

Service Level Agreements are agreements with customers about services for users.  SLAs encompass the agreed upon service levels, modeled after business, customer, and user requirements.

 

An SLA represents the negotiated requirements of the customer, or Service Level Requirements (SLR).  SLAs should be visible to both customers and IT, and it is important to keep SLAs in easy-to-understand business terms.

 

Instead of describing the service as follows:

 

“the server shall maintain 1000MB free disk space at all times, never have less than 40% CPU cycles available for more than 2 seconds, have at least 256MB memory free for caching at all times, and will not swap to disk more than 10 times per second.”

 

Consider instead:

 

“the email service shall always allow the user to store messages with attachments of any size up to 10MB.  All emails and attachments combined shall not exceed 1000MB.  Any new emails and attachments that when combined with existing stored emails and attachments are larger than 1000MB will not be stored until other emails and attachments are deleted by the user.  An authorized user shall be able to open a stored email in an average of 5 seconds, and no email shall take longer than 30 seconds to open.  By special arrangement customer can request additional storage by contacting the service desk and referencing attachment A.”

 

Note the difference.  What exactly does the first paragraph say?  Someone in IT might be able to understand what it means, but it certainly does not describe things in business terms. 

 

The second paragraph is very clear, describes what the user will be able to do with the service, and also provides a “way out” should one be required.
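
A statement written this way is also easy to check automatically.  The following is a minimal sketch, assuming you already collect per-message open times in seconds from some monitor; the function name, constant names, and sample values are hypothetical and only illustrate the two limits quoted above (an average of 5 seconds and a maximum of 30 seconds).

```python
from statistics import mean

AVG_OPEN_LIMIT_S = 5.0   # "open a stored email in an average of 5 seconds"
MAX_OPEN_LIMIT_S = 30.0  # "no email shall take longer than 30 seconds to open"


def check_open_times(open_times_s):
    """Check a non-empty list of measured open times (seconds) against both limits."""
    average = mean(open_times_s)
    slowest = max(open_times_s)
    return {
        "average_s": average,
        "slowest_s": slowest,
        "average_ok": average <= AVG_OPEN_LIMIT_S,
        "max_ok": slowest <= MAX_OPEN_LIMIT_S,
    }


# Made-up measurements, for illustration only.
result = check_open_times([2.0, 3.0, 4.0, 11.0])
print(result)
```

The technical statement in the first paragraph could be monitored too, but a breach of it would say nothing about what the user actually experienced.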

Secret #3: Effective SLM requires definition of multiple Requirements and Targets

Service Level Requirements (SLR) express the needs of the customer.  Service Level Targets (SLT) reflect the objective of the IT service provider organization.  SLRs and SLTs do not always equate. 

 

For example, the requirement from the business (SLR) might be for sub-second response time, but the best possible solution available might be 2 seconds (SLT).   The business may also need at least 95% uptime and IT might be able to provide 99% uptime.  Thus, SLR and SLT are not the same, and they may not align.

 

SLRs get negotiated into SLTs, and the SLTs go into the SLA.  SLTs should be expressed in terms of a Critical Success Factor (CSF) and measured by means of a Key Performance Indicator (KPI).

 

If possible you should include multiple such statements, each equally easy to understand.  For example:

 

“Monday through Friday, from 8am EST to 5pm EST, an authorized user of the system can contact a support agent via phone, fax, email, or instant message.  A support agent will acknowledge the request for support within 5 minutes of receipt, collect relevant information from the user (including workstation ID, employee ID, application name, incident details) within 15 minutes of receipt, and escalate or resolve the incident within 30 minutes of receipt.”

 

“Mean time to repair (MTTR), where MTTR means the elapsed time from acknowledgement of receipt to restoration of normal activities, shall be no longer than 4 hours.  Any incident with a repair time of more than 4 hours will be reviewed by a specialist to determine why, and those reasons shall be shared with the customer at the customer’s discretion and convenience.”

 

Note how the SLA shapes up to be very readable, clear, and concise.  Each SLT is its own statement.  Each SLT is self-contained, measurable and thus reportable.
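
Because each SLT is measurable, it can be computed directly from incident records.  Here is a rough sketch of the MTTR statement above, assuming each incident is available as an (ID, acknowledged, restored) record; the record layout, the incident IDs, and the function name are hypothetical.

```python
from datetime import datetime, timedelta

MTTR_LIMIT = timedelta(hours=4)  # limit quoted in the sample SLT


def mttr_report(incidents):
    """incidents: non-empty list of (incident_id, acknowledged, restored) tuples."""
    durations = {iid: restored - ack for iid, ack, restored in incidents}
    average = sum(durations.values(), timedelta()) / len(durations)
    # Individual incidents over the limit go to a specialist for review.
    needs_review = sorted(iid for iid, d in durations.items() if d > MTTR_LIMIT)
    return {
        "average": average,
        "average_ok": average <= MTTR_LIMIT,
        "needs_review": needs_review,
    }


# Made-up incidents, for illustration only.
incidents = [
    ("INC-1", datetime(2006, 10, 2, 9, 0), datetime(2006, 10, 2, 10, 0)),   # 1 hour
    ("INC-2", datetime(2006, 10, 3, 9, 0), datetime(2006, 10, 3, 15, 0)),   # 6 hours
    ("INC-3", datetime(2006, 10, 4, 13, 0), datetime(2006, 10, 4, 15, 0)),  # 2 hours
]
report = mttr_report(incidents)
print(report)
```

In this made-up data the average stays within the 4-hour limit, yet one incident still needs review, which is why the SLT tracks both the average and the individual cases.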

 

Other elements to include in the SLA are signatures, reporting intervals, costs (if any) and other related data.  Keep the non-service legal material in a contract, not the SLA.

Secret #4: Effective SLRs & SLTs require LOB alignment

Lines of Business (LOB) are tiers of customers or users.  They often include locations, specific applications, unique requirements and hours of operation. 

 

One SLA often cannot meet the needs of all lines of business.  The only way to understand the unique service level requirements of each line of business is to meet with them and discuss their needs.

 

Only this way can you establish the breadth and depth of the service level requirements that you will publish in the SLA.

Secret #5: Effective SLAs require breadth and depth

An SLA requires multiple dimensions including: Capacity, latency, availability, performance, continuity, security, etc.  Consider the following statement from an SLA:

 

“Monday through Friday, from 8am EST to 5pm EST, when operating under normal conditions (where normal means not in dial-backup) an authorized user of the system can retrieve a customer record in less than 5 seconds.  No more than 10 times during the month when operating under normal conditions will any record take more than 30 seconds to retrieve.”

 

Note the inclusion in business terms of:

  • Availability “Monday through Friday, from 8am EST to 5pm EST”

  • Continuity “when operating under normal conditions (where normal means not in dial-backup)”

  • Capacity “can retrieve a customer record in less than 5 seconds”

  • Security “an authorized user of the system”

 

Note how the statement also includes the ability to “miss” the main target no more than 10 times per month.
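
To show how such a multi-dimensional statement might be evaluated, here is a minimal sketch.  It assumes each measurement is a (timestamp, seconds, dial_backup) record with timestamps already in Eastern time, and it reads the statement as "at most 10 in-scope retrievals per month may exceed 30 seconds"; the record layout and names are hypothetical, and the security clause (authorized users only) is assumed to be handled when the samples are collected.

```python
from datetime import datetime

TARGET_S = 5.0               # "in less than 5 seconds"
HARD_LIMIT_S = 30.0          # "more than 30 seconds to retrieve"
ALLOWED_SLOW_PER_MONTH = 10  # "no more than 10 times during the month"


def in_scope(timestamp, dial_backup):
    """True for Monday-Friday, 8am to 5pm, when not running on dial-backup."""
    is_weekday = timestamp.weekday() < 5  # Monday is 0
    in_hours = 8 <= timestamp.hour < 17
    return is_weekday and in_hours and not dial_backup


def evaluate_month(samples):
    """samples: iterable of (timestamp, seconds, dial_backup) for one month."""
    scoped = [secs for ts, secs, backup in samples if in_scope(ts, backup)]
    over_limit = sum(1 for s in scoped if s > HARD_LIMIT_S)
    return {
        "measured": len(scoped),
        "missed_target": sum(1 for s in scoped if s >= TARGET_S),
        "over_limit": over_limit,
        "compliant": over_limit <= ALLOWED_SLOW_PER_MONTH,
    }


# Made-up samples, for illustration only.
samples = [
    (datetime(2006, 10, 16, 9, 0), 2.0, False),    # Monday, in scope, on target
    (datetime(2006, 10, 16, 10, 0), 45.0, False),  # Monday, in scope, over 30 s
    (datetime(2006, 10, 17, 18, 30), 60.0, False), # after hours, excluded
    (datetime(2006, 10, 18, 11, 0), 50.0, True),   # dial-backup, excluded
    (datetime(2006, 10, 21, 9, 0), 40.0, False),   # Saturday, excluded
]
summary = evaluate_month(samples)
print(summary)
```

Only the first two samples count toward the SLA; the other three fall outside the availability or continuity conditions, which is exactly what the clauses in the statement are for.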

Secret #6: Effective SLAs require multiple data sources

With multiple clauses and targets, an SLA requires multiple information sources covering availability, capacity, continuity, and security.  SLM requires data from many CIs.  The best place to get this data is from the operational tools that manage those CIs, which makes data mining key.  This data will come from a number of sources:

  • Mainframe, mid-range

  • Network, applications

  • Web, HR

 

The best place for acquiring this raw data is existing systems and network management platforms already in place.  The golden rule is to agree only to targets for which you have monitors.
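
The golden rule lends itself to a simple pre-negotiation check.  The sketch below assumes you keep a list of proposed target names and a list of metrics your existing tools already monitor; all names are hypothetical.

```python
def unmonitored_targets(proposed_targets, monitored_metrics):
    """Return the proposed targets that no existing monitor covers, sorted by name."""
    return sorted(set(proposed_targets) - set(monitored_metrics))


# Made-up names, for illustration only.
proposed = ["email_open_time", "record_retrieval_time", "support_ack_time"]
monitored = ["email_open_time", "record_retrieval_time", "network_latency"]

gaps = unmonitored_targets(proposed, monitored)
print(gaps)
```

Any target that shows up in the result either needs a monitor before it goes into the SLA or should be left out of the agreement.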

Secret #7: Effective SLM requires automation

Several studies have shown that most SLAs are incorrectly reported.  The sheer volume of data and the calculations required to produce the report generate errors in the report.

 

Put another way, it’s virtually impossible to aggregate all the target metrics and data, add them up, do the math, and then transcribe them correctly by hand.  Invariably, errors get introduced, making the SLA report useless.  As the number of SLAs and targets increases, the errors multiply.

 

To resolve this issue more than one company has an entire group responsible for preparing the “monthly report” – and this is the trap.  Monthly reports done manually are almost always after-the-fact and thus service level reporting, not SLM.

 

Make every effort to use automated systems.  Define and agree on targets with customers for which you have or will obtain a monitoring tool.  Then, integrate using programs or calculations that require no human interaction.  The more human touch, the greater the chance of introducing “human error” into the report.
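
As one illustration of a calculation that needs no human interaction, the following sketch derives availability for a reporting period from outage records.  It assumes the outages are exported from a monitoring tool as (start, end) pairs that do not overlap one another; the function name, the sample outage, and the 99% target used for comparison are assumptions for illustration.

```python
from datetime import datetime, timedelta


def availability_pct(period_start, period_end, outages):
    """Percentage of the period not covered by the given (start, end) outages."""
    total = period_end - period_start
    down = timedelta()
    for start, end in outages:
        # Count only the part of each outage that falls inside the period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += clipped_end - clipped_start
    return 100.0 * (1 - down / total)


# Made-up data: one 8-hour outage during October 2006.
october = (datetime(2006, 10, 1), datetime(2006, 11, 1))
outages = [(datetime(2006, 10, 12, 9, 0), datetime(2006, 10, 12, 17, 0))]

pct = availability_pct(october[0], october[1], outages)
target_met = pct >= 99.0  # hypothetical availability target
print(round(pct, 2), target_met)
```

Run daily rather than monthly, the same calculation shows how much downtime remains before the target is missed, which is the kind of information that turns reporting into management.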

Summary

Service Level Management must align with user needs.  SLAs must include targets for capacity, availability, security, continuity, etc.  Target attainment data will come from many sources, and you need to track the target at its source, as well as the entire SLA (end-to-end).  SLM solutions must show both current and past status as well as predict future problems, and this requires automation and daily or even real-time analysis of data.

 

The benefits are many:

 

  • By monitoring vendor performance that underpins your SLAs you can manage vendors. 

  • Often you can realize savings through refunds, rebates, and credits due for poor vendor performance. 

  • Vendor management can lead to higher discounts through consolidation. 

  • Monitoring utilization of service CIs lets you spot idle and/or underutilized capacity. 

  • You can reduce costs through idle capacity re-allocation. 

  • Proactive performance and capacity forecasting driven from SLA and SLR monitoring helps plan purchases. 

  • Improved asset management produces savings by avoiding redundant infrastructure investments and saving money on unnecessary maintenance renewals. 

  • Sound SLM increases end-user productivity.

 

All in all, a real positive payback awaits those who move from service level reporting to Service Level Management!


Entire Contents © 2006 itSM Solutions LLC.  All Rights Reserved.