Best Practice Insights Focus On: ITIL Service Operation

Updated for ITIL 2011
This publication by Anthony Orr has been revised to bring the
content up-to-date with IT Infrastructure Library (ITIL®) 2011.
Orr is the BMC Software director of Service Management and
works within the Office of the CTO at BMC. He is one of the
authors for the ITIL 2011 update and a senior ITIL Examiner
for APMG.
Orr has more than 30 years of information technology
experience. We greatly appreciate the contributions of the
following individuals to the original version of this
publication: Ken Turbitt and Frederieke C.C. Winkler Prins.
Note to Readers: This publication highlights key elements of ITIL® Service Operation, 2011 edition, by the Cabinet Office
and published by TSO (The Stationery Office) and includes commentary on important concepts from BMC ITIL experts.
BMC commentary is highlighted in grey quotes.
ITIL® Service Operation is the fourth publication in the lifecycle set of IT Infrastructure Library (ITIL®) publications. The
publication provides a comprehensive overview of service operation, where IT Operations runs the business and where the
customer feels the greatest impact of the IT organization’s efforts.
Successful service operation requires effective planning, collaboration, a proactive approach to preventing issues from
occurring, and a swift and effective problem management process. The ITIL Service Operation book assists readers in
developing the best management practices so that the business and its customers are satisfied and receive the value they
expected in the strategy and design stages.
It is in the service operation phase that you know for sure whether you have successfully adopted the most effective strategy,
produced the best design, and conducted the most effective transition to live operation, delivering that return on value. It is
also here that the effects of improvement initiatives are actually felt by the business and customers.
Service operation brings service management to life for the business. It also establishes accountability for the performance of
the services, the people who create them, and the technology that enables them. Well-planned and implemented services will
not give the required benefits if day-to-day operations activities are not properly conducted, controlled, and managed. Nor
will service improvements be possible if day-to-day activities to monitor performance and gather data are not systematically
conducted during service operation.
In many ways, the ideal service operation environment is one in which services are delivered to the customer without incident.
However, while stability is good, there is a balance to be struck to ensure that the operation isn’t stagnating. You must be able
to respond to the changing needs of the business while being cognizant of the current constraints, such as resources, costs,
and technology.
The ITIL Service Operation book provides guidance on all aspects of managing the day-to-day operation of an organization’s
IT services, or “business as usual,” as it is referred to in the ITIL publication. It contains guidance on the operational aspects of
many processes and focuses on incident and problem management. Also, for the first time, at the core of ITIL is advice on the
organizational and functional structure. It emphasizes that all operations staff must be fully aware that they are there not just
to manage the technology but also to provide service to the customers and the business.
This book gives an overview of service operation and provides
commentary from BMC Software experts. The commentary
includes practical guidance as well as real-world examples.
Vernon Lloyd, FISM (Fellow of the Institute of Service Management)
International Client Director
Co-author of the ITIL Service Design publication by the Office of Government Commerce (OGC)
Service operation encompasses the day-to-day activities, processes, and infrastructure that are responsible for delivering
value to the business through technology. Service operation is the stage in the lifecycle of a service where value is realized for
the customers, users, and supplier of the service. Just as most people expect the lights to turn on at every flick of a switch,
business users have become completely dependent on the capabilities provided by IT services.
Think of service operation as a managed service provider or as a utility company responsible for providing the power that
customers need to do their jobs. Without electricity, many activities would come to a halt. And without the processes put in
place to ensure delivery of that electricity, the service would be unreliable. Utility companies must also be proactive, even
when it means trimming trees to prevent outages that result when falling branches sever electric lines. Customers don’t care
about all the required resources (people, process, and technology) involved in delivering electricity to their homes. They just
want reliable service when they need it and at a fair cost.
The users of IT have similar expectations in their consumption of technology services. As a result, IT organizations must work
to ensure that the underlying service delivery and support infrastructure is optimized to provide continuous value and service
to their customers. This work, coupled with the ability to detect and recover from problems quickly if any disruptions occur, is
the hallmark of a proactive IT organization. Effective operations teams must first work to prevent problems and, if an issue
occurs, understand the impact from a user’s perspective, then follow up with swift corrective action for the restoration of the
service. Consumption of IT services by employees should be simple, efficient, effective, economical, and engaging.
Just as a utility company provides various service packages to its customers— such as energy conservation programs along
with the delivery of gas and electricity— IT offers a catalog of services to its customers. The ITIL Service Operation
publication focuses on the well-planned, preventive, detective, and corrective capabilities, functions, processes, and controls
that need to be in place to provide continuous utility to the business based on the promised warranties and service level
agreements (SLAs). The ITIL publication stresses the importance of measuring the experience from a user perspective,
instead of merely monitoring all of the discrete infrastructure components.
The ITIL Service Operation book has been updated to reflect new models and architecture, such as shared services, mobile
commerce, and cloud computing. User consumption of IT resources for software as a service (SaaS), platform as a service
(PaaS), and infrastructure as a service (IaaS) is totally dependent on IT asset availability. Operations must be agile and high
performing; otherwise, users will seek alternate solutions to enable business outcomes, introducing new risks and complexities. Operational process policies must evolve and adapt accordingly. The ITIL lifecycle, especially service operation,
supports this IT evolution and transformation.
IT must be able to create a “consumerized” experience for users interacting with the services they provide. This experience
should expand on the concept of user self-service and work effectively with mobile computing platforms. For example, a
virtual assistant that enables self-service on the user’s device can improve the service experience and simplify how IT engages
with the user in the delivery of support and services. By enabling the user to install applications, gain access to secure content,
request help with problems, view the status of items, and ask questions without location and support constraints, user
productivity as well as the perception of IT can be improved.
With the growth in Bring Your Own Device (BYOD) initiatives, you need to manage personal devices with the same rigor as any
other corporate-owned device. Finally, as more people turn to social media for IT support, you need to incorporate and
integrate social media channels with your IT Service Management (ITSM) solutions to enable the service desk to easily and
seamlessly engage with the user.
The ITIL Service Operation publication covers the concept of service management as a practice and how to deal with
organizational responsibilities and provide service. ITIL defines service management as “[a] set of specialized organizational
capabilities for providing value to customers in the form of services.” 1
The ITIL Service Operation publication also reviews important processes and includes valuable templates in the Appendix
sections to help guide you in those processes. In addition, the publication includes a description of common activities based
on the various groups within IT, many of which are summarized in this book. The publication also provides technology
recommendations to help you run your operations more effectively.
The Introduction section provides a brief summary of the five publications in the ITIL Core: Service Strategy, Service Design,
Service Transition, Service Operation, and Continual Service Improvement. ITIL follows a service lifecycle approach that
represents a broad, all-encompassing view of service management, as shown in Figure 1. This approach focuses on gaining an
understanding of the service management structure, the ways in which all of the components are interconnected, and how
changes in one area may affect the whole system.
Each stage of the lifecycle influences the other stages and relies on them for input and
feedback. This interaction and interdependence between stages creates a lifecycle that
is highly dynamic in nature.
FIG 1: ITIL Service Lifecycle Approach
1 ITIL English 2011 Glossary. See service management.
© Crown copyright 2011. All rights reserved. Material is reproduced with the permission of the Cabinet Office under delegated authority
from the Controller of HMSO.
For example, service operation should include a strategy for improvement initiatives. Service operation is directly
supported by service strategy and continual service improvement, and the results should be designed and transitioned into
operations effectively and efficiently.
The ITIL publications can be summarized as follows:
Service Strategy: Encompasses the policies and objectives
required to design, develop, and implement the lifecycle approach.
Service Design: Involves taking the service strategy, gathering
service requirements from the business, and then determining
which IT resources will provide the integrated services.
Service Transition: Offers guidance about capabilities needed
to move the new services and changes into the service
operation stage.
Service Operation: Offers guidance for efficiently
delivering and supporting services to ensure
customer satisfaction.
Continual Service Improvement: Focuses on an
environment of learning and enhancement.
ITIL is part of a portfolio of publications known as Best Management Practice (BMP) used to help manage projects, programs,
and services. ITIL is successful because it is vendor neutral, offers time-tested practices, and represents the combined
learning experience of leading service providers.
IT needs to be integrated with the business. By following the principles of service operation discussed in the ITIL Service
Operation publication, IT can increase its standing as a strategic asset of the business. IT must focus specialized skills,
capabilities, and resources to support business outcomes.
With closer collaboration, IT can help the business become more effective, efficient, and economical. Through innovations
such as cloud computing, social, and mobile technologies, IT can help the business unlock new opportunities and explore
different ways of working.
IT operations can help deliver the “power” its customers need to be successful. Ultimately, IT should aim to provide users with
the same excellent experience at work that they enjoy with their personal devices at home. In a very real sense, the
expectations that users bring into the workplace are helping to increase the performance of the IT organization.
Be sure to review the Appendix sections of the ITIL Service Operation publication.
A glossary is included following the Appendix sections, but you can also use the electronic version of the ITIL glossary at 2
The Appendix sections provide useful guidance on the following:
Appendix A: Related Guidance
Discusses ITIL guidance and Web services, Quality Management System, risk management, governance of IT, COBIT, ISO/IEC 20000 Service Management Series, environmental management and green/sustainable IT, ISO Standards and publications for IT, ITIL and the OSI framework, program and project management, organizational change, Skills Framework for the Information Age, Carnegie Mellon: CMMI and eSCM framework, Balanced Scorecard, and Six Sigma.
Appendix B: Communication in Service Operation
Refers to routine operational communication, communication between shifts, and performance reporting. It also covers communication related to projects, changes, exceptions, and emergencies, as well as global communications and communication with users and customers.
Appendix C: Kepner and Tregoe
Includes guidance in how to define problems and establish, test, and verify their causes.
Appendix D: Ishikawa Diagram
Shows how to identify and present causes of a problem on a chart.
Appendix E: Considerations for Facilities Management
Provides guidance related to building management, equipment hosting, power management, environmental conditioning and alert systems, physical access and control, shipping and receiving, supplier management, maintenance, and office environments.
Appendix F: Physical Access Control
Reviews access control devices.
Appendix G: Risk Assessment and Management
Discusses the definition of risk and risk management, management of risk, ISO 31000, ISO/IEC 27001, and the Risk IT
process framework.
Appendix H: Pareto Analysis
Describes how to analyze and separate important causes of failure from more trivial issues.
Appendix I: Examples of Inputs and Outputs Across the Service Lifecycle
Explains how the different lifecycle stages interact.
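The Pareto analysis described under Appendix H can be illustrated with a short sketch. This is not taken from the ITIL publication; it is a minimal example that assumes failure causes are recorded as simple labels and that an 80 percent cutoff is acceptable. The function name pareto_vital_few and the threshold parameter are hypothetical.

```python
from collections import Counter


def pareto_vital_few(causes, threshold=0.8):
    """Return the most frequent causes that together reach `threshold`
    of all recorded failures (the "vital few").

    `causes` is a list of cause labels, one per recorded failure.
    The 0.8 default is an assumed cutoff, not a value from ITIL.
    """
    counts = Counter(causes)
    total = sum(counts.values())
    vital = []
    cumulative = 0
    # Walk the causes from most to least frequent.
    for cause, count in counts.most_common():
        # Stop once the causes already collected reach the cutoff.
        if cumulative / total >= threshold:
            break
        vital.append(cause)
        cumulative += count
    return vital


# Hypothetical incident data for illustration only.
incidents = ["network"] * 50 + ["disk"] * 30 + ["config"] * 15 + ["other"] * 5
vital = pareto_vital_few(incidents)
```

With the sample data above, the causes labeled network and disk account for 80 of the 100 recorded failures, so they are the ones returned for attention first.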
Many of the themes in Chapter 2 are also discussed in the other ITIL publications updated for ITIL 2011, and many of the underlying
principles of service management have been carried over from earlier versions of the ITIL publications.
Chapter 2 introduces definitions that provide a basis for the ITIL framework and presents concepts that are essential to service
management success. It focuses on the importance of delivering value by providing the results expected by end users. This involves
moving beyond business-IT alignment toward business-IT integration. This chapter describes the types of conversations that should
take place about the definition and meaning of services. It discusses the different types of service providers— internal, shared
services unit, and external— and their roles.
Significant points stressed in this chapter are value creation, the importance of organizing for service management, and the
service lifecycle. The overriding message in this chapter is to think about how the services you provide are architected in the
context of how service value is created and realized for your customers. Please refer to the ITIL glossary 3 for the definitions
of the following terms: outcome, service, IT service, service management, service provider, IT service management (ITSM), IT
service provider, asset, customer asset, service asset, process, role, management system, ISO 9001, and ISO/IEC 20000.
Everyone in the organization should be considered a stakeholder for service management. Service is everyone’s
responsibility, no matter what role they play or how they play it to deliver and support services for their customers.
External stakeholders— customers, users, and suppliers— also should be considered. These stakeholders, along with the
organizational stakeholders, are an example of the agency principle.
Utility of Service
Customers want to achieve business outcomes by using services that are fit for their purpose. The utility of a service must
support the customers’ performance or remove a constraint. Customers can become very frustrated with a service that is fit
for their purpose but lacks sufficient warranty for their use. Think of utility as what the service does.
Warranty of Service
This chapter provides guidance on warranty of service, or how the service is delivered. You can communicate warranty to
customers in terms of commitments to availability, capacity, continuity, and security of the utilization of services.
Availability means that the customer can use your service under the terms and conditions you have mutually agreed upon.
Capacity ensures that the customer will be able to utilize the service at a specified level of business activity or that demand will be fulfilled at a specified quality level.
Continuity guarantees that the customer will be able to use the service even if you experience a major failure or other unexpected event.
Security means that the customer’s utilization of services will be free of specific risks.
Many of the services IT provides are considered commodities. You create a competitive advantage when you are able to
deliver a certain level of warranty to your customers.
Customers, both internal and external, need to be confident that you can effectively and consistently support their business
strategies. Since service providers are constantly matching others’ service offerings, you must constantly improve your value
proposition to stand apart. Use one or more of the service management processes to drive these improvements.
Service Assets
According to ITIL, resources and capabilities are types of assets that organizations can use to create value for their
customers. Resources are direct inputs to produce a service, while capabilities are the organization’s ability to utilize
resources to create value. You can create differentiation and retain customers by developing distinctive capabilities that are
difficult for your competitors to replicate.
Processes have inputs or triggers, defined actions and activities, and an output or specific results. Processes also have
metrics and deliver primary results to a customer in the form of services. Capabilities and resources internal or external to
the organization enable processes. Processes should follow enterprise governance standards, and policy compliance should
be built into them. Governance ensures that the required processes are executed correctly. Processes are executed by people
and sometimes are enabled by technology implementations. When processes are collaborative and integrated appropriately,
the output from one process can provide input to the next process for the service that is delivered or supported. Processes
should also be efficient, effective, and economical for the services that the process supports.
Service Lifecycle
The service lifecycle is dynamic, as each stage of the lifecycle supports other stages. Specialization and coordination across
the lifecycle are essential for the delivery and support of services. The service lifecycle should work as an integrated system
that includes feedback mechanisms for continual improvement.
Look at the Big Picture
Regardless of whether IT services are provided internally by IT or externally by service providers, the IT organization is
ultimately responsible for bringing together the relationships, knowledge, and methods necessary to deliver these services
in the service operation stage of the IT service management lifecycle. According to ITIL, an IT service is “[a] service provided
by an IT service provider. An IT service is made up of a combination of information technology, people, and processes. A
customer-facing IT service directly supports the business processes of one or more customers, and its service level targets
should be defined in a service level agreement. Other IT services, called supporting services, are not directly used by the
business but are required by the service provider to deliver customer-facing services.” 4
Every IT department has a limited set of resources. Service operation makes you focus on the activities most important to
the business and prioritize work based on business requirements and impact. Service management emphasizes the need to
understand that service operation is where value is delivered and supported. Service operation is the face of the organization
to the customer. Quality of service and quality of experience are very important; one without the other can lead to customer dissatisfaction.
Service Operation Delivers Daily
The service operation stage encompasses the day-to-day coordination, organization, and management of all the people,
processes, and technologies required to provide services to clients, whether the clients are business users or customers
outside of the company. Service operation takes all of the work accomplished in the service transition stage and uses it to
provide services to customers, keeping services functioning indefinitely until the business requirements change and the
service must be modified or retired.
All of the assumptions and planning for value creation from the earlier lifecycle stages are proven correct or incorrect in the
service operation stage, when value is realized. That’s why it is important to work closely with those responsible for service
strategy, service design, and service transition so there are no surprises when you put the services into production.
ibid. See IT service.
The services are delivered through the IT infrastructure and by people, just as the services of a utility company are delivered
through the company’s infrastructure— meters, pipes, and power lines— as well as by the people who support those
services. In service operation, as in all of the other lifecycle stages, understanding and taking action based on the customer
perspective is essential. A power company would make every effort to restore power after an outage; similarly, your
customers expect the same level of effort in providing IT services.
Service operation is crucial to keeping services up and running. The business quickly embraces new services and perceives
them as a given. That’s why it is difficult to obtain budget increases for additional services, even when a service offering has
expanded and the budget increase seems justified.
Pay close attention to the chapters “Service Management as a Practice” and “Service Operation Principles.” By doing so, you
can better ensure that your business operation strategy provides direction and identifies constraints. These fundamentals will
help you determine how to prioritize work in service operation, figure out what to do with the information, and identify the
services you need to support and deliver. They will also help you understand where the SLA objectives fit in.
IT organizations should focus operations on the customer. At the end of the day, it is the customer who pays for the cost of
operations. Customers do not want to pay for what they cannot use, and customers who do not receive the service they need
will look for alternatives.
The service operation team is responsible for providing services— including all of the processes, activities, and functions that
deliver them— at the best quality and lowest price to help their companies stay ahead of the competition. They must focus on
end-user satisfaction by providing IT consumers with the same experience in business as they have in their personal
experiences with technology. The team must guide the support services and functional components of those services. Clearly
defining the roles of every part of the organization that contributes to providing each service can help you get control. Yet
service operation is often a balancing act between competing factors, such as change and maintaining the status quo. It
requires balancing IT and business perspectives, stability and responsiveness, quality and cost of service, and being reactive or proactive.
When following effective service operation principles, you can deliver more value to the business by managing IT from a
business perspective, an approach known as Business Service Management (BSM). ITIL defines Business Service Management
as “[t]he management of the business services delivered to business customers.” 5
Balancing Change with Stable Service Operation
The business asks IT to perform two essential activities: projects within the portfolio, and operation of services related to the
specific projects. To successfully implement projects, you need to implement changes, whether they are organizational,
technical, or procedural. Every change carries risk, so service operation must balance the changes the business needs
against the stable delivery of existing services.
There may be a conflict between the IT view and the business view. On one hand, the business wants to remove constraints,
drive new value, and create new opportunities through services. On the other hand, the business may assume that IT can
accomplish those objectives without affecting the reliability of the existing services.
ibid. See business service management.
That conflict plays out in the transition of services, primarily change management. The health of the change management
process determines the overall ability of IT operations to embrace as much change as the business requires. IT operations
must manage knowledge related to each change and must control communication, while remaining stable during all changes.
In the example of the utility company, operations must maintain uninterrupted service to customers, even while adding or
upgrading technology, equipment, or a large volume of customers. The business view is that customers should continually
receive better, more comprehensive services, while the IT view is that IT must ensure that the upgrades don’t overwhelm the
current systems, leaving customers in the dark.
Consider Both Perspectives: Internal and External Business
The business focuses on the end results of the service and the value and utility it provides. IT has a fundamentally different
view of all the elements required to provide the service.
The external view of IT is influenced by the experience the customer, or business, has with the services provided. The
customer is not concerned with the brand of the server, the manufacturer of the equipment, or the color of the cables in your
data center. Instead, the business cares about the business constraints that the service eliminates or the increased
performance value it provides. The business wants to know, “Can I get more done faster, better, or at a lower cost because
you gave me this service?” The business expects the specialized capabilities of IT to focus on business outcomes.
The internal view of IT is how IT sees all the pieces, subcomponents, and infrastructure dependencies that are necessary to
provide the end business service. A business service is often composed of many discrete components— strategic assets to the
overall business — including servers, switches, routers, cables, and a variety of software applications. These all combine to
deliver one service or many services. IT clearly wants to make sure all components function; the challenge is to make sure all
components function to support the business effectively and meet the needs of a workforce that is increasingly dependent on
infrastructure they own themselves.
Coordinate Processes for Greater Stability and Responsiveness
Balancing stability and responsiveness relies on coupling both service transition and service operation processes— such as
change, incident, and problem management — and effectively driving down the amount of unplanned operational work. For
example, whenever an unplanned outage happens, people drop what they’re doing and restore service. They drop planned
work, which is typically project-related work or activities to prevent additional outages. A spiral begins: The more you fight
fires, the more time you spend putting out new ones. And the fact that you’re lighting some of your own fires and
implementing urgent changes as part of your firefighting activities means that these changes have been less tested and carry
more risk.
The earlier you are in the service lifecycle and in understanding how service value is realized, the lower the cost of defect
repair and the greater the customer satisfaction. By creating value through service strategy, service design, and service
transition, you can alleviate potential service operation problems. Service operation provides feedback to the other stages for
improvement and lifecycle process maturity. The feedback can be direct, within the continual service improvement (CSI)
register, or initiated by the CSI process. The quicker you can get improvement changes or enhancements related to delivered
and supported services into the service strategy and service design stages, and then flow them through service transition
before they get into operation, the better you can ensure improvement for current and future services. The lessons learned
regarding defects, bad ideas, or problems are addressed earlier, before services are put into production. Once in production,
the cost of fixing the service is typically high, both in terms of money and disruption to the business, and in terms of how the
business views IT.
Consider auto manufacturers, for example. If they catch a defect on the assembly line, it’s more expensive to fix than catching
the problem while it’s on the drawing board. However, if they can catch it before it hits the end of the line, that’s still more
desirable than catching it when the product is out to customers. Recalls are far more expensive than remedial repairs. How
can you guarantee, based on the risk of the activity you propose, that it receives the right amount of scrutiny at the beginning
of its life, before it becomes something that consumes everybody’s energy in production?
The following actions will help an organization achieve a better balance between stability and responsiveness:
Invest in technologies and processes that are agile, rather than rigid
Enlist input from service operation during the ongoing design and transition stages
Ensure that IT is involved as early as possible in changes that impact the business, and initiate these changes at the earliest applicable stage in the lifecycle. There is little point in designing a change to a service if operations cannot support it. The ability to support a service needs to be known at the start.
Make certain that those responsible for service transition do not “throw the service over the fence” when it is implemented, but work in partnership with the operations team for a defined period (or for early stage support) before fully handing it over to operations. This approach also helps increase agility when working in a DevOps environment.
Quality of Service Versus Cost of Service
A big challenge for service operation is to consistently deliver services in a timely manner while controlling scope, costs, and
resources. Early in the service lifecycle, you typically can spend the appropriate amount of money and get a much greater
return. Later, when the service is in production, more money, time, and effort will be required to increase the performance or
availability of the service. To save money and increase credibility, focus on delivering the service according to customer
expectations or SLAs before the service is in production. Customers will want improvements, but you should deliver the
quality promised first.
There must be a delicate balance between a focus on cost and a focus on quality. If you have an extreme focus on quality,
you’re subject to escalating budgets and increasing demands for higher-quality services. You wind up delivering more than is
necessary, or “gold plating,” to achieve business objectives. If you have an extreme focus on cost, then the quality of service
might be limited and the business may try to get even more services for less than IT can provide. Cost of service should be
managed by IT without sacrificing quality. Quality is in the eye of the customer. IT also should understand the customer value
of the service, so IT can properly price the service relative to cost.
A set of financial management tools and processes can help organizations account for the cost to provide a service.
Technology such as the configuration management database (CMDB) can help you discover IT assets related to a service for
understanding IT costs of the particular service. You can use this knowledge from the CMDB as you build a related service
catalog, and financial management can federate information from that service catalog to understand the cost
of services.
Take some of your cues from the Agile software development approach, which is used to expedite the development of
software. This approach involves prioritizing what should be in the new release and then picking the highest priority for your
first phase. Improvement should be measured weekly, and project success should be measured in weeks, not months. Make an
improvement, document it, and perform a quality assurance review. This approach will give you greater control over quality,
help mitigate risk, and allow for early corrective actions. It may not have all the pieces, but the highest-priority features will be
there. This approach addresses cost, quality, and timing. If you don’t follow this approach or a similar approach, it can take too
long to accomplish your objectives, and you’ll miss out on business competitive opportunities.
Anticipating Operation Costs and Providing Business Value
The stages in the ITIL service lifecycle include the planning, design, or improvement of existing and new services. All of those
assumptions, plans, methods, and timelines are tested in reality during service operation. The only point in the lifecycle where
a service provides value to the customers is service operation. This is where customer assets interact with supplier assets to
make the service real. If you think about the utility analogy, service transition is the process responsible for making sure that
the power gets from the utility lines to your home. Service operation begins when you turn on the lights.
In the strategy and design stages of the service lifecycle, it’s important to anticipate the cost of project-related expenses. But
many organizations have difficulty anticipating the cost to support the continuous operation of a particular service. Often,
after a new or modified service is up and running, the business views the service as part of business operations. In other
words, the service becomes part of the basic tool set the business needs to do work. Capital and operational costs must be
managed effectively to enable an organizational growth strategy.
Customers may take the service for granted until an outage occurs. Many IT organizations struggle when asked to deliver new
and more services, which require support from staff, as well as the tools, knowledge, and capabilities needed to perform the
support. Those increased demands on the IT organization usually mean that IT needs larger budgets, operational innovation,
or more automation technology.
Is Your Organization Reactive or Proactive?
The ITIL Service Operation publication recommends proactively maintaining IT services. However, achieving this objective
relies on the maturity of your IT organization and whether your culture can support innovation or is more likely to resist
change. This objective can also be affected by IT’s influence over the business and several other factors.
Reactive organizations are prompted by external influences, whereas proactive organizations continuously scan their
environment for threats to reliability, productivity, and new innovations that apply to services. ITIL says that an organization’s
maturity largely determines how reactively or proactively an organization behaves. High-performing organizations focus on
being preventive, but if you can’t prevent problems, you must be able to detect and correct them very quickly.
Reactive organizations usually focus all their efforts on corrective actions rather than preventive activities. Being proactive
also entails making proposals to the business about new services or improvements to existing services that contribute to the
bottom line. Improvement might be from cost savings or new services that help the company differentiate itself from its competitors.
Proactive organizations tend to rely on a system of preventive, detective, and corrective controls to automate functions and
direct their activities. These controls help improve efficiency because they save time and reduce the risk of errors associated
with manual processes.
Reactive organizations tend to rely on massive corrective capabilities, which are much more expensive. As a result, reactive
organizations tend to have greater expenses and more unplanned work, which also means they tend to have problems
completing projects. A reactive utility company may focus its efforts on perfectly executing well-practiced crisis strategies,
such as restoring power after a major storm, equipment malfunction, or other accident. However, overall work suffers
because so much effort is spent during these short bursts that time must be given for employees to recover and for planned
projects to be restarted.
Yet it is a good practice not to be extremely reactive or proactive. Being extremely proactive may cause an organization to
modify services that don’t need to be fixed to meet the customers’ expectation for the service. This extra effort will increase
the organization’s total cost of ownership without adding any business value, only IT-perceived value. An extremely
proactive utility company, for example, may focus too much on perfecting the minor details in operational systems and
programs. This can cause unnecessary effort, and the organization might be unprepared for a major, unforeseen crisis.
Staff Involvement and Operational Health
The IT operations staff should have input into the strategy, design, and transition stages of services to help establish the
performance and evaluation criteria that will be used to judge the success of services. If the people who operate the services
do not define what they consider to be important priorities, then the specifications and criteria will come from the strategy,
design, or transition teams and will not reflect the real operating environment. The people on the service operation team
generally have the best idea of the impact the proposed changes may have on the new service and on the existing operational
environment. They also understand the effect those changes will have from cost, technology, and performance standpoints.
Yet IT operations should not work in a vacuum; the improvements they suggest should be considered within a customer-focused approach.
This chapter also reviews how to assess the operational health of the IT infrastructure by checking systems for problems that
do not immediately impact the vital signs of the infrastructure. ITIL refers to this as a self-healing system, where resilience is
designed and built into the system, for example. Such a system can let you shift processing from one physical device to the
next without disrupting a service, such as through virtualization or cloud computing. Another example of self-healing involves
using built-in monitoring to detect events and determine which ones are normal and which ones can be handled by automated
event management solutions. An additional example includes being able to raise an alert or generate a ticket.
As users become more mobile and use different devices based on their locations, they too can participate in ensuring
operational health. Often users will detect problems with resources that may not be monitored traditionally. For example: A
mobile user tries to print a document at a printer. The printer is not working, but the user knows of another printer, so he or
she does not typically bother to create an incident ticket. However, if a suitable app were provided for mobile devices, users
could very easily report the printer failure immediately to IT.
Document to Increase Efficiency
It’s useful for the IT operations team to participate in defining runbooks and maintenance manuals for any process the team is
responsible for, such as change management or incident and problem management. Operations also should help define the
relationships between process areas by working with the process owners. By documenting the steps they need to perform
during frequently occurring situations, the operations staff can save a lot of wasted effort. In addition to maintenance and
planning documents, the operations staff should create documents that address business continuity, IT service continuity, and
availability management.
Any time the service portfolio or the catalog is modified, the operations staff should be involved in the review process. BSM
technology can help organizations with automation, processes, and procedural alignment for operational decisions.
This chapter describes how communications are changing with the increasing popularity of technology, such as social media,
chats, texting, document sharing, and the growth of virtual meetings.
The Importance of the SKMS and CMS
The ITIL Service Design publication refers to a configuration management system (CMS), which includes tools as well as
databases, such as a configuration management database (CMDB). The CMS is used to manage configuration data and
information pertaining to incidents, errors, changes, releases, and other data, and is used in all IT service management
processes. The CMS and CMDB are federated to help create a single source of accurate information about your infrastructure.
The service knowledge management system (SKMS) consists of the CMDB and the CMS. The SKMS helps stakeholders with
decision support by using the underlying CMS and CMDB. Data is translated to information, and information to knowledge,
using technology and analytics for decision making.
One way to capture and automate knowledge is through the Business Service Management (BSM) platform. BSM workflows,
analytics, reporting, and integrated solutions enable you to streamline IT operations by collecting data and translating it to
business metrics; by automating routine, labor-intensive, error-prone tasks; and by leveraging and creating collaboration
among systems, applications, and tools across silos. These actions provide common data, information, and knowledge to
enable organizational decision support. BSM helps organizations achieve their vision of having an SKMS.
Effective service operation enables you to manage IT from a business perspective, also called a BSM approach. BSM also
provides a platform for innovation in IT and supports several contemporary initiatives, including cloud computing, green IT,
virtualization, mobility, and end-user IT consumerization. As you manage IT from a business perspective, keep in mind that the
business view of IT is in terms of the value the service provides, not in terms of the individual infrastructure components.
This chapter provides a detailed discussion of the following service operation processes:
Event Management
Problem Management
Incident Management
Access Management
Request Fulfillment
Service Operation delivers four main functions:
The Service Desk is the central point of contact for all of the clients and users of IT services. People call the service desk if an outage occurs, if they have a request for something new (such as a printer installation), or if a change is needed.
The service desk coordinates and routes the request.
Technical Management knows what resources the IT organization has at its disposal. Getting the right people involved in the design, testing, and improvement of IT services makes the job of IT staff easier in the service operation stage.
Operations Management includes scheduling all activities and managing all resources. Operations is responsible for routine IT tasks, such as system administration and preventive and monitoring activities.
Application Management maintains applications and provides technical support, as well as subject matter expertise, throughout the application lifecycle.
The Service Operation staff supports other stages of the lifecycle. They will work in service design by designing solutions, in
service transition by building and testing solutions, and in the continual service improvement of services. The chief
information officer will work in service strategy, influencing the overall strategy of the service based on operational
capabilities and resources.
Event Management
ITIL provides the following definitions for the terms event and event management:
Event: “A change of state that has significance for the management of an IT service or other configuration item. The term is
also used to mean an alert or notification created by any IT service, configuration item, or monitoring tool. Events typically
require IT operations personnel to take actions and often lead to incidents being logged.” 6
Event Management: “The process responsible for managing events throughout their lifecycle. Event management is one of the
main activities of IT operations.” 7
Chapter 4 in the ITIL Service Operation publication provides an overview of how events are communicated, detected, and
filtered. Filtering is used to determine whether to communicate an event or ignore it based on the significance of the event.
This chapter discusses how to identify events that signify regular operation, events that signify an exception, and events that
signify something unusual but not an exception.
A mature event management process that includes documentation for the operators on how to handle each type of event
prepares the way for service automation and more effective data management of events for decisions. Service automation
enables IT organizations to eliminate manual or repetitive, well-defined processes, tasks, or activities. When you have the
tools to automatically complete the reactive handling of instructions to deal with certain types of events, you can free up time
for the operators to be more proactive and work on other challenges.
The difference between monitoring and event management is that event management focuses on generating and detecting
notifications but is not as broad as monitoring. While event management works with occurrences, monitoring tracks them
and looks for conditions that do not generate events.
6 ibid. See event.
7 ibid. See event management.
Business Value
The event management process, as shown in Figure 2, monitors operations and detects exceptions that could lead to a failure,
an impairment of services, or a problem for other processes, such as security management or compliance. The detection of
such exceptions helps IT operations ensure the availability of IT systems and services.
FIG 2: The Event Management Process
Interaction with Other Processes
Event Management is an integral part of other processes and the service management lifecycle. In particular, event
information can assist your availability management and capacity management efforts. Event management can tell you when
your disk drives are getting full before they are full, for example, providing time to avert an outage of service before impacting
the customers.
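The disk-space example can be sketched as a simple threshold check. This is a minimal illustration in Python, not a method prescribed by ITIL; the 80 and 95 percent thresholds and the function names classify_disk_event and should_forward are assumptions chosen for the example.

```python
# Hypothetical thresholds; tune them to your own capacity plans.
WARNING_PCT = 80.0
EXCEPTION_PCT = 95.0


def classify_disk_event(used_bytes: int, total_bytes: int) -> str:
    """Return 'informational', 'warning', or 'exception' for a disk reading."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct >= EXCEPTION_PCT:
        return "exception"
    if used_pct >= WARNING_PCT:
        return "warning"
    return "informational"


def should_forward(event_type: str) -> bool:
    """Filter step: pass only warnings and exceptions to other processes."""
    return event_type in {"warning", "exception"}
```

With these assumed thresholds, a disk at 85 percent produces a warning that capacity management can act on before the disk fills, while routine readings are kept out of the other process areas.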
Event Management should not send all events to incident management or any other process. Instead, event management
should filter events and send only relevant data to the other process areas. The other process areas should not need to do the job
of event management.
Key Factors to Keep in Mind
As you implement or fine-tune event management in your organization, pay attention to the following:
Correlation capabilities: Your event management system should include event correlation capabilities, or the ability to identify
the relationships among events. For example, sometimes a failure of one device will cause, or appear to cause, other failures
throughout the infrastructure, creating what is called a notification storm. A classic example would be a switch that fails, and
as a result, everything connected to that switch will fail and will not be monitored. IT would then receive a high number of
notifications, because each failure is viewed as a discrete event. Instead of creating a thousand separate incident tickets in
your system, the correlation engine creates only one ticket and notes which systems are affected. This correlation can occur
only when you have configuration management in place to understand the connectivity relationships up and down the service
Event Reviews: Operations managers should perform event reviews to make sure that only the events that should be created
are created. The event review process also includes evaluating the completeness and accuracy of the handling instructions for
each type of event and requesting updates when necessary.
Outage Review: The operations manager should also perform an outage review periodically. During the outage review, the
operations manager verifies that the outages in the infrastructure that are monitored by network and system management
tools have actually led to the creation of events. If that is not the case, the operations manager should submit a request for
change to rectify the situation. This is also part of continual service improvement.
Metrics: Be sure to review the types of metrics that are discussed in Chapter 4 of the ITIL Service Operation publication to
measure the efficiency of the event management process. For example, this measurement includes determining the number
of events by category and significance. ITIL also recommends measuring the number and percentage of events that required
human intervention, that resulted in incidents or changes, that were caused by existing problems or known errors, or that were
repeated or duplicated.
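The correlation idea described above, where a failed switch produces one ticket rather than one per affected system, can be sketched as follows. This is a simplified illustration, not how any particular correlation engine works. It assumes the configuration data is available as a hypothetical mapping from each device to its upstream device, and the function names are invented for the example.

```python
from collections import defaultdict


def find_root_cause(device, upstream, failed):
    """Follow upstream links while the parent has also failed.

    Returns the highest device in an unbroken chain of failures.
    """
    root = device
    seen = {device}
    parent = upstream.get(root)
    while parent is not None and parent in failed and parent not in seen:
        root = parent
        seen.add(root)
        parent = upstream.get(root)
    return root


def correlate(failed_devices, upstream):
    """Group failed devices into one ticket per probable root cause."""
    failed = set(failed_devices)
    tickets = defaultdict(list)
    for device in sorted(failed):
        tickets[find_root_cause(device, upstream, failed)].append(device)
    return dict(tickets)


# Hypothetical topology: two servers sit behind switch-1.
upstream = {"srv-a": "switch-1", "srv-b": "switch-1", "switch-1": "router-1"}
tickets = correlate(["srv-a", "srv-b", "switch-1"], upstream)
# One ticket keyed by "switch-1", listing all three affected devices.
```

The sketch only works because the upstream relationships are known, which is the point made above about needing configuration management in place before correlation is possible.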
Mature IT organizations should go beyond typical event management. For example, a large utility company needed to improve
the availability of its applications and supporting infrastructure while providing better access to services for its customers.
The company initially addressed the challenge by using event management and system performance monitoring of
components. However, the IT organization realized it wasn’t feasible to monitor all the components that enable every
Web-based application and its related services. For example, a firewall rule typically wasn’t considered a monitored item, but it was
still an essential component for online service. A change to a firewall rule could have an impact on the service. As a result, the
company decided to automate the process as well. The end-user transaction monitoring tools were used to replicate user
activities. This approach enables IT to always know where services are available. When the services become unavailable, IT
operations is notified by the transaction monitoring tools, and the company can intervene quickly and solve the problem.
This example shows that IT should no longer look at infrastructure outages alone, but should also look at the impact on the
services that the customers and users access.
Incident Management
ITIL provides the following definitions for the terms incident and incident management:
Incident: “An unplanned interruption to an IT service or reduction in the quality of an IT service. Failure of a configuration item
that has not yet affected service is also an incident — for example, failure of one disk from a mirror set.” 8
Incident Management: “The process responsible for managing the lifecycle of all incidents. Incident management ensures that
normal service operation is restored as quickly as possible and the business impact is minimized.” 9
FIG 3: Incident Management Process Flow
8 ibid. See incident.
9 ibid. See incident management.
Business Value
The process flow for incident management is shown in Figure 3. The goal of incident management is to reduce the amount of
time that systems are impaired or that the business is exposed to any disruption if there is an outage. The business cannot
derive value from a service that is not functioning. With the help of service level management, incident management also
focuses IT resources on business priorities. If an outage occurs in a service that is a business priority, then you know to move IT
resources from what they’re doing to fix the outage that has the greatest impact on the business.
Interaction with Other Processes
The Incident Management process interfaces as follows with these processes:
Problem Management: Incidents could be symptoms of a problem. For operations to understand the context of a problem and
who is affected, incident management must share data with problem management
Configuration Management: Configuration management provides the logical model of connections and dependencies. Incident
management depends heavily on the ability to go through a configuration management system to determine where to start
Change Management: After an incident has been diagnosed, the corrective action to restore that service could require a
change to the infrastructure
Capacity Management: When necessary, capacity management provides workarounds for the incidents
Availability Management: Incident management information is used to identify how the incident lifecycle can be improved and
to determine availability of the IT services
Service Level Management (SLM): Incident management helps SLM identify measurable responses to disruptions in service
and provides reports for reviewing SLAs
Key Factors to Keep in Mind
As you implement or fine-tune incident management in your organization, pay attention to the following:
Early Detection: Your users should not be your incident detection mechanism, but in many organizations, end users detect
more than 70 percent of issues. Design monitoring or event management systems to detect these issues from an end user’s perspective.
Knowledgebase: When you have an accurate source of information about any known error or problem, as well as a
workaround, then you can start resolving issues on the first call, rather than three or four calls later
Integration of Incident Management with Configuration Management: You need to understand how components and services are connected
and their dependencies, as well as the history of infrastructure components, so you can see the records of all the past failures
of that component. IT should work on incidents and problems in the order of their likely cause and impact. Start by ruling out
recent changes.
Service Desk: Hire service desk agents who have good customer service skills, since they will be communicating with
customers reporting incidents
Be sure to identify timescales for incident handling stages based on responding to and resolving incidents within the SLAs. This
chapter of the ITIL Service Operation publication also reviews what should be included in the incident models, such as steps
for handling the tickets, responsibilities, escalation procedures, thresholds, and timescales.
This chapter covers how to identify an incident, log it, categorize it, and prioritize it. It provides an example of a simple priority
coding system based on low, medium, and high priorities, with a target resolution time for each priority code. It also includes a
detailed process for determining an escalation procedure and investigating, diagnosing, resolving, and recovering incidents.
After an incident is closed, it’s important to conduct a user satisfaction survey on a percentage of incidents to help provide
insight into how satisfied customers are with their service.
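A priority coding system with a target resolution time per code can be sketched in a few lines. The target times below are placeholders invented for this example, not values from ITIL; real targets come from your SLAs. The function names are likewise hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical targets per priority code; take real values from your SLAs.
TARGET_RESOLUTION = {
    "high": timedelta(hours=4),
    "medium": timedelta(hours=24),
    "low": timedelta(hours=72),
}


def resolution_deadline(priority: str, logged_at: datetime) -> datetime:
    """Return the time by which the incident should be resolved."""
    return logged_at + TARGET_RESOLUTION[priority]


def is_breached(priority: str, logged_at: datetime, now: datetime) -> bool:
    """True if the target resolution time has already passed."""
    return now > resolution_deadline(priority, logged_at)
```

A check like is_breached could feed an escalation procedure, so that incidents approaching or past their target are raised to the next support level.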
Request Fulfillment
ITIL provides the following definition for the term request fulfillment:
Request Fulfillment: “The process responsible for managing the lifecycle of all service requests.” 10
Some organizations may manage request fulfillment as a particular type of incident. Although incidents refer to unplanned events,
keep in mind that a service request is something that can and should be planned.
Many organizations implement self-service capabilities for frequently requested IT services or provisioning. By looking at all of
your service desk tickets, you can determine what people request most often, and then you can build processes for the most
common needs. The request fulfillment process helps keep all of those requests out of your incident log. The request
fulfillment process also can help with your cloud computing initiatives, which will rely on the ability of IT to be agile and
respond to user requests.
Business Value
Request fulfillment is valuable to the business because it provides the opportunity to quickly and effectively access standard services.
Interaction with Other Processes
The request fulfillment process interfaces with the following other processes:
Service Desk / Incident Management: Most requests might be submitted through the service desk. Other requests might at
first be managed through the incident management process.
Release, Asset and Configuration Management: When a request is for a component that can be deployed automatically, the
component must be designed, built, and tested in the release process. After deployment, the component should be added to
the CMS.
10 ibid. See request fulfillment.
Key Factors to Keep in Mind
Most requests should be based upon your catalog of requestable services. This catalog can be a subset of your current service
catalog. If you do not have a service catalog, the requestable catalog can stand on its own to help you manage service
requests. Your requestable catalog might not include all the requests for services you receive from your users.
This scenario presents a business opportunity or an opportunity to improve the requestable catalog of services.
To measure your request fulfillment process, you first need a general idea of how many requests for service you receive. This
way, you can judge whether you’re receiving more or fewer requests and whether you need to allocate more resources. You’ll
want to know where all of the service requests are in their lifecycle. How many of them are logged, how many are being
worked, and how many are closed? It’s also useful to know the depth of your backlog for different kinds of requests. In
addition, request fulfillment can help change management become more efficient in responding to low-impact changes to the
IT environment.
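The lifecycle and backlog questions above can be answered with simple counts over the request records. The sketch below assumes each request is a dictionary with hypothetical "type" and "state" fields, and that "logged" and "in_progress" are the open states. Your request fulfillment tool will have its own field names and states.

```python
from collections import Counter

# Assumed state names; adjust to match your tool's workflow.
OPEN_STATES = {"logged", "in_progress"}


def requests_by_state(requests):
    """Count how many requests are in each lifecycle state."""
    return Counter(r["state"] for r in requests)


def backlog_by_type(requests):
    """Count open (not yet closed) requests for each kind of request."""
    return Counter(r["type"] for r in requests if r["state"] in OPEN_STATES)


# Illustrative data only.
sample = [
    {"type": "new_laptop", "state": "logged"},
    {"type": "new_laptop", "state": "in_progress"},
    {"type": "app_access", "state": "closed"},
    {"type": "app_access", "state": "logged"},
]
```

Tracking these counts week over week shows whether request volume is rising or falling and where the backlog is deepest.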
Request fulfillment needs to be synchronized to change management for low-impact, routine changes and to serve as a tool
for normal changes. This can help eliminate constraints in a change management process that requires all changes to go
through all activities within change management. Change management, after determining what changes are of a low-impact
and routine nature, can then review the changes to make sure that the impact continues to be low. For normal changes, the
request fulfillment system can help standardize the creation of requests for change.
Consider these costs. How much does it cost your enterprise when an employee calls to request a new computer, a telephone,
an Internet connection, or access to an application? One service desk team estimated that it receives 1,800 calls a week from
people submitting or checking on the status of these requests. The cost was about $325,000 a year, and that’s just for one-off
requests. What does it also cost when a new employee comes on board and dozens of requests must be submitted for office
space, furniture, equipment, supplies, training, ID badges, and access to IT services? What about when an employee transfers,
gets promoted, or leaves the company? All these requests are in addition to the usual calls for assistance with hardware or
software issues, password resets, and additional hardware or software.
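To put the service desk figures above in per-call terms, a rough calculation helps. The sketch assumes the call volume holds steady at 1,800 calls for all 52 weeks of the year, which the source does not state, so treat the result as an approximation.

```python
WEEKS_PER_YEAR = 52  # assumption: call volume is constant all year


def cost_per_call(annual_cost, calls_per_week, weeks_per_year=WEEKS_PER_YEAR):
    """Approximate cost of handling a single request call."""
    return annual_cost / (calls_per_week * weeks_per_year)


# Figures quoted in the text: about $325,000 a year and 1,800 calls a week.
per_call = round(cost_per_call(325_000, 1_800), 2)  # 3.47
```

Under that assumption, each call costs roughly $3.47, and the figure covers only one-off requests.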
IT organizations spend time and money performing manual interactions, obtaining approvals, coordinating tasks across
various groups, shuffling paperwork, and dealing with process inconsistencies among the different types of requests.
Throwing people at the problem doesn’t help; it only drives costs up further and adds the potential for miscommunication and
Fortunately, there is a solution to this problem. It lies in streamlining and standardizing work request processes through a
catalog of requestable services and implementing service request management (SRM) software that automates those
processes and lets end users submit and check on requests without calling the service desk. If you have IT service
management applications to address change, release, configuration, incident, problem, and asset management, then an SRM
solution can act as a front end. It would pass requests to the appropriate application and associated workflows through
multiple applications. For example, if a user requests an update for a desktop computer, the SRM solution should pass the
request to the change management application and monitor the progress through the change process, as well as through the
process of updating asset information wherever it resides.
If you have a CMDB, then integrate your SRM solution with it to leverage CMDB data and thus create a CMS. You should also
integrate your SRM software with SLA applications to ensure that service fulfillment occurs within agreed-upon time frames.
The ability to roll up information pertaining to service fulfillments and integrate this capability into costing systems enables
you to calculate total service fulfillment costs and determine if the price charged for a service covers the actual cost. It’s
important that the requestor be able to select standard, approved services from the service catalog.
Problem Management
ITIL provides the following definitions for the terms problem and Problem Management:
Problem: “A cause of one or more incidents. The cause is not usually known at the time a problem record is created, and the
problem management process is responsible for further investigation.” 11
Problem Management: “The process responsible for managing the lifecycle of all problems. Problem Management proactively
prevents incidents from happening and minimizes the impact of incidents that cannot be prevented.” 12
Figure 4 illustrates the Problem Management Process.
FIG 4: Problem Management Process Flow
11 ibid. See problem.
12 ibid. See problem management.
Business Value
The value of problem management to the business is to ensure service availability and quality. By applying many of the ITIL
principles of effective problem management, your organization can increase availability and reduce the time spent on
The availability and reliability of services should increase as a result of effective problem management, so that the applications
and services provide continuous value to the business instead of interrupted value. Even minor incidents may have a big
impact on the business if they occur frequently, taking up a significant amount of resources to resolve each occurrence. For
example, the utility company that has found a balance between being proactive and reactive is best poised to address
problems as they develop. The power that customers need to do business is uninterrupted, because the utility is not bogged
down in details and is not overwhelmed by problems and outages.
Understanding who is affected, how they are affected, and what the costs are can help you understand the overall impact of
an outage on the business. You may need business input to determine the true cost and constraints to the business.
Interaction with Other Processes
The problem management process interfaces with the following other processes:
The Financial Management process that is part of the service strategy stage of the service lifecycle
The Availability Management, Capacity Management, and IT Service Continuity Processes that are part of the service design stage of the service lifecycle
The Change Management, Configuration Management, and Release and Deployment Management processes that are part of the service transition stage of the service lifecycle
The Service Level Management process that is part of the Continual Service Improvement stage of the service lifecycle
Key Factors to Keep in Mind
As you implement or fine-tune problem management in your organization, keep in mind the following:
Integrated Tools: Efficient problem and incident management requires tools that give you basic linkages, not only between
incidents and problems, but also between your configuration and change management processes.
Integrated Team: Train your staff and put in place an effective process by which your first, second, and third lines of support
see each other as part of a larger team rather than as separate factions.
This chapter of the ITIL Service Operation publication outlines how to log, categorize, and prioritize problems. When
prioritizing, consider the severity of the problem, based on how long it will take to fix a problem, what skills are involved in
fixing it, the extent of the problem, and so on. This chapter also describes problem investigation and diagnostic approaches,
such as a chronological analysis, pain/value analysis, brainstorming, and other approaches.
Problem management encompasses the processes and activities required to determine the root cause of incidents and to
come up with workarounds and proposals to fix those problems in a structured manner. Remember that multiple incidents
may be attributed to one problem. Problem management is both a reactive and a proactive process. In addition to determining
the root cause of the incidents, you should determine how you can detect the cause in the future by understanding
where else these conditions exist, so that you can fix the conditions before an issue occurs.
Problem management plays an important role in supporting incident management and change management. Problem
management reacts to user incidents and is proactive by solving problems before users experience any incidents. The data
that is collected in the known error database should support incident management. Problem management also supports the
service transition stage of the lifecycle by submitting requests for changes to change management. This is done after
determining the root cause related to failures or probable failures within the IT infrastructure for services that the business
has promised to deliver to its customers. A successful problem management initiative will help other processes by taking a
more holistic view of probable errors in the IT infrastructure for the delivery of services.
You’ll need to manage the information associated with problem management. The CMS contains all of the information about
the relationships between different infrastructure components. It is the authoritative resource for problem managers when
they are determining the root causes of incidents: running down hypotheses of the causal factors, understanding the
relationships between various incidents, and trying to correlate whether the incidents are related to a single problem.
This chapter reviews metrics to consider when evaluating problem management in your organization. Look at the total
number of problems caused by failed changes. Also look at the total number of problems caused by unauthorized changes and
the total amount of unplanned work generated by problems.
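As a rough illustration of the metrics above, the following Python sketch tallies them from a handful of problem records. The record fields (cause, unplanned_hours) and the sample values are hypothetical; they are not taken from the ITIL publication or from any specific tool.

```python
from dataclasses import dataclass


# Hypothetical problem record; the field names are illustrative assumptions.
@dataclass
class ProblemRecord:
    problem_id: str
    cause: str              # e.g. "failed_change", "unauthorized_change", "other"
    unplanned_hours: float  # unplanned work attributed to this problem


def problem_metrics(records):
    """Return the three totals discussed in the text."""
    return {
        "failed_change_problems": sum(r.cause == "failed_change" for r in records),
        "unauthorized_change_problems": sum(
            r.cause == "unauthorized_change" for r in records
        ),
        "unplanned_hours": sum(r.unplanned_hours for r in records),
    }


# Made-up sample data, for illustration only.
sample = [
    ProblemRecord("P1", "failed_change", 6.0),
    ProblemRecord("P2", "unauthorized_change", 3.5),
    ProblemRecord("P3", "failed_change", 2.0),
    ProblemRecord("P4", "other", 1.5),
]
metrics = problem_metrics(sample)
print(metrics)
```

In practice these figures would come from your service management tool's reporting rather than from hand-built records; the sketch only shows how the three totals relate to the underlying problem data.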
How important is application problem management? According to one industry study, software defects cost the U.S.
economy about $60 billion annually, primarily because of the impact on business availability and performance. This figure
includes the labor cost to find and fix those problems. The study also reports that 80 percent of the time spent managing an
application lifecycle is spent fixing defects. To reduce these costs, strive for proactive and timely problem detection, coupled
with automated problem isolation. This approach calls for the right combination of software and processes.
Access Management
ITIL provides the following definition:
Access Management: “The process responsible for allowing users to make use of IT services, data, or other assets. Access
management helps to protect the confidentiality, integrity, and availability of assets by ensuring that only authorized users are
able to access or modify them. Access management implements the policies of information security management and is
sometimes referred to as rights management or identity management.” 13
Business Value
Access management delivers business value by controlling access to services, making sure that employees have the right level
of access to do their jobs, providing the ability to audit services, and making it easy to revoke rights when necessary. This can
be particularly important when dealing with regulatory compliance issues.
ibid. See access management.
A Successful Access Management Program
Access management is a preventive control; it’s like the utility company setting up systems so that customers don’t
lose power, even when problems are occurring at the source. It’s valuable because if you can’t control which services
your employees can access, it’s very difficult to attest that you have a system of controls around key financial systems
and reporting systems. Access management helps enforce separation of duties. Inappropriate access rights might keep
employees from having access to services they need to actually do their jobs, resulting in lost productivity. Too much access
may result in compliance issues and/or unauthorized changes.
Interaction with Other Processes
The Access Management process interfaces with the following other processes:
Human Resources: Human resources can help confirm users’ identities and whether they have permission to access the requested services
Information Security Management: The security and data protection policies and tools needed for access management will come from information security management
Change Management: A request for access to a service is a change and should be routed through a change
management process
Service Level Management: The agreements for access to services are maintained by the service level management process.
The agreements include information such as the criteria for access, the cost of access, and the level of access granted
to users
Configuration Management: Access details can be stored in and retrieved from the CMS
Key Factors to Keep in Mind
As you implement or fine-tune your ability to verify identity as part of access management, keep in mind the following:
Repeatable Process: You need a consistent, repeatable process to verify that users have the right to get the access they’re
requesting. The more formal and structured the process, the fewer phone calls and activities required to resolve
the request
Access management should be built into HR and finance processes. New hires, terminations, and promotions are all triggers
for access management. Job and role descriptions should include information about application access. Then, when new
employees begin work, for example, their role descriptions tell IT the application access rights that employees should be
granted. Information security is responsible for developing policies with HR, so that access is monitored and the policies are
protected. Finance processes should be included to ensure that approvals based on access and costs are covered.
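One way to picture role descriptions driving access is the small Python sketch below. The role names, application names, and the access_changes helper are all hypothetical; the sketch assumes that each role description simply lists the applications the role needs.

```python
# Hypothetical role descriptions: each role lists the applications it needs.
ROLE_ACCESS = {
    "accounts_payable_clerk": {"erp_invoices", "email"},
    "finance_manager": {"erp_invoices", "erp_approvals", "email"},
}


def access_changes(old_role, new_role):
    """Return (grant, revoke) sets for an HR event.

    old_role is None for a new hire; new_role is None for a termination.
    """
    old = ROLE_ACCESS.get(old_role, set()) if old_role else set()
    new = ROLE_ACCESS.get(new_role, set()) if new_role else set()
    return new - old, old - new


# The three HR triggers named in the text.
hire = access_changes(None, "accounts_payable_clerk")
promotion = access_changes("accounts_payable_clerk", "finance_manager")
termination = access_changes("finance_manager", None)
print(hire, promotion, termination)
```

Deriving both the rights to grant and the rights to revoke from the same role table keeps terminations and role changes from leaving stale access behind.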
All access changes should be tied to a request for change. Some requests might be preapproved, and some might require
management approval and routing. One approach is to use a service request for new access and change requests for
modifications to existing access, because changing somebody’s rights can affect services in systems.
Several metrics are important for effective access management. Look at a broad number of access requests. How many are
new, and how many are modifications or requests for change? How much access is being granted and for what purpose? Look
at the roles of the people requesting access. How many of those people are super users or have administrative access? Those
are usually low-volume requests, so a sharp increase should trigger an audit to understand the increase. Knowing the number
of password resets is also useful. Requests for password resets are generally the most common call to the service desk, so
you can achieve significant savings by automating this process.
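The questions above can be turned into simple counts. The following Python sketch is a minimal example under stated assumptions: the request fields ("type", "privileged") are hypothetical, and the rule that a doubling of privileged requests triggers an audit is an arbitrary illustrative threshold, not an ITIL recommendation.

```python
from collections import Counter


def summarize_requests(requests):
    """Count access requests by type and count the privileged ones.

    Each request is a dict with the hypothetical keys "type" and "privileged".
    """
    by_type = Counter(r["type"] for r in requests)
    privileged = sum(1 for r in requests if r["privileged"])
    return by_type, privileged


def needs_audit(privileged_now, privileged_baseline, factor=2.0):
    """Flag a sharp increase; the default factor is an illustrative assumption."""
    return privileged_now > factor * privileged_baseline


# Made-up sample data, for illustration only.
requests = [
    {"type": "new", "privileged": False},
    {"type": "modification", "privileged": True},
    {"type": "new", "privileged": True},
    {"type": "password_reset", "privileged": False},
    {"type": "new", "privileged": True},
]
by_type, privileged = summarize_requests(requests)
audit = needs_audit(privileged, privileged_baseline=1)
print(by_type, privileged, audit)
```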
Identity management systems manage and maintain a wealth of user identity information of considerable value. Implementing
identity management solutions with a BSM approach greatly increases the capabilities and value of BSM solutions in that
environment, as well as those of the identity management solution. A BSM approach, as discussed earlier in this book, makes it
possible to manage IT based on business priorities. BSM focuses on automating the management of technology and IT process
workflow to deliver greater business value.
To enable IT to realize the most value from identity data, the identity management solution should be integrated with the
other solutions in the environment. This integration should be provided out of the box to facilitate fast implementation and
achieve the benefits more rapidly. The CMDB provides the foundation for integration between identity management and
other BSM solutions.
The CMDB is a fundamental component of the ITIL framework. It serves as a single source of reference, facilitating integration
and synchronization among ITIL management processes, all of which contribute and consume CMDB information. An
important requirement is that the CMDB be capable of automatically discovering core identity information. This includes the
discovery of all users who have access to systems and applications. It also includes the association of users with their business
profiles, such as their departments, business functions, and contact information. IT teams can leverage this data to make
more informed business decisions and prioritize actions.
The Relationship Between Service Operation Activities and Other Processes and Lifecycle Stages
The ITIL Service Operation publication discusses the operational activities of processes that are covered in other lifecycle
stages, such as change management, configuration management, release and deployment management, capacity
management, availability management, knowledge management, financial management for IT services, and IT service
continuity management. It also discusses how the processes work across the lifecycle stages. The key point is that the
processes span the lifecycle stages and that service operation does not operate in a vacuum.
Pay special attention to the activities described in the capacity management section of Chapter 5 in the ITIL Service Operation
publication. It identifies components and elements that must be monitored, such as CPU utilization, input/output (I/O) rates,
queue length, file storage utilization, database, applications, and the type of monitoring required. This section also offers
guidelines for handling capacity- or performance-related incidents, managing workloads, storing data, modeling, and capacity
planning. IT operations staff should participate in IT budgeting and accounting and be involved in regular reviews of
expenditures against budgets. The staff should participate in risk assessment, execution of risk management, testing,
developing recovery plans, service desk communication during disasters, and a variety of other areas that impact IT service
Service Operation is crucial to realizing value related to the other processes in the service strategy, service design, service
transition, and continual service improvement stages. Within service operation, the user perspective can be obtained through
metrics gathered at the service desk and through incident management, request fulfillment management, and problem
management to help with the continual service improvement process activities. Service operation can provide feedback
across the lifecycle to help with service level improvement or service adjustments.
Service operation can help the business understand opportunities for growth or service differentiation within the market
spaces that are being addressed, based on the user-facing metrics. Service operation works very closely with service
transition for all services before and after the services are operational. Without service operation, IT cannot recover the cost
of providing and supporting the service.
This chapter discusses operational activities to help ensure that technology is aligned with service and process objectives. As
IT organizations become more mature, they will be more inclined to move their focus from managing their systems or servers
to working closely with other technology groups to achieve greater business value. Figure 5 shows the steps involved in
maturing from a technology organization to one that uses technology to achieve strategic business objectives.
The IT operations organization performs several critical activities. One such activity is monitoring and controlling the
infrastructure. Service management tools are useful in recording and monitoring the configuration items (CIs) and
operational activities (such as work scheduling, workload management, and workflow) to make sure that everything that’s
expected to happen is happening, both from a performance standpoint and an execution standpoint. If any exceptions are
found, the group can react accordingly.
FIG 5: Achieving Maturity in Technology Management
Using a test environment for monitoring can help you determine whether your proposed change will deliver the anticipated
results, especially when the incident or problem is defined around a monitoring or threshold problem. If you’re trying to fix a
performance problem, use a test environment where you can re-create the situation and see if it improves. For example,
“black box” technology can collect data and play it back so that you can understand where a point of failure occurred and fix
the problem.
You can use reporting for service operation audits to ensure that work packages (subsets of a project that can be assigned
to a specific party for execution) are performed every day and within the allotted time, for example, and that critical
processes are performing within their ideal range. You can also audit access to applications, systems, and data, which is
important for compliance with regulatory requirements.
Operational monitoring gives the continual service improvement stage the granular data it needs to quantitatively determine
the quality of the services that are being delivered. Reports can show you that certain services are underperforming and
affecting the quality of the overall service output.
In addition to monitoring and control, other critical activities that must occur inside an IT operations organization
include the following:
Operations Bridge: Also called the IT ops console or the network operation center, the operations bridge brings together the
management of events, incidents, and problem information with the management of day-to-day IT operations tasks. In this
centralized area, systems events are correlated, and then a person or an automated process decides what to do with the
Enterprise Workload Management / Job Scheduling: The operations staff is responsible for batch jobs or for particular
configurations for certain business activities.
Backup and Restore: IT operations performs the actual backup or copying of data to protect it. During the service strategy and
service design stages of the service lifecycle, organizations should determine what data will be backed up and where it will be
put physically.
Print and Output Management: It’s important to clearly define who is responsible for maintaining printers and storage
devices, as well as dealing with centralized bulk printing requirements.
Mainframe and Server Management: These teams are responsible for maintaining all hardware and software for the
mainframes and servers.
Network Management: Network Management has the overall responsibility for both the local networks and all of the
connections in between, whether you’re in a campus environment, a city environment or metropolitan area network, or
spread around the world.
Storage and Archive: One group is responsible for the subsystems and the information held on storage and archiving systems.
Also consider integrating these people with your backup team. If they will be responsible for the information and where it is
stored, they’re the best people to make sure that the information is backed up properly.
Database Administration: Database administrators help scale, design, and create databases. This group is responsible for the
underlying databases of major applications, such as financials and enterprise resource planning (ERP).
Directory Services Management: Directory services management is important because it concerns user provisioning and
access management. You should create a directory of all of the resources that people need to provide to access services, as
well as a directory of all the users.
Desktop and Mobile Device Support: Users access services from many devices, such as desktops, laptops, and mobile devices.
The desktop and mobile support team should be responsible for all of the organization’s devices and have policies and
procedures related to a mobile workforce, personal use, and so on.
Middleware Management: Middleware systems take the form of enterprise application integration and transfer data back and
forth between applications that are extremely important to the business.
Internet and Web Management: Internet and Web management activities include building the delivery architecture for your
corporate Website and Internet services, and building standards and guidelines for how your company will develop, design,
and test Websites.
Facilities and Data Center Management: This group manages the physical infrastructure that provides basic services to IT.
Operational Activities of processes covered in other lifecycle stages: This section outlines how the service operation staff is
involved with change management, service asset and configuration management, release and deployment management,
capacity management, demand management, availability management, knowledge management, IT service continuity
management, information security management, and service level management.
Improvement of Operational Activities: The service operation staff should focus on process improvements to improve service
delivery, which includes some of the following activities: automating manual tasks, reviewing makeshift activities or
procedures, performing operation audits, using incident and problem management, communicating, and engaging in
education and training.
It is the responsibility of all IT operations staff to look for areas in which to implement process improvements. This involves
automation of manual tasks, especially those that must be regularly repeated and can be time consuming and error prone. For
example, if email performance degrades below a predetermined level, the solution should respond with predefined steps to
identify the source of the problem. If it is a known or common condition, the problem should be resolved automatically,
restoring performance to acceptable thresholds. If it can’t be resolved, an alert should go to operations management to
provide them with all the analysis and attempted resolution steps. With this technology, operations will experience less chaos
and reactive behavior and increase customer satisfaction.
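The email example above follows a detect, diagnose, fix-or-alert flow. The Python sketch below illustrates that flow under stated assumptions: the threshold values, the condition names, and the restart_mail_queue step are hypothetical stand-ins, not features of any particular product.

```python
def restart_mail_queue():
    """Stand-in for a real automated remediation step (hypothetical)."""
    return True


# Hypothetical runbook: known conditions mapped to automated fixes.
KNOWN_FIXES = {"mail_queue_backlog": restart_mail_queue}


def handle_degradation(response_ms, threshold_ms, diagnose):
    """Follow the flow in the text: detect, diagnose, auto-fix, or alert."""
    steps = []
    if response_ms <= threshold_ms:
        return {"action": "none", "steps": steps}
    condition = diagnose()
    steps.append(f"diagnosed: {condition}")
    fix = KNOWN_FIXES.get(condition)
    if fix is not None:
        steps.append(f"attempted: {fix.__name__}")
        if fix():
            return {"action": "resolved", "steps": steps}
    # Unknown condition or failed fix: alert operations with the full trail.
    return {"action": "alert_operations", "steps": steps}


known = handle_degradation(900, 500, lambda: "mail_queue_backlog")
unknown = handle_degradation(900, 500, lambda: "disk_latency")
healthy = handle_degradation(200, 500, lambda: "unused")
print(known, unknown, healthy)
```

Passing the recorded steps along with the alert reflects the point made above: when automation cannot resolve the condition, operations management should receive the analysis and the attempted resolution steps, not just a bare alarm.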
Defining the scope of what IT operations monitors should be a function of identifying what’s most critical to the business and
the desired outcomes. Then you should determine how to effectively build controls around those outcomes so that you can
be alerted if something is not in the performance range that you want. It’s important to include people outside of IT in those
measures so that the thresholds are relevant to the user’s actual experience. Get input from the stakeholders about what’s
really important to them. You should also focus on other critical activities, such as job scheduling, backup and restoration,
database administration, and a variety of other key areas highlighted in this chapter of the book.
This chapter in the ITIL Service Operation publication defines the concept of functions and then discusses different roles in
the IT operations organization. These roles include the following critical functions, which are needed to keep IT running:
Service Desk
Technical Management
IT Operations Management: IT Operations Control, Facilities Management
Application Management
The various organization structures are referenced in Figure 6 and are described in detail in the ITIL Service Operation
publication. There is also a special note on outsourcing, with the following key points to consider:
The company that contracts the external service provider must ensure the service meets standards
Be sure to have service management practices in place before outsourcing
Ensure there is active involvement by both organizations
The external service provider shouldn’t decide the outputs or how to measure them
Ownership of data must stay within the organization that is outsourcing the activity, even though both organizations may require access to certain data.
FIG 6: Organizational Structures
Service Desk
The service desk is important because it is a pivot point for the IT organization, handling incidents and requests and
performing analyses. In addition, it generally is the most visible part of IT to the business. The ITIL Service Operation
publication discusses several strategies for setting up your service desk. For example, the service desk should give the
business one place to call for all issues, and your service desk agents should determine how to route the call. The location of
the service desk may vary, depending on the geography, number of users, extent of services, and other factors. It can be a
local service desk located close to the user community; centralized, where fewer staff deal with a higher volume; or virtual.
However, no matter where your service desk is located, there should be no doubt about how your users can contact the
service desk.
The truly evolved service desk can take IT to the next level by giving users a more personalized service. One particularly
effective step is to provide an app that gives end users a set of information and services tailored to their role and
circumstances. For further impact, this front-end interface to ITSM can be underpinned with process automation, thereby
increasing efficiency in the back-end provisioning of requests. This app should operate on mobile devices and include
self-service and social media capabilities. Your employees should have easy access to the services they need anytime, anywhere,
and from any device.
It’s important to split the requests from the incidents to ensure the service desk can handle all incidents appropriately without
being bogged down by standard, repeatable requests. The service catalog and Web-enabled access allow the users to make
standard, defined, and repeatable requests without disrupting the incident management process.
Refer to the ITIL Service Operation publication for a detailed discussion of service desk staffing, required skill levels, and
training. The following list provides guidance on decisions related to staffing levels:
Customer expectations
Types of response required
Business requirements
Complexity of IT Infrastructure
Support Technologies
Number of Customers and Users who are supported
Staff skill sets
Types of incidents and service requests
Tips on Managing the Service Desk
Like a utility company that ensures reliable delivery of power to the largest users during peak hours, it’s important to work
closely with the service desk team to establish when to put resources on a call and to identify the types of resources to use.
Look at the activities of the business and the business cycles together with the SLAs, and use that information as a starting
point. Also look at reports to see when you are getting the most calls. This will help you focus your efforts in the right areas.
The concept of engaging super users is a very powerful way to reduce the number of incidents at your service desk and to
free up budget resources. Super users are business users who have a firm grasp of the technology they use to do their jobs.
They may be located throughout the user community, and sometimes you can find them already within an organization.
Whenever people are having application issues or issues in areas where the super user has expertise, they first go to the
super user. However, the super user will escalate an issue if he or she doesn’t know what to do.
Be careful in establishing service desk metrics, because you can end up encouraging the wrong behaviors.
Ask these questions:
How many issues and incidents can you resolve, and how much time does it take to resolve them?
How can you prevent issues and incidents from happening?
How can you empower end users to solve problems on their own?
How often can the service desk agent resolve the issue on the first call?
What percentage of calls is routed to second- and third-level support?
What is the average cost of handling an incident? (For planning purposes, this can be determined by figuring out the total cost of the service desk and then dividing it by the number of calls. See the ITIL Service Operation publication
for more detailed information about calculating metrics.)
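The planning-level cost calculation in the last question, along with the first-call and escalation percentages, can be sketched in a few lines of Python. The function name and the monthly figures are hypothetical and used only to show the arithmetic.

```python
def service_desk_metrics(total_cost, calls, resolved_first_call, escalated):
    """Compute the planning-level figures described in the list above."""
    if calls == 0:
        raise ValueError("calls must be greater than zero")
    return {
        "cost_per_call": total_cost / calls,
        "first_call_resolution_pct": 100 * resolved_first_call / calls,
        "escalation_pct": 100 * escalated / calls,
    }


# Hypothetical monthly figures, for illustration only.
m = service_desk_metrics(
    total_cost=50_000, calls=2_000, resolved_first_call=1_300, escalated=500
)
print(m)
```

With these sample figures, the average cost per call is the total cost of 50,000 divided by 2,000 calls.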
It’s one thing to look at service desk metrics. It’s another to find out from a customer’s perspective how the service desk is
doing and whether the service desk met the customer’s needs. Conducting an electronic survey is useful. Also, be sure to
avoid mixing service desk incident management metrics with event management metrics when determining the
effectiveness of a service desk. Event management can open and close incidents; don’t confuse this capability with users
opening and closing incidents. Balanced scorecards are also a good way to obtain an overall view of how the customers
perceive the service provided, as they include both tangible and intangible areas for review. For example, courtesy and
attention to detail are intangible areas that could appear on the scorecard.
Some organizations must decide whether to outsource a portion of the service desk. This chapter in the ITIL Service
Operation publication provides guidance regarding what the outsourced service should be able to access, such as incident
records, problem records, known error data, change schedule, SKMS, CMS, and alerts. It also reviews recommendations
for improving communications with the outsourced desk personnel.
Many companies offshore some of their functions, including portions of their service desk. With the right technology,
processes, and communication, it is possible to make the service seamless. For example, a technician at a support center in
India can remotely fix a problem in real time on a user’s computer in the United States.
Consider a variety of survey tools and techniques for measuring service desk performance. These include after-call
surveys, outbound telephone surveys, online surveys, interviews, and more. Customer satisfaction surveys help to analyze
the “soft” measures, such as how well the users say their calls have been answered.
Technical Management
The objective of technical management is to help plan, implement projects, and maintain stable operations. If an issue occurs,
technical management needs to allocate the right resources to the problem to resolve it as quickly as possible. Refer to the
ITIL Service Operation publication for a detailed discussion of technical management activities and documentation
requirements. Some of the general activities include identifying the expertise required to manage and deliver services,
documenting skills, initiating training programs, recruiting contractors, researching and developing solutions, becoming
involved with the design of new services, and performing tests of services.
What Should Technical Management Measure?
Services don’t provide any value unless they are running, just as a utility company provides no value if it fails to deliver
electricity to customers. In that case, the consumption meter stops. What is the availability of the infrastructure and the
services you provide? Typical metrics relate to measuring the agreed outputs, such as transaction rates and availability for
critical business transactions. They include process metrics, such as response time to events and event completion rates,
problem resolution statistics, expenditure against budget, and so on. They also look at technology performance, such as
utilization rates, and availability of systems, networks, and devices. And it’s important to measure mean time between failures
of specified equipment and to measure maintenance activity. Get metrics on the new areas in which your service desk
personnel have been trained, to ensure that they have received proper training. Also examine statistics for problem resolution.
How many times are calls escalated from the front line of support, and why are they escalated? Looking at the percentage
of failed changes can show you how many times you must retrace your steps and do a rework before you get it right. Finally,
examine the performance of technology components. How well are the infrastructures performing against the design and
the availability requirements?
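Two of the measures mentioned above, availability and mean time between failures, are commonly calculated as shown in the following Python sketch. It assumes that availability is measured against agreed service hours and that uptime is the agreed hours minus downtime; the sample figures are hypothetical, and your organization may define these measures differently.

```python
def availability_pct(agreed_hours, downtime_hours):
    """Availability as a percentage of agreed service time."""
    return 100 * (agreed_hours - downtime_hours) / agreed_hours


def mtbf_hours(agreed_hours, downtime_hours, failures):
    """Mean time between failures: uptime divided by the number of failures."""
    if failures == 0:
        return float("inf")  # no failures observed in the period
    return (agreed_hours - downtime_hours) / failures


# Hypothetical month: 720 agreed hours, 18 hours of downtime, 3 failures.
a = availability_pct(720, 18)
b = mtbf_hours(720, 18, 3)
print(a, b)
```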
IT Operations Management
The focus of operations management is to perform and oversee day-to-day operational activities. The team executes all of the
system administration activities. This group makes sure that devices, systems, or processes function as expected; turns plans
into actions; and focuses on daily tactical activities. Operations control is responsible for overseeing activities and events and
performing console management, job scheduling, and a variety of other functions.
The facilities management role manages the physical IT environment — such as the data center, recovery sites, power and
cooling, and so on. The ITIL Service Operation publication provides more details about IT operations management.
Metrics to watch as part of IT operations management include the successful completion of scheduled jobs; that is, what
percentage of your work is planned, and what percentage of planned jobs is achieved in the time allocated? Other metrics
include the following:
Systems restored
Power usage stats
Percentage of scheduled jobs completed successfully on time
Number of maintenance windows that have been exceeded
Equipment installation statistics
Process metrics, such as response time to events, number of security-related incidents, or expenditures against budget
Cost versus budget related to security, shipping, and so on
Escalations and the reasons for them
Application Management
Application management is responsible for managing the application’s lifecycle — to deliver applications that are thoroughly
tested and that deliver the results expected by the business. Many different organizational elements in IT are involved in
application management, including software development and operations. Application management helps the operations
team figure out what they need to do in their daily, weekly, and monthly activities to support the applications. Regardless of
specific areas of expertise, application management teams design, recommend, collaborate, and assist. One person in
application management should be responsible for interfacing with the business at all of the stages. Refer to the ITIL Service
Operation publication for a detailed discussion of application management activities and documentation requirements.
Keep in mind that application management needs to interface with IT operations, but is not “owned” by IT operations.
Application management assists in design decisions related to build-or-buy and provides information (such as application
sizing and operational costs), investigates the amount of customization required, and identifies security and
administration requirements. See Figure 7 for an overview of the application management lifecycle.
FIG 7: The Lifecycle of Application Management
The ITIL Service Operation publication includes a comprehensive list of application management metrics. For example,
measure agreed outputs, process metrics, application performance, maintenance activity, and training and skills development.
When gathering metrics related to application management, identify the business constraints that an application or service
is meant to remove, and then measure whether it effectively removes them. People often look at whether the application
supports many transactions or is reliable. Those are important metrics, but if your process isn’t delivering, and if your
application management process isn’t enabling the application to eliminate the proper constraints, then the application is
not delivering the intended business value.
A key principle in ITIL service operations is managing stability versus responsiveness. Operations wants stability;
development wants to produce features that are more responsive to customer needs. Business and IT requirements are
constantly changing, requiring agility in producing application functionality while at the same time maintaining IT stability
for application performance. ITIL’s service lifecycle approach helps organizations agree to desired changes, take advantage
of the existing infrastructure, and understand what it takes to deliver the changes for value realization in operations.
IT organizations sometimes need to transform their services and applications quickly to meet customers’ needs or risk
becoming optional and having more services outsourced. Adopting a DevOps approach and ITIL service operation best
practices helps organizations be more responsive to business needs without affecting operational stability.
Establishing Roles and Responsibilities in Service Operation
When you establish roles and responsibilities to support service operation, make sure that you know who owns what. Also
make sure there is a single point of ownership for each functional or process area. For further detail about each role in service
operation and the responsibilities of each role, refer to the ITIL Service Operation publication.
The publication describes activities for the following roles:
Service Desk: Manager, supervisor, analysts, and super users
Technical Management: Managers/team leaders, technical analysts/architects, and technical operator
IT Operations: Manager, shift leaders, operations analysts, and IT operators
Application Management: Applications managers/team leaders, applications analyst/architect
Event Management: Interfaces with various groups — generally no dedicated person is assigned. The roles include process
owners and managers.
Incident Management: Process manager; process owner; first, second and third-line analysts
Request fulfillment: Initial handling of requests will be performed by service desk and incident management staff, but there is
also a request fulfillment process owner, manager, and analyst.
Problem Management: Problem manager, process owner, problem analyst, and problem-solving groups
Access Management: Interacts with other various groups — no dedicated person assigned
Generic Service Owner: Ensures that ongoing service delivery and support meet agreed customer requirements
Generic Process Manager: Accountable for managing a process
Generic Process Owner: Sponsors, designs, and manages the change of a process and metrics
Generic Process Practitioner: Carries out one or more activities of a process
Choosing an Organizational Structure for Service Operation
IT operations usually covers a basic set of services that are fairly easy to organize once you understand your organizational
requirements. Many IT groups are organized around technical responsibilities, and that approach is the most common. The
ITIL Service Operation publication also explains and discusses the advantages and disadvantages of organizing service
operation according to activity, process, and geography, as well as by hybrid organizational structures. Most important is to
choose the structure that is right for your environment. Table 1 compares various organizational structures.
TABLE 1: Organizational Structure Comparison
The type of organizational approach you select depends on your company’s business objectives, environmental constraints,
size, type of business, and unique requirements. You can organize your operation in a variety of ways, such as by geography,
activity, process, and so on.
This chapter reviews technology requirements for supporting service operation. Having the right technology is critical,
because achieving success with IT service management is dependent upon automated solutions that support ITIL processes.
To meet your customers’ expectations, you must be able to automate and integrate key processes wherever possible;
otherwise, the processes will be subject to human errors and you will need to deal with mundane, time-consuming, manual
processes. The technology discussed in this chapter covers the generic and specific guidance for event management, incident
management, request fulfillment, problem management, access management, and the service desk.
Generic requirements from the ITIL Service Operation publication include the following capabilities:
Remote Control
Workflow or Process Control Engine
Diagnostic utilities
Software as a service (SaaS) technologies
Integration with BSM
Integrated CMS
Licensing Technology
In addition, there are some specific technology requirements:
Event Management
Incident Management: Integrated IT Service Management (ITSM) Technology, Workflow and Automated Escalation
Request Fulfillment
Problem Management: Integrated Service Management Technology, Change Management, Integrated CMS, Known Error Database
Access Management
Service Management
Service Desk: Telephony, Support Tools
IT Service Continuity Planning
A variety of important considerations will influence the solutions to support service operation activities. It all starts with
having a vision for BSM, because service operation is where customers see the business value of IT. Just as utilities need to
manage several layers of service (such as engineering and logistical planning, hardware management, and continuous
delivery), having the proper tools in place to support service operation is vital to helping you manage more efficiently.
Generic Technology Considerations
The ITIL Service Operation publication describes several generic technology considerations that are critical to achieving
success in the service operation stage of the service lifecycle.
ITIL notes that organizations find it beneficial to offer self-help capabilities to their users. The technology should use a
Web front end whose pages offer a menu-driven range of self-help and service request selections. Self-help is
also a key capability for cloud computing services and end-user mobility.
To improve IT service efficiency, IT organizations should enable end users to directly request new services and track the
status of their requests. However, end users often don’t have visibility into what IT has to offer. This problem can be
addressed by utilizing a service request management solution. The solution should allow you to improve self-help by
defining offered services, publishing those services in a Web-based catalog, and automating the fulfillment for the end
users. By enabling end users to help themselves, the solution can reduce the flood of requests the service desk receives.
Look for a solution that improves the efficiency of requesting and delivering services to end users through standardization
and automation and provides the ability to deliver self-help. The solution should be location aware, know who the user is,
and anticipate which applications and services the user needs. Think of the solution as a virtual assistant.
Many standard requests, such as changes, often originate from within IT. To further improve IT service efficiency, a service
request management solution should allow IT to define a unified and simple front end for change requests from both end
users and other IT employees.
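The define, publish, fulfill, and track steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the ServiceCatalog class, its method names, and the REQ-0001 numbering scheme are hypothetical and do not represent any particular service request management product.

```python
from itertools import count


class ServiceCatalog:
    """Toy service catalog: publish offerings, accept requests, track status."""

    def __init__(self):
        self._offerings = {}  # offering name -> automated fulfillment function
        self._requests = {}   # request id -> request record
        self._ids = count(1)

    def publish(self, name, fulfill):
        """Define an offered service and the function that fulfills it."""
        self._offerings[name] = fulfill

    def list_offerings(self):
        """What an end user would see when browsing the catalog."""
        return sorted(self._offerings)

    def request(self, user, name):
        """Submit a request, run the automated fulfillment, return the id."""
        if name not in self._offerings:
            raise KeyError(f"{name!r} is not in the catalog")
        request_id = f"REQ-{next(self._ids):04d}"
        record = {"user": user, "service": name, "status": "submitted"}
        self._requests[request_id] = record
        self._offerings[name](user)  # automated fulfillment step
        record["status"] = "fulfilled"
        return request_id

    def status(self, request_id):
        """Let the end user track a request without calling the service desk."""
        return self._requests[request_id]["status"]


# Illustrative usage: fulfillment here just records who was granted access.
granted = []
catalog = ServiceCatalog()
catalog.publish("Shared mailbox access", granted.append)
request_id = catalog.request("alice", "Shared mailbox access")
print(catalog.list_offerings())
print(request_id, catalog.status(request_id))
```

A real solution would add approvals, entitlement checks, and asynchronous fulfillment; the sketch keeps fulfillment synchronous so that the status change is easy to follow.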
Don’t underestimate the power of social media and its ability to provide self-help. Services, processes, and systems aligned
to ITIL best practices can be enriched and enhanced through the inclusion of social media. For example, service outages
can be broadcast across the enterprise using a social media tool, and unnecessary escalations to the service desk can be
avoided. Similarly, groups of users who consume a specific technology service or type of technology can communicate and
share tips and strategies for tackling minor issues or optimizing performance.
Social media also facilitates cross-team and process collaboration and validation. The problem management team could
use a social media tool, for example, to investigate the impact of a problem and explore what related activities and changes
may have contributed to the issue at hand.
Workflow or Process Control Engine
A workflow or process control engine enables the control of processes in the lifecycle, such as those for problem management, request fulfillment, incidents, changes, and so on. The engine should enable you to predefine activities, escalation paths,
and other functions so that they can be automatically managed.
It’s important to bridge the various technological and procedural silos across the IT service management spectrum into a
fully orchestrated engine, from which both comprehensive automation and proper management controls are attainable.
Results can be realized only through incremental steps over time, progressing further and further away from a fragmented,
manual orientation toward a fully synchronized automated operations capability.
Look for solutions that bridge people, process, and technology across IT operations. The solution should enable IT operations
organizations to automate routine, labor-intensive, error-prone tasks by leveraging systems, applications, and tools across
silos in the operations environment.
Integrated SKMS/CMS/CMDB
ITIL recommends using an integrated CMS to allow IT infrastructure assets, components, services, and configuration items to
be in a centralized location. This allows relationships between these elements to be stored and maintained. In addition, an
integrated CMS includes the CMDB and should be linked to records of incidents, problems, known errors, and changes where
applicable. The SKMS uses the CMS and CMDB information for stakeholder decision support.
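To make the idea of stored relationships concrete, the following sketch models configuration items, "depends on" relationships, and linked records, and then walks the relationships to find everything affected by a failing component. The ConfigurationStore class and the CI names are hypothetical; a real CMS holds far richer data.

```python
from collections import defaultdict, deque


class ConfigurationStore:
    """Toy stand-in for a CMS: CIs, 'depends on' relationships, linked records."""

    def __init__(self):
        self.dependents = defaultdict(set)  # CI -> CIs that depend on it
        self.records = defaultdict(list)    # CI -> incident/problem/change ids

    def add_dependency(self, ci, depends_on):
        """Record that `ci` depends on `depends_on`."""
        self.dependents[depends_on].add(ci)

    def link_record(self, ci, record_id):
        """Attach an incident, problem, known error, or change record to a CI."""
        self.records[ci].append(record_id)

    def impacted_by(self, ci):
        """Return every CI that directly or indirectly depends on `ci`."""
        seen, queue = set(), deque([ci])
        while queue:
            current = queue.popleft()
            for dependent in self.dependents[current]:
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen


# Illustrative data only.
cms = ConfigurationStore()
cms.add_dependency("order-service", "app-server-01")
cms.add_dependency("app-server-01", "db-cluster-a")
cms.add_dependency("reporting-service", "db-cluster-a")
cms.link_record("db-cluster-a", "INC-1001")
print(sorted(cms.impacted_by("db-cluster-a")))
```

The breadth-first walk is what lets an incident on a database cluster be related to the business services that sit two or three layers above it.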
Look for a solution that offers required elements of an SKMS to help you make decisions based on the impact to critical
business services. It should provide immediate support for best-practice IT processes, automated technology
management, and a shared view of how IT supports business priorities.
FIG 8: Sample Service Knowledge Management System (SKMS)
Discovery/Deployment/Licensing Technology
Audit tools are important for populating or verifying CMS data, as well as for assisting in license management and discovery.
You should be able to run the tools from any location on the network. The technology should facilitate filtering, so that only
required data is extracted. Use the same technology to deploy new software to target locations. This will allow patches, for
example, to be sent to the correct users.
Look for a solution that lets you automate the population and maintenance of configuration and relationship data, allowing
you to discover information to manage identities, assets, requests, and so on. The solution should also enable IT to be more
efficient by providing a richer set of shared information in support of integrated processes. By standardizing, reconciling,
and normalizing configuration changes across data sources, you should be able to reduce the chance of changes disrupting
the business.
Remote Control
It’s important for service desk analysts and other support personnel to be able to access a user’s desktop remotely to
troubleshoot and solve computer-related issues under properly controlled security conditions.
Diagnostic Utilities
The ability to create and use diagnostic scripts and other diagnostic functions will assist the service desk with earlier
diagnoses of incidents. The scripts should be context sensitive and automated.
By understanding incidents and problems occurring in the IT environment, and by correlating those issues with both
business services and the underlying infrastructure supporting them, you can analyze trends and determine how to make
better decisions to improve overall service quality.
Dashboards are valuable because they allow you to see at a glance the overall IT service performance and availability levels.
It’s useful to have customized views of information to meet specific levels of interest. Customers and users are interested in a
service view of the infrastructure — a technical view is generally not as relevant to them.
Providing proper management visibility into key IT performance indicators can help you run and maintain an effective IT
organization that consistently meets the demands and needs of the business. Unfortunately, these indicators and metrics
are often scattered across various IT management tools and applications, making it difficult to gain management-level
visibility into overall performance within an IT organization.
So, how do IT executives and managers get visibility into how the organization is performing, including where it needs work
and where it excels? BSM dashboards, for example, address this challenge by providing interactive access to key service
support metrics to help IT management make decisions based on business requirements and accelerate the alignment of IT
with business goals. Look for a solution that has a consolidated, graphical interface of best-practice IT metrics for incident,
problem, change, and service impact management. With this insight, IT managers can get the right data at the right time to
improve the success of their IT support functions.
ITIL states that it should be easy to retrieve data when you need it. The technology you select should include good reporting
capabilities and standard interfaces for inputting data into industry-standard reporting packages and dashboards.
Due to a lack of actionable reporting capabilities, organizations often face difficulties in making the right operational,
financial, and contractual decisions that support service management. Traditional reporting often relies on static reports
that provide important information overviews but lack functionality to allow users to drill down into a deeper level
of analysis.
Look for a solution that provides reporting capabilities that enable point-and-click analysis and reporting across business
service configurations, linking incident and problem data from the service desk with configuration and relationship data
(from the CMDB). The solution should also link contract, software license, lease, and warranty information. By combining
this process data into a consolidated view, the solution should allow you to analyze service desk performance, along with
supporting IT configuration and assets, to determine how effectively you are supporting your critical business services.
SaaS Technologies
ITIL reviews the importance of managing capital expenditures on hardware, software, and people resources. SaaS, available
over the Internet from a private or public service provider, can lower startup costs and capital costs. Also, implementation can
be faster to provide immediate value to the business.
Care must be given to the following constraints:
Level of customization that might be required
Upgrade capability
Warranty considerations, including availability (hours of service), capacity to support the business, and security and access
Integration with other service management tools
Operations should also care about platform as a service (PaaS), infrastructure as a service (IaaS), and other service delivery
models that can be enabled with cloud computing technology.
Integration with BSM
ITIL reviews the importance of bringing together business-related IT with the processes and disciplines of service
management. To facilitate BSM, the business solutions should be integrated with the IT service management support tools.
The ITIL Service Operation publication cites the example of how a telecom company connected its telephone cell-net
monitoring and billing to its event management, incident management, and configuration management processes. As a result,
it could detect unusual usage and billing patterns. This information was interpreted to identify which phones had been stolen
and were being used to make illegal calls. The company could raise incidents for these patterns, automate actions to suspend
the use of these phones, identify the location of the user, and provide information to the police.
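The publication does not say how the telecom company recognized unusual patterns. As one possible illustration, the sketch below flags a phone when its usage today lies far above its own historical average and raises an incident record for it. The three-standard-deviation threshold, the function names, and the incident fields are all assumptions made for this example.

```python
from statistics import mean, stdev


def is_unusual(history, today, threshold=3.0):
    """Flag usage more than `threshold` standard deviations above the mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma > threshold


def raise_incidents(history_by_phone, today_by_phone):
    """Create one incident record per phone with unusual usage today."""
    incidents = []
    for phone, history in history_by_phone.items():
        today = today_by_phone.get(phone, 0)
        if is_unusual(history, today):
            incidents.append(
                {"phone": phone, "minutes": today, "action": "suspend"}
            )
    return incidents


# Illustrative daily call minutes per phone.
history_by_phone = {
    "555-0101": [30, 35, 28, 32, 31],
    "555-0102": [60, 55, 65, 58, 62],
}
today_by_phone = {"555-0101": 400, "555-0102": 61}
incidents = raise_incidents(history_by_phone, today_by_phone)
print(incidents)
```

The design point is the integration rather than the statistics: monitoring and billing data feed event management, which raises an incident that can trigger an automated action.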
By following a BSM approach, you can reduce cost, lower the risk of business disruption, and benefit from an IT infrastructure
built to support business growth, innovation, and flexibility. It’s important to look for a vendor that provides all three
dimensions of BSM: best-practice IT processes that support ITIL, automated technology management, and a shared view of
how IT services support business priorities. The vendor should have the depth, breadth, and experience to ensure your
success and provide technology that supports the increasing demand for the consumerization of IT services. The technology
you choose should work with your existing IT investments, so that you don’t need to get rid of your existing technology to
support new solutions.
Specific Requirements
ITIL describes several processes that are specific to service operation, including event management, incident management,
request fulfillment, problem management, access management, service desk, and IT service continuity planning.
Event Management
ITIL reviews a variety of features that should be included in event management technology. For example, it should be easy to
deploy; include standard agents to monitor the most common environments, components, and systems; and have open
interfaces to accept standard event outputs and alerts. The technology should allow an operator to acknowledge an alert and
have it escalated when a response doesn’t occur within a defined period of time.
Determining which IT events are creating the greatest impact on the business is a common challenge. Because many
enterprises have acquired a large number of monitoring tools that often do not integrate with one another, it can be even
more challenging to identify which events are related to the source of the problem and which are just “symptoms.”
Look for an event management solution that allows you to detect IT problems so you can concentrate on resolving issues
quickly — before there is an impact on critical IT services. With built-in interoperability with third-party performance
management solutions, the solution should be able to handle events from a broad set of sources (including mainframes,
distributed systems, networks, databases, and applications). It should translate them into information that enables IT to
resolve incidents and problems faster by filtering, prioritizing, enriching, correlating, and automatically handling events
according to business and operational priorities. With a highly scalable architecture, it should manage very high quantities
of events and automate corrective actions, in addition to integrating with other technologies, such as performance
management or your help desk.
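A minimal sketch of the filter, enrich, and prioritize steps follows. It assumes a hypothetical lookup table that maps each monitored component to a business service and a business priority; the Event fields, the SERVICE_MAP contents, and the process function are illustrative and not taken from any event management product.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}

# Hypothetical lookup from component to (business service, business priority).
# A lower priority number means the service matters more to the business.
SERVICE_MAP = {
    "db-cluster-a": ("Online Ordering", 1),
    "mail-gw-02": ("Email", 3),
}


@dataclass
class Event:
    source: str
    severity: str
    message: str
    service: str = "unknown"
    priority: int = 99


def process(events):
    """Filter, enrich, and prioritize raw events."""
    # Filter: drop purely informational events.
    kept = [e for e in events if SEVERITY_RANK[e.severity] > 0]
    # Enrich: attach the business service and its priority.
    for e in kept:
        e.service, e.priority = SERVICE_MAP.get(e.source, ("unknown", 99))
    # Prioritize: most important service first, then highest severity.
    return sorted(kept, key=lambda e: (e.priority, -SEVERITY_RANK[e.severity]))


# Illustrative events from different monitoring sources.
events = [
    Event("mail-gw-02", "critical", "queue full"),
    Event("db-cluster-a", "info", "backup completed"),
    Event("db-cluster-a", "warning", "replication lag"),
    Event("printer-7", "warning", "toner low"),
]
ordered = process(events)
for e in ordered:
    print(e.priority, e.service, e.source, e.message)
```

Note that a warning on a high-priority business service sorts ahead of a critical event on a lower-priority one, which is the behavior the business-priority ordering is meant to produce.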
Incident Management
According to the ITIL Service Operation publication, the incident management process should incorporate integrated IT
service management technology, as well as workflow control and automated escalation activities.
Integrated IT Service Management Technology
ITIL provides a comprehensive list of functions that are required for integrated IT service management technology. For
example, the IT service management technology should include a CMS that allows automated relationships to be maintained
between the incidents, service requests, known errors, the CMDB, and other CIs. The SKMS/CMS can also be used to assist in
determining priorities or in investigating and performing a diagnosis for decision making. The technology should include a
process flow engine that allows processes to be predefined and automatically controlled. Automated alerting and
escalation capabilities are needed to prevent incidents from being overlooked. Easy-to-use reporting facilities are also
important for allowing incident metrics to be created.
Look for a solution that enables IT to make informed decisions and to respond quickly and efficiently to conditions that
disrupt critical services by automating ITIL incident and problem management processes, so that your service desk can act
as a single point of contact for user requests, user-submitted incidents, and IT operations. It should provide a flexible,
built-in, best-practice, and self-service knowledgebase to help speed the resolution of end-user issues and the identification
of defects in the IT infrastructure. Automated workflow should be able to capture and track incidents and problems in an
integrated fashion, from the initiation of the incident to problem correlation, through knowledge entry creation and
change request and verification, and finally, to a permanent fix and resolution.
An evolved solution should include an intuitive interface with a consistent look and feel across multiple end-user devices, as
well as the infrastructure behind it. The interface should consolidate many information sources and provide a single access
point to execute the service interactions that matter most to IT service users. It should take the concept of apps and
services — whether related to availability, accessibility, or performance — and give users the ability to access and view the
status of them, as well as interact with apps and services and report issues. This approach should include a rich set of
information that can help IT tailor services to individuals when it becomes available. This is like having an IT assistant that’s
with the users wherever they go.
Workflow and Automated Escalation
Support tools should let you set target times, which can be used to automate workflow control and escalation activities. For
example, if a support group has not resolved an incident within the time allotted, the incident would be automatically routed
to the next-level support group, alerting the service manager.
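The escalation rule in this example can be sketched as follows. The support level names, the Incident fields, and the notification text are assumptions chosen for illustration; a real tool would take target times from service level agreements and send the alert through its own notification channel.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

SUPPORT_LEVELS = ["first-line", "second-line", "third-line"]  # illustrative


@dataclass
class Incident:
    ident: str
    assigned_at: datetime
    target: timedelta   # time allotted to the current support group
    level: int = 0      # index into SUPPORT_LEVELS
    resolved: bool = False
    notifications: list = field(default_factory=list)


def escalate_if_overdue(incident, now):
    """Route an unresolved, overdue incident to the next support level.

    Returns True if the incident was moved to a new support group.
    """
    overdue = now - incident.assigned_at > incident.target
    if incident.resolved or not overdue:
        return False
    if incident.level < len(SUPPORT_LEVELS) - 1:
        incident.level += 1
        incident.assigned_at = now  # the clock restarts for the new group
        # Stand-in for alerting the service manager.
        incident.notifications.append(
            f"{incident.ident} escalated to {SUPPORT_LEVELS[incident.level]}"
        )
        return True
    # Already at the highest level: nothing to route to, so only alert.
    incident.notifications.append(
        f"{incident.ident} overdue at {SUPPORT_LEVELS[incident.level]}"
    )
    return False


# Illustrative usage with a four-hour target.
inc = Incident("INC-1001", datetime(2024, 1, 1, 9, 0), timedelta(hours=4))
early = escalate_if_overdue(inc, datetime(2024, 1, 1, 12, 0))
late = escalate_if_overdue(inc, datetime(2024, 1, 1, 14, 0))
print(early, late, SUPPORT_LEVELS[inc.level], inc.notifications)
```

A workflow engine would typically run a check like this on a schedule for every open incident, so that no incident depends on someone remembering to look at it.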
Process challenges can be addressed with integrated solutions that enable IT organizations to drive dramatic improvement
in operational efficiency and flexibility while ensuring compliance, service availability, and responsiveness to changing
business needs. Through the automation and orchestration of the interactions between people, process, and technology,
the service automation solutions enable a reduction in IT costs and accelerate the delivery of new IT services.
Request Fulfillment
Follow an integrated approach so that service requests are linked to incidents or events. Self-help capabilities allow users to
submit requests via the Web, using a menu-driven selection process.
Request fulfillment technology, also known as service request management technology, is evolving, giving employees the
luxury of one-stop, online shopping for all the services they need — including cloud computing services. It also gives the
service providers in your organization a single place to advertise their services to employees. It’s like having a service
supermarket at your employees’ fingertips. The business benefits are significant. Look for a solution that employs
standard, repeatable, best-practice processes for handling requests; this reduces business risk and gives management
greater insight into service delivery quality and costs. Employee productivity rises because people can find the services
they need, when they need them, from the approved, published, and requestable service catalog. With this approach,
services are delivered quickly, effectively, and at a lower cost. Employees can initiate and track service requests on their
own, reducing the load on the service desk. And, finally, all service requests can be tracked for later auditing as needed
for regulatory compliance.
Problem Management
Specific guidance for problem management includes automation for these key processes: integrated service management
technology, change management, an integrated CMS, and a known error database.
Integrated Service Management Technology
ITIL recommends an IT service management solution that differentiates between incidents and problems. This differentiation
makes it possible to have separate problem records raised to deal with the causes of incidents and to link this information to
related incidents.
IT organizations are under pressure to supply higher levels of support to the business in the form of faster incident
resolution and improved service levels. However, many find this to be a significant challenge because incident management
processes are not standardized and incidents are not prioritized with an understanding of their impact on the business.
Look for a solution that enables IT to respond quickly and efficiently to conditions that disrupt critical services, by
automating ITIL incident and problem management processes, so that your service desk can act as a single point of contact
for user requests, user-submitted incidents, and IT operations. The solution should include best practices and a self-service
knowledgebase to help speed the resolution of end-user issues and the identification of defects in the IT infrastructure.
Automated workflow should capture and track incidents and problems in a seamless and integrated fashion, from the
initiation of the incident to problem correlation, through knowledge entry creation, change request, verification, and,
finally, to permanent fix and resolution.
Change Management
Problem management should integrate with change management so that records for requests, events, incidents, and
problems are related to the requests for change that have caused these problems.
Look for a solution that provides change process management to improve your ability to quickly implement IT changes,
enforces policies to minimize business risk, and automates your change management process with built-in ITIL best practices.
The solution should deliver comprehensive policy, process management, and planning capabilities that help you increase the
speed and consistency with which you implement changes while minimizing risk and errors. The solution should be able to
automatically track a request for change from request initiation to planning, implementation, and verification.
Integrated CMS
An integrated CMS should enable problem records to be linked to the components and services that are impacted as well as to
CIs that relate to these records. Configuration management is part of a larger SKMS that links to data repositories.
An effective problem management solution should provide visibility into the status of approvals, change execution, and
change execution conflicts. The solution should be integrated with the CMS to ensure a unified view of configuration
changes across IT processes.
Known Error Database
A known error database is used to store knowledge of incidents and problems, and how they were overcome, to allow
quicker diagnosis and resolution if they recur. The solution should allow the data to be incorporated automatically without
needing to rekey it.
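As a rough sketch of both ideas (matching a new incident against stored knowledge, and loading entries straight from a problem record without rekeying), consider the following. The KnownErrorDatabase class, the keyword-overlap matching, and the record fields are assumptions for illustration; production tools use much more capable search.

```python
class KnownErrorDatabase:
    """Toy known error database with keyword matching on symptoms."""

    def __init__(self):
        self._entries = []

    def add(self, error_id, symptoms, workaround):
        """Store a known error with its symptom keywords and workaround."""
        self._entries.append(
            {
                "id": error_id,
                "symptoms": {s.lower() for s in symptoms},
                "workaround": workaround,
            }
        )

    def add_from_problem(self, problem):
        """Create an entry directly from a problem record (no rekeying)."""
        self.add(problem["id"], problem["symptoms"], problem["workaround"])

    def search(self, description):
        """Return matching entries, best keyword overlap first."""
        words = set(description.lower().split())
        scored = [(len(e["symptoms"] & words), e) for e in self._entries]
        ranked = sorted(scored, key=lambda pair: -pair[0])
        return [entry for score, entry in ranked if score > 0]


# Illustrative usage.
kedb = KnownErrorDatabase()
kedb.add_from_problem(
    {
        "id": "KE-001",
        "symptoms": ["vpn", "timeout"],
        "workaround": "Restart the VPN client",
    }
)
kedb.add("KE-002", ["printer", "offline"], "Power-cycle the printer")
matches = kedb.search("User reports VPN timeout after login")
print([m["id"] for m in matches], matches[0]["workaround"])
```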
Access Management
Access management uses a variety of technologies to identify and track user information. For example, this includes
change management systems, request fulfillment technology, directory services technology, human resources
management technology, and a variety of access management features in various systems and applications.
Most companies face the challenge of providing secure, efficient, and cost-effective user access to services that cross
divergent applications, hardware platforms, operating systems, and locations. Users who need access include internal
employees (on both sides of the firewall), business partners, and customers. Further increasing this complexity is the need
for users to access applications and data that reside on different platforms and domains — and often follow autonomous
security models.
More and more, people are bringing their own devices to work. It makes sense to embrace the trend and make it easy for
users. However, you need to make sure you aren’t making it easy for critical data to go out the door, as well as balance
accessibility with productivity. So, you must manage personal devices with the same effort as corporate-owned
devices. Integrating the management of these devices into your service desk solution enables IT to better support personal
devices without giving up control or driving up costs.
Your access management solution should secure internal and external user access by establishing automated controls
based on roles and business rules. It should offer flexible access control to manage large global or dispersed user
populations. The solution should facilitate passing the access portion of an audit. It should also segment access by user role
and business rules and provide access to strategic partners, customers, and suppliers. In addition, the solution should have
an open architecture that integrates with other applications and systems, even those provided by multiple vendors.
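Access control based on roles and business rules, with a trail that supports an audit, might look like the sketch below. The role names, the permission table, and the rule that the ledger is reachable only from the corporate network are hypothetical examples, not recommendations from ITIL.

```python
# Hypothetical role-to-service permissions.
ROLE_PERMISSIONS = {
    "employee": {"email", "intranet"},
    "finance": {"email", "intranet", "ledger"},
    "partner": {"partner-portal"},
}


def business_rules_allow(user, service):
    """Illustrative rule: the ledger is reachable only from the corporate network."""
    if service == "ledger" and not user.get("on_corporate_network", False):
        return False
    return True


def can_access(user, service, audit_log):
    """Decide access from roles and business rules, and record the decision."""
    role_ok = any(
        service in ROLE_PERMISSIONS.get(role, set()) for role in user["roles"]
    )
    allowed = role_ok and business_rules_allow(user, service)
    audit_log.append((user["name"], service, "granted" if allowed else "denied"))
    return allowed


# Illustrative usage: an internal employee working remotely and a partner.
audit_log = []
alice = {"name": "alice", "roles": ["finance"], "on_corporate_network": False}
bob = {"name": "bob", "roles": ["partner"]}
print(can_access(alice, "ledger", audit_log))       # denied by the business rule
print(can_access(alice, "email", audit_log))        # granted by role
print(can_access(bob, "partner-portal", audit_log)) # granted by role
print(audit_log)
```

Recording every decision, including denials, is what makes it practical to answer an auditor's question about who could reach which service and when.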
Service Desk
The service desk needs a variety of tools to work efficiently, including telephony and support tools.
Support Tools
A fully integrated IT service management tool set is required for organizations to seriously implement IT service
management. This tool set should have a CMS at the center and provide integrated support for all of the processes defined
within ITIL. Support tools should include a known error database, diagnostic scripts, and a self-help Web interface, for
example.
The solution for your service desk should help you align your service quality and costs with business needs by automating,
integrating, and optimizing ITIL service support processes across IT through an open, unified architecture oriented around
your business services. Look for a service desk solution that helps improve staff productivity and support consistency by
automating standardized processes, policies, routine tasks, and customer self-service. With this technology, you should be
able to demonstrate the value of service management and continually improve processes with business service metrics and
the key performance indicators that span functional areas.
IT Service Continuity Planning
Organizations will become dependent upon their ITSM tools. Before purchasing a solution, you should perform a business
impact analysis and risk analysis to make sure your organization can meet the expected service continuity and resilience
IT operations groups should leverage technology to help them manage IT from a business perspective. This chapter of the
ITIL Service Operation publication describes what to look for in solutions that can help you run operations more effectively
and embrace emerging requirements driven by technology innovations. Automation will help you to respond faster and
more precisely to infrastructure issues. This will enable you to get greater control over your environment while reducing
complexity and risk. Technology can help you get better visibility into issues that impact services so that you can make
more effective decisions and increase your value to the business.
This brief chapter of the ITIL Service Operation publication provides general guidance for implementing service operation.
One of the primary points in Chapter 8 is to make sure that changes do not affect the stability of your services. This
chapter identifies factors that trigger changes, such as new or upgraded hardware components, systems software, and
applications, as well as compliance changes. The chapter also focuses on the value of project management, considerations
for assessing risks, guidance related to operational staff in service design and transition, and planning and implementing
service management technologies.
Key challenges in deploying technology involve making sure that changes do not have an adverse effect on your service and
that you release the technology at the right time — don’t introduce a tool before the organization is prepared to use it. If
you deploy tools too quickly, you run the risk of not understanding what you need or of disrupting the business with tools
the organization isn’t prepared to use.
Roll out processes and supporting tools at the same time. Tools without processes won’t be used consistently. Processes
without tools cannot be enforced or measured. Don’t try to reinvent the wheel. Good tools and best practices will expedite
the process and make it cheaper, faster, and easier. Also remember to train your people on the changes and their value to
the business.
Most business outages are due to unplanned changes. The technology you deploy should help make changes “painless.”
Fortunately, you can manage change effectively, in a way that enables change to help, rather than hurt, your business. An
integrated solution for change management should combine ITIL best practices with a CMDB and with intelligent, predictive
technologies that help you thrive in a rapidly changing, IT-driven business environment.
Follow the guidance in the ITIL Service Transition publication for all implementations to ensure the services adhere to the
lifecycle approach, meet the strategy, and are designed appropriately. Remember the early life support process to ensure
that the change management team, development team, and operations team are all involved during the early days of live
operation of the new or improved service. This approach helps to make sure that improvements, enhancements, and the
eradication of errors can be handled quickly and effectively, with minimum disruption to customers. It also provides
feedback for all future changes.
Implementing service operation involves effectively managing change. When you implement service management
technologies, consider details, such as the licenses required, deployment issues, and capacity checks that must occur in
advance of the full deployment. Also consider the timing of the technology deployment and whether to immediately
introduce the technology or use a staged approach. Always remember to manage the organizational knowledge for service
Chapter 9 of the ITIL Service Operation publication provides an overview of the challenges, risks, and success factors for
service operation. It addresses a variety of key issues, such as how to justify funding, how to increase engagement with
development and project staff, and how to stay aligned with design activities, because design and operational activities are
often in conflict. For example, service design may focus on a few individual services at a time, while service operation
focuses on delivering and supporting all of the services simultaneously. Ideally, service operation should make every effort
to be actively involved in the design process to avoid problems later.
Another challenge is that service design is very focused on completing a project on time, on budget, and to specification.
However, you often can’t forecast all that the service will cover and what it will cost until it has been deployed for a while.
Meanwhile, the operations team is held responsible if the service does not run as expected. Service operation needs to get
involved in service transition to help analyze and resolve issues before they become problems in the operational
An additional challenge involves using virtual teams. While this approach gets the people with the right skills involved in
projects, accountability is an issue, and projects can fall through the cracks. Getting senior management support for IT
service management and processes is particularly important for success.
ITIL identifies the following as risks to address: inadequate funding and resources, loss of momentum (which can happen
easily when projects are drawn out), loss of key personnel, resistance to change, lack of management support, a faulty
design, and differing customer expectations. This chapter explains how to address some of these risks.
For example, there are many ways to justify getting the funding you need, such as the following:
- Reduced software license costs as a result of managing licenses more effectively
- Reduced support costs because of fewer incidents and faster resolution of problems
- Improved utilization of equipment due to better capacity management
- Better use of resources due to less duplication of effort
In addition, organizations can achieve significant savings by automating repeatable processes and having their employees
work on more strategic activities.
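To make a funding case concrete, the savings categories above can be tallied into a single annual figure and compared with the requested investment. The following is a minimal sketch in Python, not a method prescribed by ITIL or by this publication. The names SavingsItem, total_annual_saving, and payback_years are hypothetical, and every figure in EXAMPLE_ITEMS is an invented placeholder that you would replace with your own measured costs.

```python
from dataclasses import dataclass


@dataclass
class SavingsItem:
    """One line of a funding justification. All figures are hypothetical."""
    label: str
    annual_cost_before: int  # current yearly cost
    annual_cost_after: int   # projected yearly cost after the improvement

    @property
    def annual_saving(self) -> int:
        return self.annual_cost_before - self.annual_cost_after


def total_annual_saving(items):
    """Sum the projected yearly savings across all line items."""
    return sum(item.annual_saving for item in items)


def payback_years(investment, items):
    """Years needed for savings to cover the investment.

    Returns None when the items produce no net saving.
    """
    saving = total_annual_saving(items)
    if saving <= 0:
        return None
    return investment / saving


# Placeholder figures for illustration only; they are not taken from ITIL or BMC.
EXAMPLE_ITEMS = [
    SavingsItem("Software licenses", 200_000, 170_000),
    SavingsItem("Incident and problem support", 500_000, 420_000),
    SavingsItem("Equipment utilization", 300_000, 270_000),
    SavingsItem("Duplicated effort", 100_000, 80_000),
]

if __name__ == "__main__":
    print("Projected annual saving:", total_annual_saving(EXAMPLE_ITEMS))
    print("Payback period in years:", payback_years(320_000, EXAMPLE_ITEMS))
```

A tally like this is only as credible as its inputs, so base the before and after figures on metrics you already collect, such as incident volumes, resolution times, and license counts.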
One of the biggest challenges to improving operations and running them correctly is getting the entire organization and
the development and project staff to work together, because each department typically has competing priorities. For
example, the project staff may be very driven by deadlines, which could impact testing, documentation, and quality
assurance. Then operations must run applications that may not be fully documented and could break. When this occurs,
the blame often goes to operations, regardless of why the problem happened. Instead of this scenario, you want these
teams to work together as a cohesive unit.
Another risk to any kind of organizational change is that people fail to connect to it emotionally and do not see the change as a
factor in their personal success as well as the organization’s. There must be an alignment of personal needs, business unit
needs, and company needs. Appeal to everyone’s individual values, and engage them in organizational improvement. To
find success, measure it; then use that as a basis for continual improvement. Be sure to get senior management
commitment; this cannot be optional in IT.
Accomplishing the desired change in behavior requires that people involved at all levels of an initiative — from executives
to end users — understand what they’re supposed to do, how and when they’re supposed to do it, and why it benefits
them and their organization. To reach your desired objectives, you should also consider the education needed to drive the
intended changes in behavior. Look for a vendor to provide training that gives you a structured approach to developing the
buy-in, understanding, and skills needed to successfully attain value from ITIL processes — and ultimately fulfill the
objectives of your initiatives.
You may also need to bring in consulting services to make service operation activities successful. This may require
experiential learning exercises and support services to ensure that each solution is implemented and used effectively.
When you are implementing BSM based on ITIL practices, for example, it’s important to look for a solutions provider with
field-proven methodology, tools, and best practices to implement the solutions quickly and effectively. This approach can
lead you on a proven path to lowering the cost, time, and risk associated with achieving measurable results.
Making service operation successful involves meeting a variety of challenges, but they can be overcome. It’s important to
engage the development and project staff. Take the time to understand and justify your funding requirements based on
metrics discussed in this chapter of the ITIL Service Operation publication. Be sure to get management support and find
champions who can help convey the value of your IT initiatives. Make the time to get people trained on how to use the
applications you are rolling out. By showing them the value of these new resources, you can increase the likelihood of a
successful project deployment. Make sure you have the right solutions addressing the right job, and be very clear about
what your users can expect to achieve from these solutions.
It is through service operation that the value of the entire IT organization is delivered and judged. The ITIL Service
Operation publication offers guidance on keeping your customers satisfied despite the day-to-day challenges in service
delivery, just as a utility company can measure its success by an uninterrupted flow of power to its users in all kinds of
weather. By following ITIL practices and deploying the technology to support them, you can deliver services at the level
your customers have come to expect, in a reliable, consistent manner.
While this book provides a broad overview, it’s important to also reference the actual ITIL Service Operation publication to
read more detailed examples of how to make service operation successful for your organization. It’s also essential to
deploy the appropriate technology at the right time.
As discussed in Chapter 7 of this book, look for solutions that can help your organization meet ITIL objectives for service
operation and enable you to manage IT from a business perspective. These solutions should cover application
management, asset management, configuration management, capacity management, identity management, infrastructure
management, and IT service support. Also look for a vendor that offers BSM technology to enable you to meet the
emerging needs in social, mobile, cloud, and analytics, as well as educational and consulting services to help you achieve
your ITIL objectives sooner.
For more information, visit
Author: Anthony Orr
BMC delivers software solutions that help IT transform digital enterprises for the ultimate competitive business advantage.
We have worked with thousands of leading companies to create and deliver powerful IT management services. From
mainframe to cloud to mobile, we pair high-speed digital innovation with robust IT industrialization—allowing our customers
to provide amazing user experiences with optimized IT performance, cost, compliance, and productivity. We believe that
technology is the heart of every business, and that IT drives business to the digital age.
BMC – Bring IT to Life.
BMC, BMC Software, and the BMC Software logo are the exclusive properties of BMC Software, Inc., are registered with the U.S. Patent and Trademark Office,
and may be registered or pending registration in other countries. All other BMC trademarks, service marks, and logos may be registered or pending registration
in the U.S. or in other countries. UNIX is the registered trademark of The Open Group in the US and other countries. Tivoli and IBM are trademarks or registered
trademarks of International Business Machines Corporation in the United States, other countries, or both. IT Infrastructure Library® is a registered trademark
of the Office of Government Commerce and is used here by BMC Software, Inc., under license from and with the permission of OGC. ITIL® is a registered
trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office, and is
used here by BMC Software, Inc., under license from and with the permission of OGC. All other trademarks or registered trademarks are the property of their
respective owners. © 2014 BMC Software, Inc. All rights reserved. Origin date: 8/14