Standard changes are low-risk, pre-approved changes that happen frequently and
have a quick turnaround time. Standard changes can be implemented quickly and
help manage risks.
Examples of a standard change:
- Desktop or standalone equipment movement.
- A standard patch that is applied to the servers once a month during the agreed
maintenance window.
What is a standard change?
When a normal change has been implemented successfully a few times, the
associated processes, such as planning, scheduling, and implementation, become
established, predictable, and controlled. That is, the change becomes a routine
task and therefore a standard change.
A few examples of normal changes:
• Upgrading the exchange server or any other hardware
• Setting up high availability or cluster for vital business functions (VBF)
• Roll out of a new release to address the reported issues
Expedited changes are raised due to a pressing need such as a legal or a business
requirement. These changes are not related to restoring a service.
The change advisory board (CAB) defined clear rules and regulations to qualify
emergency and expedited changes and communicated these rules across the
organization.
During its discussion with stakeholders, the core team observed that about 20% of
the changes were completed without authorization, mainly because the
infrastructure team was under pressure to get the changes done quickly. As a
result, many changes were done without a request for change (RFC) or without
going through the review and approval processes.
To deal with this situation, stage gatekeepers were appointed for the
infrastructure, application, and database teams to ensure that no steps were
skipped when a change was made. The stage gatekeepers had a go-ready list that
comprised the test results, approvals, signatures from all the concerned teams,
and a back-out plan. In case of a violation, the stage gatekeepers were held
responsible, which affected their appraisal and performance measures.
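The go-ready check described above can be sketched in a few lines of Python. This is only an illustration: the `GoReadyList` class, its field names, and the example RFC identifier are hypothetical and are not taken from the bank's actual tooling. The sketch reports which of the four go-ready items (test results, approvals, signatures, back-out plan) are still missing.

```python
from dataclasses import dataclass, field


@dataclass
class GoReadyList:
    """Hypothetical go-ready record that a stage gatekeeper reviews."""
    rfc_id: str
    test_results_attached: bool
    approvals_recorded: bool
    required_signoffs: set[str]                      # teams that must sign
    received_signoffs: set[str] = field(default_factory=set)
    back_out_plan_attached: bool = False


def missing_items(checklist: GoReadyList) -> list[str]:
    """Return the go-ready items that are still missing (empty list = ready)."""
    gaps = []
    if not checklist.test_results_attached:
        gaps.append("test results")
    if not checklist.approvals_recorded:
        gaps.append("approvals")
    unsigned = sorted(checklist.required_signoffs - checklist.received_signoffs)
    if unsigned:
        gaps.append("signatures: " + ", ".join(unsigned))
    if not checklist.back_out_plan_attached:
        gaps.append("back-out plan")
    return gaps


# Illustrative data only: one team has not signed and no back-out plan exists.
example = GoReadyList(
    rfc_id="RFC-0001",
    test_results_attached=True,
    approvals_recorded=True,
    required_signoffs={"infrastructure", "application", "database"},
    received_signoffs={"infrastructure", "database"},
)
print(missing_items(example))  # ['signatures: application', 'back-out plan']
```

A gatekeeper would let the change proceed only when the returned list is empty.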
Another reason for the unauthorized changes was that the application teams
updated the CMDB or CMS after the roll out of the release.
The core team ensured that audits were performed every week to compare the
current state of the CMS with the associated RFC, and any deviation was
highlighted to the CI owner and service owner for immediate action. In turn, the
service owner closed the loop and took firm action. This process went on for four
to six weeks, and the team made it a habit to follow the rule without exceptions.
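A weekly audit of this kind could look roughly like the sketch below. It assumes, purely for illustration, that the CMS can be exported as a dictionary of CI attributes, that the previous audit's export is kept as a baseline, and that each approved RFC lists the CI attributes it was allowed to change. The function name and data layout are hypothetical.

```python
def audit_deviations(baseline, current, authorized):
    """Return CI attribute changes that no approved RFC covers.

    baseline, current: {ci_name: {attribute: value}} snapshots of the CMS
                       (previous audit and this audit).
    authorized:        {ci_name: set of attributes an approved RFC may change}.
    """
    deviations = []
    for ci in sorted(set(baseline) | set(current)):
        before = baseline.get(ci, {})
        after = current.get(ci, {})
        allowed = authorized.get(ci, set())
        for attr in sorted(set(before) | set(after)):
            changed = before.get(attr) != after.get(attr)
            if changed and attr not in allowed:
                deviations.append((ci, attr, before.get(attr), after.get(attr)))
    return deviations


# Illustrative data only: the version upgrade had an RFC, the RAM change did not.
last_week = {"db01": {"version": "12.1", "ram_gb": 64}}
this_week = {"db01": {"version": "12.2", "ram_gb": 128}}
approved = {"db01": {"version"}}

for ci, attr, old, new in audit_deviations(last_week, this_week, approved):
    print(f"Deviation on {ci}: {attr} changed from {old} to {new} without an RFC")
```

Each reported deviation would then be routed to the CI owner and service owner, as described above.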
• The bank’s IT team understood that building a robust configuration management
system with up-to-date information on all IT components is essential for a
successful change and release management process.
• A forward schedule of changes, planned maintenance windows, and release plans
are critical to manage the volume and duration of changes and to ensure smooth
deployment.
• Enforcing a policy requires practicality, diligence, and buy-in. The new policies
were fewer in number but were important for the success of the change
management process (for example: CAB, unauthorized changes, and PIR).
• Relevant and practical KPIs help teams become efficient and effective.
• Process and tools have to work in tandem; the absence of one or the other will
severely impact continual service improvement (CSI).
• Post-implementation review of key changes and their implications provided
valuable insight into potential areas to improve and control changes.
A RACI matrix is an important tool that can help in
the implementation and correct functioning of a
process. It is mostly used to align the human elements
in the process. Usually there are many different people
involved in any process, and they have differing
responsibilities. A RACI matrix documents these
responsibilities explicitly and serves as a ready
reference at different stages in the process. Here is
how the RACI matrix can be utilized.
Responsible: This is the class of people who are
ultimately responsible for getting the work done. This
may refer to the individual workers that perform the
given task or it could refer to the system in case the task
is automated.
Accountable: This is the class of people who are
accountable for seeing that the work gets done. This
usually means the immediate manager overseeing the
work.
Consulted: These may be subject matter experts who
need to be consulted at the time of an exception. There
is a possibility that an unanticipated scenario arises in a
process. These are the people who will do the thinking
and suggest any deviations from the Standard Operating
Procedure (SOP).
Informed: This is the class of people who have some
interest in the performance of a given task. This may be
a manager trying to control the execution of the task at
hand. The notification could also be an input signal to
another process.
Rules for using RACI Matrix
Only One Responsible and Accountable Person: It is
essential that only one person be assigned to each of
the Responsible and Accountable roles. Having more
than one person responsible for the same task increases
ambiguity and the chances of the work not being
performed. It could also lead to duplication of work and
wasted effort and cost. Having more than one
accountable person leads to the same problem.
However, having only one person accountable also
carries a risk: if the assigned person is incompetent, the
whole process may break down. It is for this reason that
there is often a hierarchy of accountable people in place.
Responsible-Accountable is Mandatory: The Consulted
and Informed roles are not mandatory for every activity.
It is possible that some activities may not require them
at all. But the Responsible and Accountable roles must
be assigned. Even if the system is performing the tasks
automatically, someone must be made accountable to
see that the work does get done.
Communication with the Consultant: There must be a
two-way channel of communication with the people
being consulted. This communication is itself a task and
must be explicitly listed, with its own responsible and
accountable persons. Because the communication is
two-way, one has to ensure that adequate follow-up is
done and that there is minimum time lag in completing
the communication.
Inform the Required Stakeholders: This is a one-way
channel of communication. It is usually meant to be a
signal for some other process to begin or a control
metric to ensure smooth functioning of the same
process. Usually this is automated, but it needs
accountability like other automated tasks.
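The first two rules above lend themselves to an automated check. The sketch below is a minimal illustration, not a standard tool: it assumes the matrix is stored as a dictionary mapping each activity to its role assignments, with one role letter per person per activity, and the activity and role-holder names are made up.

```python
def validate_raci(matrix):
    """Check that every activity has exactly one R and exactly one A.

    matrix: {activity: {person_or_system: role}}, role in "R", "A", "C", "I".
    Returns a list of human-readable problems (empty list = valid).
    """
    problems = []
    for activity, assignments in matrix.items():
        roles = list(assignments.values())
        for role, label in (("R", "Responsible"), ("A", "Accountable")):
            count = roles.count(role)
            if count != 1:
                problems.append(
                    f"{activity}: expected exactly 1 {label}, found {count}"
                )
    return problems


# Illustrative matrix: the second activity breaks both rules.
raci = {
    "Apply monthly patch": {
        "sysadmin": "R",
        "infrastructure manager": "A",
        "security lead": "C",
        "service desk": "I",
    },
    "Update CMDB": {
        "application team": "R",
        "database team": "R",
    },
}

for problem in validate_raci(raci):
    print(problem)
# Update CMDB: expected exactly 1 Responsible, found 2
# Update CMDB: expected exactly 1 Accountable, found 0
```

Consulted and Informed entries are deliberately not checked, since those roles are optional for any given activity.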
What Are Key Components of an SLA?
The SLA should include components in two areas: services and management.
Service elements include specifics of services provided (and what's excluded,
if there's room for doubt), conditions of service availability, standards such as
time window for each level of service (prime time and non-prime time may
have different service levels, for example), responsibilities of each party,
escalation procedures, and cost/service tradeoffs.
Management elements should include definitions of measurement standards
and methods, reporting process, contents and frequency, a dispute resolution
process, an indemnification clause protecting the customer from third-party
litigation resulting from service level breaches (this should already be covered
in the contract, however), and a mechanism for updating the agreement as
required.
This last item is critical; service requirements and vendor capabilities change,
so there must be a way to make sure the SLA is kept up-to-date.
What should I consider when selecting metrics for my SLA?
Choose measurements that motivate the right behavior. The first goal of any metric
is to motivate the appropriate behavior on behalf of the client and the service
provider. Each side of the relationship will attempt to optimize its actions to meet
the performance objectives defined by the metrics. First, focus on the behavior that
you want to motivate. Then, test your metrics by putting yourself in the place of the
other side. How would you optimize your performance? Does that optimization
support the originally desired results?
Ensure that metrics reflect factors within the service provider's control. To
motivate the right behavior, SLA metrics have to reflect factors within the
outsourcer's control. A typical mistake is to penalize the service provider for delays
caused by the client's lack of performance. For example, if the client provides change
specifications for application code several weeks late, it is unfair and demotivating to
hold the service provider to a prespecified delivery date. Making the SLA two-sided
by measuring the client's performance on mutually dependent actions is a good way
to focus on the intended results.
Choose measurements that are easily collected. Balance the power of a desired
metric against its ease of collection. Ideally, the SLA metrics will be captured
automatically, in the background, with minimal overhead, but this objective may not
be possible for all desired metrics. When in doubt, compromise in favor of easy
collection; no one is going to invest the effort to collect metrics manually.
Less is more. Despite the temptation to control as many factors as possible, avoid
choosing an excessive number of metrics or metrics that produce a voluminous
amount of data that no one will have time to analyze.
Set a proper baseline. Defining the right metrics is only half of the battle. To be
useful, the metrics must be set to reasonable, attainable performance levels. Unless
strong historical measurement data is available, be prepared to revisit and readjust
the settings at a future date through a predefined process specified in the SLA.
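The point above about keeping metrics within the service provider's control can be made concrete with a small sketch of a two-sided delivery metric. The rule used here, extending the provider's due date day for day when the client's input arrives late, is only an assumed convention for illustration; an actual SLA would define its own adjustment. All function names and dates are hypothetical.

```python
from datetime import date, timedelta


def adjusted_due_date(agreed_due, client_input_due, client_input_received):
    """Shift the provider's due date by however late the client's input was."""
    delay = max(client_input_received - client_input_due, timedelta(0))
    return agreed_due + delay


def provider_met_sla(delivered, agreed_due, client_input_due, client_input_received):
    """True if delivery was on time once client-caused delay is excluded."""
    return delivered <= adjusted_due_date(
        agreed_due, client_input_due, client_input_received
    )


# Illustrative dates: the client's specifications arrived three weeks late.
due = adjusted_due_date(
    agreed_due=date(2024, 3, 1),
    client_input_due=date(2024, 1, 15),
    client_input_received=date(2024, 2, 5),
)
print(due)  # 2024-03-22
```

Measuring both sides this way keeps the metric focused on what the provider could actually influence.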
What Kind of Metrics Should be Monitored?
Many items can be monitored as part of an SLA, but the scheme should be
kept as simple as possible to avoid confusion and excessive cost on either
side. In choosing metrics, examine your operation and decide what is most
important. The more complex the monitoring (and associated remedy)
scheme, the less likely it is to be effective, since no one will have time to
properly analyze the data. When in doubt, opt for ease of collection of metric
data; automated systems are best, since it is unlikely that costly manual
collection of metrics will be reliable.
Depending on the service, the types of metric to monitor may include:
Service availability: The amount of time the service is available for use. This
may be measured by time slot, with, for example, 99.5 percent availability
required between the hours of 8 am and 6 pm, and more or less availability
specified during other times. E-commerce operations typically have extremely
aggressive SLAs at all times; 99.999 percent uptime is a not uncommon
requirement for a site that generates millions of dollars an hour.
Defect rates: Counts or percentages of errors in major deliverables.
Production failures such as incomplete backups and restores, coding
errors/rework, and missed deadlines may be included in this category.
Technical quality: In outsourced application development, measurement of
technical quality by commercial analysis tools that examine factors such as
program size and coding defects.
Security: In these hyper-regulated times, application and network security
breaches can be costly. Measuring controllable security measures such as
anti-virus updates and patching is key in proving all reasonable preventive
measures were taken, in the event of an incident.
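To give a feel for what the availability figures above mean in practice, the following sketch converts a target percentage into a downtime allowance and computes the availability actually achieved. The function names and the choice of a 30-day month are assumptions made for this example only.

```python
def allowed_downtime_minutes(window_minutes, target_percent):
    """Downtime budget, in minutes, for a measurement window and a target."""
    return window_minutes * (100.0 - target_percent) / 100.0


def availability_percent(window_minutes, downtime_minutes):
    """Availability actually achieved over the measurement window."""
    return 100.0 * (window_minutes - downtime_minutes) / window_minutes


# 8 am to 6 pm is a 10-hour slot; over an assumed 30-day month that is
# 30 * 10 * 60 = 18,000 minutes, so 99.5 percent allows 90 minutes of downtime.
prime_time_minutes = 30 * 10 * 60
print(allowed_downtime_minutes(prime_time_minutes, 99.5))

# Around the clock for a 365-day year is 525,600 minutes, so 99.999 percent
# leaves a budget of only about 5.26 minutes for the whole year.
year_minutes = 365 * 24 * 60
print(round(allowed_downtime_minutes(year_minutes, 99.999), 2))

# Two hours of prime-time outage in the month falls short of the 99.5 target.
print(round(availability_percent(prime_time_minutes, 120), 2))
```

Stating the target as a downtime budget per time slot often makes it easier for both parties to judge whether a proposed service level is realistic.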