Problems Table

Problems are created to investigate and diagnose the root cause of one or more Incidents or possible Incidents and to decide upon a solution that will not just resolve the incident, but will prevent further incidents from being caused. Resolving a problem means resolving/preventing related Incidents.

Incident Management is concerned with restoring service as quickly as possible, whereas Problem Management is concerned with determining and eliminating root causes (and hence eliminating repeat problems).

The primary objectives of Problem Management are to prevent Incidents caused by a similar root situation from recurring and to minimize the impact of Incidents that cannot be prevented.

Problem Creation and Overview

When an incident is determined to be based on an underlying Problem, first-response support technicians create a Problem record from the Incident, automatically creating a link between the Incident record and the Problem. The new problem record imports relevant information from the Incident, such as the linked Configuration Item. Problems can also be created without being linked to existing Incidents, such as in cases where a problem is discovered internally but no Incidents have been reported. 

The Submitter, along with their Department and contact information, is automatically populated based on the person who creates the problem, and the reporting source of the Problem is also captured in the record. Resolution of problems may require changes to the system. Staff addressing the problem may determine that a shared CI needs to be replaced or modified, and may therefore file a Change Request. All updates throughout the life cycle of the Problem record are logged in the record's History along with the date and time. Updates can also be logged in the Additional Notes field, which records the updater and time stamp of each update.

When the problem is resolved, a technician updates the record with relevant information and closes the record, in turn prompting automation to begin closing procedures for related Incidents. If the problem cannot be resolved, it may be classified as a Known Error and a permanent workaround supplied. This will also update the related Incidents.

Processing of Records

Once created, Problem records can be linked to Incidents from either the Problem record or from an Incident record.

Priority

Priority, a measure of the Problem's urgency and relative importance, is set by default based on the combination of Impact and Urgency and the way in which the Problem Priority group has been configured. The default priority matrix for problems is shown below:

These values may be changed to match a company's preferences. See the Impact, Urgency, and Priority Management section for more details. Problem Priority, a measure of impact and risk, particularly with regard to IT service operations, may be determined separately as part of Problem Classification procedures. Together, Priority and Problem Priority help IT managers make factual decisions for scheduling and planning and help determine the category of related Change Requests.
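The way Priority is derived from Impact and Urgency can be pictured as a simple lookup table. The following is a minimal sketch only: the level names and every value in `PRIORITY_MATRIX` are hypothetical placeholders, not the system's default matrix, and `derive_priority` is an illustrative helper rather than a product function.

```python
# Hypothetical levels and values for illustration only; the real matrix
# is whatever has been configured for the Problem Priority group.
PRIORITY_MATRIX = {
    ("High", "High"): "Critical",
    ("High", "Medium"): "High",
    ("High", "Low"): "Medium",
    ("Medium", "High"): "High",
    ("Medium", "Medium"): "Medium",
    ("Medium", "Low"): "Low",
    ("Low", "High"): "Medium",
    ("Low", "Medium"): "Low",
    ("Low", "Low"): "Low",
}


def derive_priority(impact, urgency):
    """Look up the Priority for an (Impact, Urgency) pair."""
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError(f"No priority configured for {impact!r}/{urgency!r}")


print(derive_priority("High", "Medium"))
```

Keeping the matrix as data rather than as nested conditions mirrors the idea that the values may be changed to match a company's preferences without touching the logic.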

Diagnosis

The Problem follows its own workflow separate from the Incident.  Ops team members open the problem record in the default state of "Pending Diagnosis" to indicate that a diagnosis has yet to be determined, or "Diagnosed" if no further steps are required.  The default Problem SLA has specified targets for completing diagnosis, based on the priority, and a diagnosis clock is started upon creation of the problem record. As part of the diagnosis, technicians select the service involved based on existing Incident reports or based on the most likely service to be related to the Problem. If a particular Configuration Item is identified as the source of the problem's root cause, staff can quickly link the Problem to the CI record.

Once a diagnosis is supplied, staff will move the record into a status of "Diagnosed," fill out the Root Cause description and any further analysis, and save the record. They may also suggest a temporary fix, or "workaround", for the problem and related incidents, e.g. "use Printer B3 for now instead", if a permanent solution is not readily available.

Diagnosis and Resolution Clocks

When the status is changed from Pending Diagnosis to any of Diagnosed, Pending Change, Resolved, or Deferred, the diagnosis clock is stopped, the Diagnosis End Time field is populated automatically with the current time, and the Resolution clock is started. The diagnosis clock measures the working support hours between creation of the problem and diagnosis of the problem against the SLA diagnosis target.

The resolution clock measures time between when diagnosis is completed and the problem is resolved or deferred, excluding the time when the Status is Pending Change, against the SLA resolution target for problems.  While the Status is in Pending Change, both clocks are stopped, since the people handling problems may not have control over when a change request can be scheduled and implemented.
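The two clocks can be illustrated by replaying a record's status history. This is a rough sketch under stated assumptions: it counts plain elapsed time rather than working support hours, it treats any status not named above (for example Pending More Information) as leaving the current clock running, and `clock_totals` and the history format are hypothetical, not part of the product.

```python
from datetime import datetime, timedelta

# Statuses taken from the description above.
DIAGNOSIS_END = {"Diagnosed", "Pending Change", "Resolved", "Deferred"}
RESOLUTION_PAUSED = {"Pending Change"}
RESOLUTION_END = {"Resolved", "Deferred"}


def clock_totals(history, now):
    """Return (diagnosis_time, resolution_time) for a status history.

    history is a time-ordered list of (timestamp, status) pairs whose
    first entry is the creation of the Problem record.
    """
    diagnosis = timedelta()
    resolution = timedelta()
    diagnosed = False
    for i, (start, status) in enumerate(history):
        end = history[i + 1][0] if i + 1 < len(history) else now
        if status in DIAGNOSIS_END:
            diagnosed = True  # the diagnosis clock never restarts
        span = end - start
        if not diagnosed:
            diagnosis += span
        elif status not in RESOLUTION_PAUSED and status not in RESOLUTION_END:
            resolution += span
    return diagnosis, resolution


t0 = datetime(2024, 1, 1, 9, 0)
history = [
    (t0, "Pending Diagnosis"),
    (t0 + timedelta(hours=2), "Diagnosed"),
    (t0 + timedelta(hours=5), "Pending Change"),
    (t0 + timedelta(hours=10), "Diagnosed"),
    (t0 + timedelta(hours=11), "Resolved"),
]
diagnosis, resolution = clock_totals(history, now=t0 + timedelta(hours=20))
print(diagnosis, resolution)
```

In this example the five hours spent in Pending Change are excluded from the resolution total, matching the rule that both clocks are stopped while a change is pending.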

Workarounds

If there are related incidents and a workaround is available while the problem is being resolved, the workaround can be sent to all of the incident submitters as follows:

  1. Enter the details of the workaround into the Workaround field.
  2. Go to the Related Records tab, change the Workaround Provided field to Yes, and click the "Update Incident with Workaround" button. Clicking the button posts the workaround in all related Incidents and changes their status to Workaround Provided, which also triggers an email to both the end user and the assigned staff person of each Incident.
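The effect of the button can be sketched as a loop over the linked Incidents. This is an illustration only: the `Problem` and `Incident` classes, their attribute names, and the guard that skips the update unless Workaround Provided is Yes are all assumptions, and the returned strings stand in for the emails the system would send.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    # Hypothetical stand-in for an Incident record.
    id: int
    status: str = "Open"
    workaround: str = ""


@dataclass
class Problem:
    # Hypothetical stand-in for a Problem record.
    workaround: str = ""
    workaround_provided: str = "No"
    incidents: list = field(default_factory=list)


def update_incidents_with_workaround(problem):
    """Copy the workaround to every linked Incident.

    Returns one notification message per updated Incident, standing in
    for the email to the end user and the assigned staff person.
    """
    if problem.workaround_provided != "Yes" or not problem.workaround:
        return []  # assumed guard: nothing to push yet
    messages = []
    for incident in problem.incidents:
        incident.workaround = problem.workaround
        incident.status = "Workaround Provided"
        messages.append(f"Incident {incident.id}: workaround provided")
    return messages


problem = Problem(
    workaround="Use Printer B3 for now instead",
    workaround_provided="Yes",
    incidents=[Incident(101), Incident(102)],
)
for message in update_incidents_with_workaround(problem):
    print(message)
```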

Deferring and Seeking More Information

If at any time during the root cause diagnosis or determination of the proper solution staff need more information from a separate process, such as Incident details from first level support, the Problem record may be placed in a state of "Pending More Information" and an email sent to staff members with requests for further information. If a Problem is deemed too risky or of lower priority than more imminent issues, it may be put in a status of "Deferred" to reflect no ongoing diagnosis or pending changes.

Resolution 

In most cases where a Problem's root cause deals with a Configuration Item, a Change Request will be submitted to make the appropriate fixes to the Configuration Item. Change Requests can be created directly from the Problem record, which links the two records instantly. While a problem is waiting for a Change Request, it can be put in the status of "Pending Change". If the Change Request linked to a Problem is closed, the system will send an email notification to the problem assignee so that the individual can take additional steps to close the Problem record if it was pending change for resolution.

A problem whose root cause is known but for which there is no permanent resolution is considered a Known Error. Known Errors should have Workarounds to allow Incident Management to restore service as quickly as possible. When the problem is resolved either directly, or through a change request, all related incidents can be closed at once with the click of a button.  On the Related Records tab, the button "Update Incidents with Solution" can be used to push the Resolution field of the problem into the Resolution field of the linked incidents, closing the incident and triggering an email to the customer with the resolution.

Once the permanent resolution is determined and implemented, staff users enter the description in the Resolution field and set the status to Resolved. A Closure Category for the resolution of the Problem can be selected, and if the resolution contains information that is useful outside of this problem's particular scope, the "Add to Knowledgebase?" field can be set to Yes to make the Resolution field available via FAQs. In addition, a required "Review for Knowledgebase" field is displayed in that Status, and if it is set to Yes, a Knowledge Article can be created from the Problem for review to be added to the Knowledge Articles table.

Analysis and Major Problem Reviews

It is important to learn from problems to reduce the likelihood of repeat issues.  There are two fields intended to facilitate analysis after a major problem. 

  • Include in Major Problem Review: this is a required field.
  • Major Problem Review notes: this field is used to write up notes about what was done well in the crisis, what didn't work so well, and what could be done to prevent such a crisis in the future. This field is only visible if the Include in Major Problem Review field has a Yes value.

Known Errors

Problems can be marked as known errors by setting the Known Error field to a Yes value. All known errors can be easily viewed by support technicians using the Known Errors saved search when working on incidents or service requests. Known errors for development systems have a status called In Development, and these are shown in the search called Known Errors in Development. Only known errors for the production environment, which should be set to a status of Resolved, are shown in the Known Errors in Production search.

For known errors that should be visible to end users, we recommend that they are pushed into a knowledgebase article, which can be done directly from the problem record.

Ownership

Problems are "owned" by the staff member who creates the Problem record. Since only internal staff will see Problem records, groups may share responsibilities between Incident Management and Problem Management and multiple individuals may share ownership over time.

Workflow

Problem Fields

This section contains an overview and screenshot examples of the information stored in a Problem record in the out of box system.

Details tab

The common area is shown in all tabs, and shows the progress of the Problem's lifecycle, such as its Status, its Team Assignment, as well as the Assigned Person or owner of the Problem record. The details tab contains most of the information and updates pertaining to the Problem, including the Submitter, Submitter Department and Contact Information, Source of the Problem (i.e. how it was reported), the Location of the problem, the Business Service and Service impacted by the Problem, as well as the Impact, Urgency, and resulting Priority of the Problem. If a CI has been identified, it can also be selected here.

In addition, all Diagnosis details can be logged here along with the Risk and Root Cause analysis, and any additional working diagnosis notes with time stamps. Finally, the Resolution section in this tab allows for the provisioning of a Workaround for the problem (which can later be pushed into the related Incidents), or for pulling an existing Solution from the Known Error database. If the Status is Resolved, a Closure Category can also be specified for the Problem, and a determination can be made as to whether the Problem should be included in a future Major Problem Review. The date the Problem was resolved is then automatically logged in the History tab of the record to provide an audit trail.

The Related Records tab shows the list of Incidents related to the Problem. If a Workaround or Solution has been provided, it can also be pushed from the Problem record to all of its related Incidents. The Problem can also be linked to an existing Change Request, or a new one can be created from it, copying over all of the relevant information directly from the Problem record to the Change Request. In addition, the Knowledge Management section of this tab allows the Problem to be converted directly into a record in the Knowledge Articles table with a status of Pending Review.

SLA tab

The SLA tab displays the SLA thresholds for the Problem, including thresholds for the SLA Diagnosis Time, SLA Resolution Time, and the progress against those thresholds. 

Automation

The following rules run in the Problems table. Each of them either runs when a record is created or edited, or on a scheduled basis.

Creation actions

Rule Trigger: When a Problem is created via Email, Web, or API.

Description: This rule runs the following If-then-else action to set the assignment of the Problem record according to the predefined values in the related Service:

In addition, it also sets the SLA ID based on the saved search: Active, request type is problem, and SLA type is corporate, and then sets the SLA targets for the Problem based on the SLA and Priority.

Edit: Set Alert Color and Send Notifications (Web/API)

Rule Trigger: When a Problem is created or edited via Web or API and meets the saved search criteria: Diagnosis SLA Breached=No and Working time to diagnosis changed during the last modification.

Description:

If then action: I: Update SLA Details

  • If Diagnosis SLA Breached=No:
      • If Working time to Diagnosis is greater than SLA Diagnosis Warning Time and Alert Color is default, set Alert Color to Orange.
      • If Working time to Diagnosis is greater than SLA Diagnosis Time, set Diagnosis SLA Breached to Yes, set Alert Color to Red, and email the Assigned Person and team about the breach of diagnosis.
  • If Resolution SLA Breached=No:
      • If Working time to Resolution is greater than SLA Resolution Warning Time and Alert Color is default, set Alert Color to Orange.
      • If Working time to Resolution is greater than SLA Resolution Time, set Alert Color to Red, set Resolution SLA Breached to Yes, and notify the Assigned Person and team of the breach.
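The warning and breach checks above can be sketched as follows. This is an approximation for illustration: the record is modeled as a plain dictionary, the key names and the "Default" color value are hypothetical, and the returned messages stand in for the emails the rule sends.

```python
def update_sla_details(record):
    """Apply the warning and breach checks to one Problem record.

    record is a dict with hypothetical keys that mirror the field
    names used in the rule. Returns the breach notifications raised.
    """
    notifications = []
    for phase in ("diagnosis", "resolution"):
        if record[f"{phase}_sla_breached"]:
            continue  # already breached, nothing more to evaluate
        worked = record[f"working_time_to_{phase}"]
        # Warning threshold: only recolor if no alert is showing yet.
        if (worked > record[f"sla_{phase}_warning_time"]
                and record["alert_color"] == "Default"):
            record["alert_color"] = "Orange"
        # Breach threshold: flag, recolor, and notify.
        if worked > record[f"sla_{phase}_time"]:
            record[f"{phase}_sla_breached"] = True
            record["alert_color"] = "Red"
            notifications.append(
                f"{phase} SLA breached: notify assigned person and team"
            )
    return notifications


record = {
    "alert_color": "Default",
    "diagnosis_sla_breached": False,
    "working_time_to_diagnosis": 5,   # hours, illustrative values
    "sla_diagnosis_warning_time": 4,
    "sla_diagnosis_time": 8,
    "resolution_sla_breached": False,
    "working_time_to_resolution": 0,
    "sla_resolution_warning_time": 10,
    "sla_resolution_time": 20,
}
update_sla_details(record)
print(record["alert_color"])
```

Because the breach flag is checked first, a record that has already breached is not re-notified on later edits.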

Edit: Status Changes (Web/API)

Rule Trigger: When a Problem is edited via Web or API and Status changed during the record's last modification.

Description:

  • If Status changes to Diagnosed, set the Diagnosis Clock Status to Stopped.
  • If Status changes to Pending Change, set the Resolution Clock Status to Stopped.
  • If Status changes from Pending Change to some status other than Deferred or Resolved, set the Resolution Clock Status to Running.
  • If Status changes to Resolved, set the Resolution Clock Status to Stopped (and if the Diagnosis Clock is not stopped, set it to Stopped too).
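The four status rules above translate almost directly into conditional logic. The sketch below is illustrative only: the dictionary keys and the `apply_status_change` helper are hypothetical names, and transitions not covered by the rules leave both clocks unchanged.

```python
def apply_status_change(record, old_status, new_status):
    """Update the clock status fields after a status change."""
    if old_status == new_status:
        return record  # the rule only fires when Status actually changed
    if new_status == "Diagnosed":
        record["diagnosis_clock_status"] = "Stopped"
    if new_status == "Pending Change":
        record["resolution_clock_status"] = "Stopped"
    if (old_status == "Pending Change"
            and new_status not in ("Deferred", "Resolved")):
        record["resolution_clock_status"] = "Running"
    if new_status == "Resolved":
        record["resolution_clock_status"] = "Stopped"
        if record["diagnosis_clock_status"] != "Stopped":
            record["diagnosis_clock_status"] = "Stopped"
    return record


record = {
    "diagnosis_clock_status": "Running",
    "resolution_clock_status": "Running",
}
apply_status_change(record, "Pending Diagnosis", "Resolved")
print(record)
```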


TB: Refresh Elapsed Time fields

Rule Trigger: This rule runs every 20 minutes using the Saved Search: Diagnosis Clock Status is Running or Resolution Clock Status is Running, and Date Updated is 20 or more minutes old (so if someone updated the record in the meantime, it does not need to be refreshed again).

Description: U: Set Date SLA Checked to NOW(), which triggers an update of the elapsed time fields.
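The saved search and update can be sketched as a filter followed by a field assignment. This is a minimal illustration: it reads the search as "(either clock is Running) and (Date Updated is at least 20 minutes old)", which is an assumed interpretation, and the dictionary keys and function names are hypothetical.

```python
from datetime import datetime, timedelta

REFRESH_INTERVAL = timedelta(minutes=20)


def needs_refresh(record, now):
    """Mirror the saved search: a clock is running and the record is stale."""
    clock_running = (
        record["diagnosis_clock_status"] == "Running"
        or record["resolution_clock_status"] == "Running"
    )
    stale = now - record["date_updated"] >= REFRESH_INTERVAL
    return clock_running and stale


def refresh_elapsed_times(records, now):
    """Stamp Date SLA Checked on every matching record and return them."""
    refreshed = []
    for record in records:
        if needs_refresh(record, now):
            record["date_sla_checked"] = now
            refreshed.append(record)
    return refreshed


now = datetime(2024, 1, 1, 12, 0)
records = [
    {"diagnosis_clock_status": "Running", "resolution_clock_status": "Stopped",
     "date_updated": now - timedelta(minutes=45), "date_sla_checked": None},
    {"diagnosis_clock_status": "Running", "resolution_clock_status": "Stopped",
     "date_updated": now - timedelta(minutes=5), "date_sla_checked": None},
]
print(len(refresh_elapsed_times(records, now)))
```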

Reporting and Statistics

There are numerous default reports measuring different metrics for Problems, and a few of the most relevant and frequently used ones are listed here as examples:

Known Error Trend Analysis

This is a Trend graph that shows the number of Problems marked as Known Errors over the course of a pre-defined period of time (by default, this is since the beginning of the current year).

All Problems by CI Type, segmented by CI Name

This is a Segmented Bar Chart that shows the number of Problems that were reported for each CI type, segmented by the Configuration Item that they were reported for.

Major Problem Review Report

This report shows the major problems by month over the past year:

It shows further details in the HTML version of the report.

 
