Here at Corevist, none of our processes are set in stone. Our clients’ businesses (and our technology) require adaptability and agility. Being proactive is one of our core values, and our support processes give us ample opportunity to practice it. In fact, we’re constantly reviewing our support workflows and looking for ways to refine them.
Recently, an unusual situation revealed a gap in our escalation process for SEV1 (severity level 1) tickets. Here’s how we uncovered the issue—and how we solved it.
Background: Our proactive monitoring
As a managed technology partner, we monitor the entire stack for Corevist solutions, even for systems that we don’t control. Consequently, SEV1 issues fall into two categories.
- Client-side SEV1. This is an issue with a system or process that the client controls, like SAP ERP connectivity, VPN connectivity to the client’s network, or DNS settings.
- Corevist-side SEV1. This is an issue with Corevist software itself, from SAP integration to the front-end UI. These SEV1s are pretty rare.
This full-stack monitoring is a great added value for our clients. We can’t count the number of times that we’ve spotted an issue on the client side and let them know about it before they saw it themselves. It’s part of why we say we’ve got your back for all things related to your B2B portal.
The problem: Existing escalation process not ideal in the age of Zoom
As part of our SLA, we offer 24/7 support for SEV1 tickets, which follow a clearly defined escalation process in PagerDuty, our system for on-call management. If a client submits a ticket with the word “urgent” in it, Jira, our project tracking software, notifies PagerDuty and whoever is on call at that time. The system automatically takes time zones and working hours into account.
Recently, we encountered a rather surprising issue. When an urgent ticket arrived in the middle of a US business day, the first person in the PagerDuty on-call escalation path missed it because they were in a scheduled Zoom call with a client and had their mobile phone in silent mode. They did not see or hear the notification for the SEV1.
eCommerce site availability is critical, and seconds matter in resolving SEV1s. The ticket reached the next round of escalation, and we dealt with it promptly, but this odd case was an eye-opener for us.
Our PMs always have several irons in the fire. They run ongoing implementations, triage support issues, and meet regularly with clients. During the workday, they keep their personal phones off so they can focus on this work. That left a gap whenever an issue came up during business hours.
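The routing rule described above can be sketched roughly as follows. This is only an illustration, not our actual configuration: the `Responder` type, the function names, the fixed UTC offsets, and the working-hours windows are all hypothetical, and in practice the Jira and PagerDuty integration handles this through settings rather than custom code.

```python
import re
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass
class Responder:
    name: str
    utc_offset_hours: int  # hypothetical fixed offset; ignores daylight saving time
    workday_start: time    # hypothetical local working-hours window
    workday_end: time


def is_urgent(ticket_text: str) -> bool:
    """Treat a ticket as a SEV1 candidate when it contains the word "urgent"."""
    return re.search(r"\burgent\b", ticket_text, re.IGNORECASE) is not None


def is_working(responder: Responder, now_utc: datetime) -> bool:
    """Check whether the responder is inside their local working hours."""
    local_zone = timezone(timedelta(hours=responder.utc_offset_hours))
    local_time = now_utc.astimezone(local_zone).time()
    return responder.workday_start <= local_time < responder.workday_end


def pick_first_responder(roster: list, now_utc: datetime) -> Responder:
    """Prefer someone in working hours; otherwise fall back to the first entry."""
    for responder in roster:
        if is_working(responder, now_utc):
            return responder
    return roster[0]
```

The sketch ignores weekends and holidays, which a real on-call schedule would also need to handle.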
We decided to reevaluate our SEV1 escalation process. We realized the original process was really designed for off-hours support—and it excels at that. But the process depended on mobile phones, and that wasn’t a great fit for our Slack-focused collaboration during the workday.
Clearly, we needed to elevate the visibility of SEV1 tickets during working hours.
The solution: Pager alert posting to a dedicated Slack channel
After reviewing our escalation process, we determined that only one thing was missing: visibility of alerts during the working day, in the chat tool that our PMs use all day long.
In other words, Slack was the key.
In addition to the existing alert to mobile phones, we reconfigured Jira so that the software would post SEV1 alerts to a dedicated Slack channel. All our PMs are subscribed to this channel. Even when they’re in meetings, they see Slack. And if for some reason someone misses an alert, it will escalate to their team, with full visibility to everyone who’s subscribed to the channel.
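For readers who want to try something similar, here is a minimal sketch of posting an alert to a Slack channel through an incoming webhook. It is not how our Jira integration is configured: the function names, the message wording, and the webhook URL you pass in are assumptions for illustration. The only Slack behavior relied on is that an incoming webhook accepts a JSON body with a `text` field.

```python
import json
import urllib.request


def build_sev1_message(ticket_key: str, summary: str, on_call: str) -> dict:
    """Build the JSON payload for a Slack incoming webhook (hypothetical wording)."""
    text = f"SEV1 {ticket_key}: {summary} (on call: {on_call})"
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to the given webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the message builder separate from the network call makes the wording easy to check without sending anything to Slack.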
New addition to the process: SEV1 retrospective
Beyond improving our SEV1 response, we also found that clients sometimes did not follow up on the action items from a client-side SEV1. To help our clients see how they can reduce SEV1 occurrences or eliminate future issues, we established a new SEV1 retrospective. Now, after we’ve resolved a SEV1, we do an internal post-mortem and RCA (root cause analysis). We investigate everything about the incident:
- What happened (symptoms)
- When it occurred and how long it went on
- The impact on other systems
- The business impact
- The root cause of the issue
- What we did to fix it
- Any remaining open items that need to be addressed
Corevist had been doing internal RCAs for a while, but the change here was introducing a client-facing version with a standard template and a timeline for delivery to the client. Not only does this new process formalize the RCA process and educate the client, but it also seeks to identify any action items on the client side as a result of the issue.
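As a rough illustration of what a standard template could look like, the checklist above maps naturally onto a small record type. This is a hypothetical sketch, not our actual client-facing template: the class name, field names, and output layout are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sev1Retrospective:
    symptoms: str                 # what happened
    timing_and_duration: str      # when it occurred and how long it went on
    impact_on_other_systems: str
    business_impact: str
    root_cause: str
    fix: str                      # what was done to resolve it
    open_items: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the retrospective as plain text, one checklist item per line."""
        lines = [
            f"What happened: {self.symptoms}",
            f"When and how long: {self.timing_and_duration}",
            f"Impact on other systems: {self.impact_on_other_systems}",
            f"Business impact: {self.business_impact}",
            f"Root cause: {self.root_cause}",
            f"Fix: {self.fix}",
        ]
        if self.open_items:
            lines.append("Open items:")
            lines.extend(f"- {item}" for item in self.open_items)
        else:
            lines.append("Open items: none")
        return "\n".join(lines)
```

Listing open items separately keeps any client-side follow-ups visible at the end of the report.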
The results: Tighter SEV1 escalation process and opportunity for learnings
This new SEV1 retrospective process has formalized our analysis and review of issues. Ultimately, it makes things tighter as we monitor and support the entire technology stack for our clients’ B2B portals. Our clients get even more assurance that if something goes bump in the night, we’re on it.
Our retrospective analysis ensures that we uncover any takeaways from an issue. Whether the SEV1 originated in a client-controlled system or in Corevist, the new review process gives us more information to ensure the issue doesn’t happen again—or to advise our client on how to prevent it if we don’t control the system in question.
The takeaway: You don’t have to become a technology company to do business online
While SEV1 issues are rare, this new process highlights the value that Corevist provides as a managed technology partner. We empower manufacturers to launch great B2B portals without the need to transform into development companies. Our monitoring, support, and review processes are key components of that value, and we’re continuously refining them. Our systems work, we run them, and we’ve got your back.