By Mohammed Al-Roumi
In today’s interconnected and rapidly evolving digital world, operational disruptions have become a pressing concern for businesses and government entities alike. The threats associated with disruptions have the potential to wreak havoc, resulting in severe financial losses, irreparable damage to reputation, disgruntled customers, and even legal ramifications. From multinational corporations to local government agencies, the consequences of failing to address these threats loom on the horizon. This article will provide an overview of how disruptions occur, the importance of operational resilience, why it isn’t being regulated in Kuwait and the factors that should be considered to help mitigate the associated risks.
How disruptions occur
Disruptions are events that interrupt or prevent the normal operation of an organization’s IT systems and services. They can arise from human error, technical failures, cyberattacks, supply chain issues and natural disasters. Comprehensive resilience and proactive strategies are paramount to protect critical infrastructure and ensure the continued functioning of society. To put it in perspective, imagine a bank experiencing a system outage: it could prevent customers from accessing their accounts, making payments, or transferring funds.
Relevance to Kuwait
Recently, Kuwait has been grappling with a flurry of unforeseen challenges, including technical failures and cyberattacks, which have brought disruptions to many key businesses and government services. These disruptions are causing widespread uncertainty and concern, raising questions about the country’s ability to withstand future challenges as it progresses with its digital transformation journey. The government’s alliance with Google Cloud is a particular focus area that will have a big impact on the way the government will operate. While the headlines about the alliance may be attention-grabbing, does Kuwait have the right regulatory setup, governance structure, vision, target architecture landscape and talent/skills to ensure it is implemented in a way that is both effective and resilient?
Operational resilience and its importance
Operational resilience is the ability of an organization to prevent, adapt to, respond to, recover from and learn from operational disruptions. Given its importance, it is dominating the regulatory agenda across the globe. The Monetary Authority of Singapore, Central Bank of Ireland, Australian Prudential Regulation Authority, European Commission, Financial Conduct Authority, Bank of England and the Saudi Central Bank (SAMA) are just a few of the regulators who have recently issued operational resilience requirements with the aim of mitigating the risks of disruptions in their respective jurisdictions. Their collective endeavor echoes an international push for a stronger and more resilient operational ecosystem within the global community.
Enforcing operational resilience
While Kuwait has several laws and regulations designed to improve certain aspects of risk management and business continuity/disaster recovery, there are no laws or regulations that mandate organizations to be operationally resilient. This regulatory void could leave the vital services rendered by Kuwait’s businesses and government entities vulnerable to debilitating disruptions, with potential ramifications that could send shockwaves throughout the nation’s economy and society. The constant interruptions in service have become so routine that we’ve started to view them as the norm rather than an issue that needs urgent attention. However, we must not overlook the fact that these disruptions have widespread implications. The need for comprehensive laws and regulations is therefore undeniable. Such provisions will offer clear guidance for organizations and businesses to manage their IT infrastructure effectively. They will also enhance Kuwait’s capability to tackle the disruptions that could derail the regular course of life and work in the country.
Operational resilience versus business continuity
It appears that a key impediment to effective regulatory action in Kuwait might be a fundamental misunderstanding of the distinction between operational resilience and business continuity planning/disaster recovery. It is critical to note that while these two concepts are interrelated, they are not identical. The main difference between the two is that operational resilience is a broader concept that encompasses all aspects of an organization’s ability to continue operating, while business continuity planning/disaster recovery focuses on how an organization will respond to a disruption during a crisis/temporary disaster. Emphasizing operational resilience makes organizations less susceptible to disruptions and better able to withstand setbacks while maintaining routine operations. Conversely, business continuity planning/disaster recovery is the revival and maintenance of critical functions at reduced capacity until normal operations are resumed. Understanding these subtle yet significant differences is pivotal for a robust, effective organizational resilience strategy. A sufficiently resilient organization may avoid the need to execute business continuity/disaster recovery plans altogether.
Improving operational resilience
Practical steps that should be taken to improve the resilience of the entire digital ecosystem in Kuwait include: modernizing the existing IT infrastructure, setting up suitable governance, embracing disruption, stress testing essential business services and implementing effective management information and oversight.
The IT infrastructure in Kuwait needs modernization: Much of Kuwait’s IT infrastructure is rooted in legacy systems, which are notably less resilient, susceptible to cyberattacks and often costlier to maintain. These weaknesses place businesses and government entities at risk, as legacy systems can be slow, unresponsive, difficult to change and prone to errors. This makes it hard to keep up with the demands of users and their changing needs. Modernizing this infrastructure is not just beneficial but essential to Kuwait’s national security. But are leaders within these organizations adequately prepared to embark on such a transformational journey? And do they possess the necessary skills, expertise and/or familiarity with the latest technologies and practices that are needed? These may include proficiency in using infrastructure as code (IaC), embracing microservices architecture, using a hybrid of SQL/NoSQL databases and in-memory storage, de-coupling services and communicating asynchronously through publish/subscribe messaging-based systems, automation, visualization and leveraging the power of artificial intelligence in operations.
The right governance is essential for operational resilience: While Kuwait’s supervisory and regulatory boards are composed of executives possessing valuable experience in operations, finance, sales and their respective industries, only a handful boast up-to-date cybersecurity and technical IT expertise. The infusion of individuals with specialized cyber and technical knowledge significantly boosts the board’s capacity to identify and mitigate operational risks. Such knowledge equips supervisory boards with a nuanced understanding of the organization’s systems, processes, potential threats, vulnerabilities and the necessary risk management strategies. As a case in point, following a significant data breach, Equifax, a major credit reporting agency, welcomed a board member with extensive cybersecurity experience. This strategic move enabled the company to improve its cybersecurity posture, which in turn improved its resilience to cyberattacks and disruptions.
Organizations focus on avoidance when they need to focus on resilience: Instead of concentrating their spending and efforts entirely on avoiding disruptions to critical business services, businesses and government entities should also focus on being resilient to them. This means planning and preparing for the possibility of a disruption and embedding resilience as part of the organization’s culture and operations. Although spending resources to minimize the possibility of a successful disruption is important (e.g., implementing security measures or keeping systems up to date), focusing on resilience (e.g., implementing distributed and/or fault-tolerant systems that incorporate redundancy, load balancing, auto scaling, burst capacity and self-healing) is equally important, given that avoiding operational disruption in its entirety is an unachievable target. A balanced strategy that involves both prevention and resilience offers a more robust defense against operational disruptions.
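To make the idea of redundancy concrete, the following is a minimal Python sketch of failover between redundant replicas of a service. It is an illustration under stated assumptions, not a production design: the names call_with_failover, primary and standby are hypothetical, and a real system would add health checks, timeouts and load balancing.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every redundant replica has failed to serve the request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each redundant replica in order and return the first successful reply."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed for request {request!r}")


# Hypothetical replicas: the primary is down, the standby is healthy.
def primary(request: str) -> str:
    raise ConnectionError("primary data centre unreachable")


def standby(request: str) -> str:
    return f"standby served: {request}"


result = call_with_failover([primary, standby], "balance inquiry")
print(result)
```

The customer still gets an answer even though the primary has failed, which is the difference between trying to avoid every failure and being resilient to one.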
Identifying essential business services, setting impact tolerances and stress testing: Performing a risk assessment by identifying the business services that are essential can improve an organization’s ability to withstand and recover from disruptions. Impact tolerances, i.e., the levels of disruption that an organization can withstand without impacting its ability to deliver critical services, should be set based on the criticality of the service and the impact of a disruption. A critical service, such as online banking, should have zero tolerance for disruption, whereas a less critical service, such as marketing and advertising, should be able to tolerate a short period of disruption without significantly impacting the organization’s ability to operate. Once organizations have established tolerances, they should implement mitigation measures that are appropriate to those tolerances. Periodic resilience/stress testing should also be performed to assess the organization’s operational durability during disruptions. Organizations should promptly act on the insights gathered from these tests to mitigate identified risks and ensure readiness and resilience in the face of unplanned events.
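As a rough illustration of how impact tolerances can be recorded and tested, the Python sketch below compares simulated outages from a stress test against the tolerance set for each service. The service names, the 240-minute tolerance and the outage durations are assumptions made up for the example; only the zero tolerance for online banking follows the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessService:
    name: str
    max_tolerable_outage_minutes: int  # the impact tolerance agreed for this service


def breaches_tolerance(service: BusinessService, outage_minutes: int) -> bool:
    """Return True when an outage lasts longer than the service can tolerate."""
    return outage_minutes > service.max_tolerable_outage_minutes


# Illustrative tolerances: zero for online banking, four hours for marketing.
services = [
    BusinessService("online banking", 0),
    BusinessService("marketing website", 240),
]

# Hypothetical outage durations (in minutes) observed during a stress test.
simulated_outages = {"online banking": 5, "marketing website": 90}

breaches = [s.name for s in services if breaches_tolerance(s, simulated_outages[s.name])]
print(breaches)
```

Any service that appears in the list of breaches is one where the mitigation measures in place do not yet match the tolerance that was set.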
Effective management information and oversight can help organizations identify and mitigate risks to their operations. By having access to accurate and up-to-the-minute information, organizations can make better and more timely decisions. AI Ops tools (such as BigPanda, Dynatrace and Splunk) can help automate and improve IT operations by logging and storing vast amounts of data from a variety of sources, including networks, applications, databases and tools, and then using this data to identify patterns and anomalies and to predict potential problems. Traditional IT management solutions cannot keep up with the volume of data captured. They are not able to provide real-time visibility and are often based on manual processes, which can be slow and error prone. AI Ops can provide rapid response and remediation or, in some cases, automatically resolve issues without human intervention. Cutting-edge technology like AI Ops has not yet been embraced in Kuwait.
Operational resilience is a top priority for regulators around the world. Regulators worldwide are increasingly focused on ensuring that organizations are prepared for and resilient to operational disruptions. Laws and regulations such as the Digital Operational Resilience Act (DORA), the NIS 2 Directive and the Prudential Regulation Authority (PRA)’s Operational Resilience Policy Statement (PS21/3) are a few examples of the steps regulators across the globe have taken to mitigate the risks of disruptions. Kuwait is facing a growing threat that can have a devastating impact on the economy, society and national security. Kuwaiti regulators must take action to improve operational resilience within government entities and businesses.
NOTE: The views expressed in this article are those of the author and do not necessarily reflect the views of Ernst & Young Global or its member companies. Mohammed Al-Roumi is a senior manager in Ernst & Young’s information technology advisory practice with over 12 years of experience managing information technology (IT) matters for international companies. These organizations include high-profile Big Tech corporations, Global Systemically Important Banks (G-SIBs) and media conglomerates.
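The kind of anomaly detection described above can be sketched in a few lines of Python. This is a deliberately simple stand-in (a rolling z-score over recent samples), not how BigPanda, Dynatrace or Splunk work internally; the function name find_anomalies, the window size, the threshold and the latency figures are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples that deviate sharply from the preceding window."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: treat any change at all as an anomaly.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one sudden spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 450, 101, 99, 100]
anomalous_indexes = find_anomalies(latencies_ms)
print(anomalous_indexes)
```

A monitoring pipeline could raise an alert, or trigger an automated remediation step, for each index flagged in this way instead of waiting for someone to notice the spike manually.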