John Organek, Director of Program Planning and Operational Architecture
August 4, 2024
The recent CrowdStrike “Falcon Sensor” global software incident, the costs of which could top $1 billion, exposes several glaring gaps and shortcomings in how companies and institutions operate in our brave new cyber-physical world. While it did not cause death or injury, it nevertheless wreaked widespread havoc across other infrastructures, including airlines, hospitals, and 911 services. It caused Delta Air Lines alone to cancel more than 2,000 flights on July 19 and over 6,000 flights since then. Something as small as a few lines of bad code, deployed to a myriad of endpoints globally, caused the largest IT outage in history. For want of a nail... the kingdom was lost!
A fundamental error made across the board is the failure to fully understand the risk that apparently minor ‘disturbances’ can create major consequences, whether outbound to or inbound from other infrastructures. One wonders whether the board of any affected company had even considered the devastating impact that software could cause and, if so, whether it took appropriate action to mitigate the loss. Did CrowdStrike realize how a bit of bad code would be amplified globally and devastate its reputation as a cybersecurity company? Did Delta Air Lines plan for a scenario of almost existential risk? Did their business continuity plans address such an eventuality and, if so, how? After all, software is now a part of virtually everything we touch and do.
Our modern societies face other sources of near-existential risk beyond software bugs, such as Black Sky electric grid events, widespread communications and data center failures, and cyber-attacks. In this highly connected world, very small failures can propagate quickly, leading to more incidents like CrowdStrike in the future.
Preliminary reports pinpoint several failures that led to the outage, spreading blame across multiple stakeholders. For example, the new software was insufficiently tested, and apparently there was no plan for reverting to the original version. End users, too, were not prepared to act when they lost processing capabilities at the edge. No one seemed to be prepared when the inevitable happened. None of these stakeholders could be rated as ‘resilient’.

[Image: CrowdStrike “Falcon Sensor”]
Software issues are going to continue well into the future. Stakeholders need to recognize that accidents such as the recent one are normal occurrences. They should therefore be especially attentive to the risks that software poses to their business operations and reputation, ranging from cyber-attacks to poor quality or faulty deployment. Because these normal accidents will continue to happen, stakeholders must make maintaining business continuity a top priority, rather than believing they can fully prevent such accidents. Besides, as Delta has discovered, a company's operations can be gravely affected by bits of software developed by another company of which it probably had little corporate knowledge.
The CrowdStrike incident has again reminded us of the risks posed by our highly interdependent cyber-physical critical infrastructures. But more importantly, it should remind us that we are still far from being resilient.