Major Internet Outages are Getting Bigger and Occurring More Often: A Reflection on the CrowdStrike IT Outage

At 09:30 a.m. BST on 19 July 2024, IT systems around the world suddenly ground to a halt. Without their computer systems, pharmacies, doctors’ surgeries, airports, train providers, and banks, among other critical services, were unable to operate. Websites and entertainment platforms went offline. Supermarket deliveries were cancelled. Retailers’ payment systems were unable to process transactions. Emergency services were disrupted. TV channels were unable to air.

Two screenshots depicting two different technical error messages from the July 2024 CrowdStrike outage. The image on the left is of the Sainsbury's mobile landing page and the image on the right is from the Ladbrokes mobile site.

Figs. 1-2: Several websites were unable to function due to the outage. Image by author.

The outage highlighted the fragile foundation of global internet infrastructure. The scale and significance of the outage was captured by tech entrepreneur Elon Musk, who took to social media platform X with a simple post that said: “biggest IT fail ever.” Others in the IT industry similarly described the event as “one of the largest mass outages in IT history” (Moss 2024a).

A screen capture from Elon Musk's x.com account. The quote reads: "Biggest IT fail ever."

Fig. 3: Elon Musk posting about the global IT outage on 19 July 2024. Source: x.com/elonmusk (Public Domain)

The Technical Issue

While it is difficult to quantify the impact of IT outages, this event caused long-lasting and far-reaching disruption across business, industry, and society. Organizations relying on Windows systems were unable to reboot their computers after a security update was rolled out by the cybersecurity firm CrowdStrike. According to CrowdStrike CEO George Kurtz, a “defect” in one of its software updates for Windows operating systems was identified as the cause of the outage.

The technology at fault was the CrowdStrike Falcon Sensor, a cloud-delivered tool used to protect against security breaches, such as malware attacks and hacking threats. The update caused Windows systems to crash, resulting in “blue screen of death” error messages (BBC 2024) and causing systems to enter a ‘bootloop’ (whereby a computer system continually reboots itself). CrowdStrike software is deeply embedded into the Windows operating system. Microsoft estimated that 8.5 million Windows devices were impacted by the outage (Moss 2024a) but were keen to place the focus on CrowdStrike, highlighting in a statement that “this was not a Microsoft incident” (Weston 2024). However, the inability of Windows to handle the faulty update in any way other than crashing the system also highlighted major deficiencies within the operating system itself.

To resolve the issue, affected organizations had to boot their computers in safe mode, remove the faulty update, and then download the safe patched update. In some cases 15 reboots were reportedly needed (Plummer and Gerken 2024). This is a time-consuming process, and, on top of this, the impacted businesses and organisations had to deal with the significant backlog arising from all of the suspended services. Recovery time estimates ranged from days to months, but 99% of affected Windows systems were back online by the end of July (Kerner 2024). The Global Payroll Association said that many workers would experience a delay in their monthly pay following the IT outage (2024).

The outage also had a major financial and reputational impact on CrowdStrike. After the event, their shares opened nearly 15% down on the Nasdaq stock exchange in New York (Saul 2024), equating to a roughly $12.5 billion decrease in company value. It is expected that the software firm will have to pay out billions in insurance claims (Moss 2024b). In the meantime, CrowdStrike reportedly sent some partners a $10 Uber Eats gift card as an apology (Moss 2024c).

A screen capture of a Windows restart error message, also known as the "blue screen of death."

Fig. 4: Blue screen of death error message. Source: Wikimedia Commons

 

Who Are CrowdStrike?

Until the outage, many of us had never heard of CrowdStrike. Founded in 2011, and based in Austin, Texas, CrowdStrike provides a range of endpoint cybersecurity software solutions to large organisations. Valued at over $80 billion (Field 2024), they first listed their shares publicly on the Nasdaq stock exchange in 2019 and quickly came to dominate the endpoint security market (their 2023 Q4 earnings report highlights that they have nearly 24,000 customers).

CrowdStrike is not a household name. Unlike other IT security software providers like McAfee, AVG, or Norton, which many people are familiar with because these corporations provide anti-virus software for end-user consumers, CrowdStrike primarily targets enterprise customers. They are an example of a small number of obscure, but hugely powerful, IT corporations unknown to the general public, who are nevertheless responsible for an oversized portion of the globe’s computing infrastructure. While Amazon, Google, and Microsoft are household names, some of the less well-known corporations that now form the operational backbone of the internet include Cloudflare, Akamai, Oracle, and Fastly.

The Danger of Consolidating Computing Infrastructure

The CrowdStrike outage provided us with an eye-opening reminder of the vulnerabilities that arise from the centralization of computing infrastructure. When one corporation dominates its market to the extent that CrowdStrike does with endpoint security, the result is a single point of failure. The outage highlighted not only the risks of IT concentration, but also the risk of organizational over-dependence on a single operating system provider, with many organizations relying solely on Windows for their IT provision.

The network model of computing infrastructure was originally conceived during the Cold War. Network computing was seen to offer a highly resilient, nuclear attack-proof design made up of multiple nodes and connections. The idea was that networks would avoid any single point of failure: if one connection should fail, data traffic would continue via the connections that remained. However, as media historians of IT infrastructure have highlighted, this idea was always more of a fantasy than a reality (Hu 2016). Far from a massively distributed and decentralised network, the logics of monopoly capitalism led to a handful of powerful corporations controlling large parts of the internet. The rise of cloud computing has further facilitated centralisation, enabling computing resources to be delivered over the internet by a few large companies (Amazon Web Services, Microsoft Azure, Google Cloud Platform, Digital Realty, Equinix, Oracle). The intensifying consolidation of global computing infrastructure is now leading to a growing number of large-scale IT outages, making the precarious reality of the internet increasingly apparent. Indeed, the July 2024 event is merely the latest (and potentially the largest yet) in a series of outages that have occurred in recent years:

  • May 2017 – a power outage brought down a British Airways data centre, which led to the cancellation of over 600 British Airways flights at an estimated cost of £58 million.
  • July 2020 – Cloudflare, a global content delivery network (CDN) which more than 10% of all websites rely on, suffered a 27-minute outage, caused by a configuration error, that led to a 50% collapse in its traffic.
  • June 2021 – some of the world’s most visited websites, including Amazon, PayPal, Reddit, and the New York Times, were inaccessible (Stokel-Walker 2021) after Fastly, another CDN that provides cloud computing services, suffered a major outage to its service. For those in the IT industry, this internet outage highlighted the fragility of the internet’s current architecture (Associated Press 2021) and served as “a stark reminder that the Internet can fail” (Miller 2021).
  • July 2021 – a software update from Akamai Technologies (whose servers handle over 30% of global web traffic) led to a major outage which impacted services run by UPS, AT&T, Airbnb, and the PlayStation Network.
  • October 2021 – Meta, who own Facebook, WhatsApp, and Instagram, experienced an outage for several hours that affected billions of social media users as well as millions of businesses.
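
The idea of a single point of failure discussed above can be made concrete with a small sketch. The two topologies below are hypothetical toy examples (the node names and connections are invented for illustration and do not model any real network): one routes everything through a central hub, the other is a simple ring with no hub. The code removes one node from each and checks which of the remaining nodes can still reach one another.

```python
from collections import deque


def reachable(graph, start, failed):
    """Return the set of nodes reachable from `start` when `failed` nodes are down."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical centralised topology: every site connects only through one hub.
centralised = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub"],
}

# Hypothetical distributed topology: a ring in which each node has two neighbours.
distributed = {
    "a": ["b", "e"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "a"],
}

# Take down the hub in the first network and an ordinary node in the second.
centralised_after = reachable(centralised, "a", failed={"hub"})
distributed_after = reachable(distributed, "a", failed={"c"})

print(sorted(centralised_after))  # ['a']
print(sorted(distributed_after))  # ['a', 'b', 'd', 'e']
```

In the hub-based network, losing the hub leaves node "a" unable to reach anything else. In the ring, losing one node leaves all the others connected, because traffic can travel the other way round. Real internet infrastructure is far more complex than either toy case, but the contrast shows why concentrating dependence on one provider turns a local fault into a widespread one.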

People in the Cloud

The CrowdStrike outage raises important questions about working conditions at the cybersecurity firm. As an anthropologist who conducts research on data security (Taylor 2023) and fieldwork in the data centre industry, I have spent a lot of time with the people who work ‘behind the screens’ of the digital world, delivering the online services we often take for granted (Taylor 2022). While we still don’t have granular detail about the exact nature of the content update that caused the CrowdStrike crash, it is quite likely that the update was not rigorously checked before it was rolled out. This is not just a technical issue but a socioeconomic issue. In efforts to cut costs and save money, IT companies are often understaffed (Taylor 2022). More generally, the tech sector has seen record numbers of layoffs in 2024 (Sayegh 2024). Multiple factors are behind this, including the shift to cloud-based deployment models, increasing reliance on outsourcing, the growing implementation of AI in data centers (Taylor 2022), and concerns around inflation and recessions.

This leaves IT staff significantly overworked and under considerable stress in high-pressure jobs where large swathes of society are reliant on the services they deliver (Taylor 2021), and where expectations for online services to be instantly available at the click of a button are increasingly inflexible. Metaphors like ‘the cloud’ or ‘cyberspace’ present the internet as an ethereal or virtual system devoid of human beings (Taylor 2019). We forget that the internet relies on a vast array of material infrastructure (Taylor 2018), carbon-emitting energy (Ortar et al. 2023), and human labour (Taylor 2021) – it is not an automated process. IT staff often have to work in highly stressful conditions to tight deadlines. If a software company is not adequately staffed, or places undue pressure on its staff, corners can be cut, and diagnostic checks might be less thorough. Beyond CrowdStrike, IT staff in the thousands of affected organisations around the world have had to bear the brunt of the outage, working long hours to try and resolve the issue. The impact of IT failures like this on the mental and physical health of IT staff remains overlooked (MacColl et al. 2024).

An image of a large white room containing several painted white and concrete pillars. At the back of the room stands a man in black surveying a row of grey data servers.

Fig. 5: A data centre employee conducting a routine diagnostic check. Image by author.

Major IT and Internet Outages are Getting Bigger and Occurring More Often

Sociologists of risk have long argued that the biggest threat to industrialized societies is their dependence on a handful of complex and interdependent infrastructures (Beck 1992). The internet now relies on such a complex ecosystem of interdependencies that it is a black box to most network professionals, and the consolidation of this infrastructure means that power is increasingly concentrated in the hands of a few private companies that dominate their respective markets. The current organisation of internet infrastructure effectively means that more and more eggs are being moved into fewer and fewer baskets, leading to larger outages.

Ironically, in a promotional blog post prior to the outage, CrowdStrike themselves discussed the vulnerability of over-relying on a single major vendor. In the post they note that, ‘If that provider fails, the consequences for its users could be catastrophic.’

Greater societal dependence on the internet means that downtime is more noticeable and more disruptive. This outage may at least prompt organizations to consider diversifying their network security, their software, or their operating system providers. While the standardization of IT can save an organization money, diversification should be considered a key strategy for building organizational resilience, with the financial costs of IT diversification factored into Business Continuity Plans (BCPs). We can certainly expect future IT outages – and these may continue to increase in scale and scope – if we don’t address the risk of IT concentration and re-think the business models that underpin the provision of internet infrastructure and online services.


This post was curated by Contributing Editor Jessica Caporusso

References

Associated Press. 2021. “Government, News and Commercial Websites Affected as Fastly Outage Causes Large-Scale Global Internet Disruption.” Associated Press, 8 June 2021. https://www.marketwatch.com/story/government-news-and-commercial-websites-affected-as-fastly-outage-causes-large-scale-global-internet-disruption-01623158092

BBC. 2024. “Watch: Blue screens, Queues and Airport Delays Worldwide.” British Broadcasting Corporation, 19 July 2024. https://www.bbc.com/news/videos/cz7ejpld988o

Beck, Ulrich. 1992. Risk Society: Towards a New Modernity. London: Sage Publications.

Field, Matthew. 2024. “Why is the Internet Down Worldwide? Everything You Need to Know.” The Telegraph, 19 July 2024. https://www.telegraph.co.uk/business/2024/07/19/why-is-internet-down-worldwide-everything-you-need-to-know/

Global Payroll Association. 2024. “IT Outage Hits Payroll Systems Risking Late Wage for Millions.” Global Payroll Association, 19 July 2024. https://gpa.net/blogs/global/global-it-outage-hits-payroll-systems-risking-late-wages-for-millions

Hu, Tung-Hui. 2016. A Prehistory of the Cloud. Cambridge, MA: MIT Press.

Kerner, Sean Michael. 2024. “CrowdStrike Outage Explained: What Caused It and What’s Next.” Tech Target, 29 October 2024. https://www.techtarget.com/whatis/feature/Explaining-the-largest-IT-outage-in-history-and-whats-next

MacColl, Jamie, Pia Husch, Gareth Mott, James Sullivan, Jason R. C. Nurse, Sarah Turner and Nandita Pattnaik. 2024. “Ransomware: Victim Insights on Harms to Individuals, Society and Organizations.” RUSI, 16 January 2024. https://www.rusi.org/explore-our-research/publications/occasional-papers/ransomware-victim-insights-harms-individuals-organisations-and-society

Miller, Neil. 2021. “Inside the Fastly Outage: A Firm Reminder on Internet Redundancy.” Data Center Dynamics, 22 June 2021. https://www.datacenterdynamics.com/en/opinions/inside-the-fastly-outage-a-firm-reminder-on-internet-redundancy/

Moss, Sebastian. 2024a. “CrowdStrike IT Outage Brought Down 8.5 Million Windows Devices, Will Take Time to Recover.” Data Center Dynamics, 23 July 2024. https://www.datacenterdynamics.com/en/news/crowdstrike-it-outage-brought-down-85-million-windows-devices-will-take-time-to-recover/

Moss, Sebastian. 2024b. “Billions in Insurance Payouts Expected from CrowdStrike Global IT Outage.” Data Center Dynamics, 24 July 2024. https://www.datacenterdynamics.com/en/news/billions-in-insurance-payouts-expected-from-crowdstrike-global-it-outage/

Moss, Sebastian. 2024c. “CrowdStrike Offers $10 UberEats Gift Card to Partners as Apology for Global IT Outage.” Data Center Dynamics, 24 July 2024. https://www.datacenterdynamics.com/en/news/crowdstrike-offers-10-uber-eats-gift-card-to-partners-as-apology-for-global-it-outage/

Ortar, Nathalie, A. R. E. Taylor, Julia Velkova, Patrick Brodie, Alix Johnson, Clement Marquet, Andrea Pollio and Liza Cirolia. 2023. “Powering ‘Smart’ Futures: Data Centers and the Energy Politics of Digitalization.” In Energy Futures: Anthropocene Challenges, Emerging Technologies and Everyday Life, edited by Simone Abram, Karen Waltorp, Nathalie Ortar and Sarah Pink, p. 125 – 168. Berlin: DeGruyter.

Plummer, Robert, and Tom Gerken. 2024. “CrowdStrike and Microsoft: What We Know About the Global IT Outage.” British Broadcasting Corporation, 19 July 2024. https://www.bbc.com/news/articles/cp4wnrxqlewo

Saul, Derek. 2024. “CrowdStrike Stock Tanks 15% – Set for Worst Day Since 2022.” Forbes, 19 July 2024. https://www.forbes.com/sites/dereksaul/2024/07/19/crowdstrike-stock-tanks-15-set-for-worst-day-since-2022/

Sayegh, Emil. 2024. “The Great Tech Reset: Unpacking the Layoff Surge of 2024.” Forbes, 19 August 2024. https://www.forbes.com/sites/emilsayegh/2024/08/19/the-great-tech-reset-unpacking-the-layoff-surge-of-2024/

Stokel-Walker, Chris. 2021. “What Really Went Down When the Internet Went Down.” Wired, 8 June 2021. https://www.wired.com/story/fastly-internet-outage/

Taylor, Alexander. 2018. “Failover Architectures: The Infrastructural Excess of the Data Centre Industry.” Failed Architecture, 19 May 2018. https://failedarchitecture.com/failover-architectures-the-infrastructural-excess-of-the-data-centre-industry/

Taylor, A. R. E. 2019. “The Data Center as Technological Wilderness.” Culture Machine 18. https://culturemachine.net/vol-18-the-nature-of-data-centers/data-center-as-techno-wilderness/

Taylor, A. R. E. 2021. “Standing By for Data Loss: Failure, Preparedness and the Cloud.” Ephemera: Theory and Politics in Organization 21 (1): 59 – 93. https://ephemerajournal.org/contribution/standing-data-loss-failure-preparedness-and-cloud

Taylor, A. R. E. 2022. “Cloudwork: Data Center Labor and the Maintenance of Media Infrastructure.” In The Routledge Companion to Media Anthropology, edited by Elisabetta Costa, Patricia G. Lange, Nell Haynes, and Jolynna Sinanan, p. 213 – 228. London: Routledge.

Taylor, A. R. E. 2023. “Concrete Clouds: Bunkers, Data, Preparedness.” New Media and Society 25 (2): 405 – 430.

Weston, David. 2024. “Helping Our Customers Through the CrowdStrike Outage.” Microsoft, 20 July 2024. https://blogs.microsoft.com/blog/2024/07/20/helping-our-customers-through-the-crowdstrike-outage/ 

 

 

 
