Microsoft Azure Goes Down
Bloomberg News reports that Microsoft’s online services, including Teams, M365, and Outlook, went down yesterday. The outage came a day after Tuesday’s good earnings report, which itself stands in contrast to the company’s announcement of a 5% workforce reduction that will put 10,000 people out of work. Some of the people who lost their jobs worked on Azure, Microsoft’s cloud services offering. Azure still helps Microsoft grow, but growth across the cloud services industry has slowed, a sign that the industry is maturing.
Azure is at the centre of the outage, and Microsoft continued its track record of revealing the root cause of outages by posting an impact summary on its Azure status history site. The outage lasted three hours and affected Azure resources in multiple Public Azure regions. The popular M365 and PowerBI services were also affected.
According to Microsoft, a wide area network (WAN) problem was to blame. A change the company made to its WAN broke connectivity between the internet and Microsoft’s core services.
Last week, the U.S. Federal Aviation Administration’s (FAA) critical pilot safety notification system, known as NOTAM, also went down. Like Microsoft’s, that outage was caused by a change to the organisation’s own systems. The FAA says that a corrupted file in both its primary and secondary databases caused the outage. When a contractor deleted those files, the system slowed down and pilots couldn’t get NOTAM alerts. This stopped all domestic flights in the United States.
We are becoming more and more dependent on cloud service providers and, in the case of the FAA, on old systems, yet outages remain a big problem.
Even though the two outages had different causes, both affected a lot of people, as is true of every outage at a major organisation. Whatever the cause, it is hard to overstate how much an outage costs. The Uptime Institute found that the share of outages costing businesses more than $100,000 rose from 39% in 2019 to more than 60% in 2020. More companies are also paying over $1 million to recover from an outage: the share paying seven figures is up to 15%, from 11% previously.
Reports say that Azure is the second largest cloud service provider (CSP), behind only Amazon, which started the CSP market and is still the market leader.
Microsoft promises to publish a root cause analysis, or Post Incident Report (PIR), within the next three days, and a final PIR 14 days after that.
We talked to Chip Gibbons, the chief information security officer (CISO) at managed services company Thrive, to find out what would happen after an outage. Here are the important parts:
Planning is a must for every business, large or small. For many businesses, a complete data backup and recovery plan is straightforward to put in place. Larger organisations may need to work out more details, such as how systems will be recovered, which applications will be used, and what the working conditions will be. But some data recovery questions always need answers: how the backup system works, who is in charge of it, what recovery point objective (RPO) is acceptable, meaning how much recent data the business can afford to lose, and how much data needs to be backed up. Settling these in advance can greatly cut the time it takes to get the business back up and running after a disaster and help meet the recovery time objective (RTO), the target time for restoring service.
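As a rough illustration of how those planning questions relate to RPO and RTO, here is a minimal Python sketch. It is not taken from Thrive or Microsoft: the `RecoveryPlan` class, its field names, and the sample figures are all hypothetical, and it assumes that the worst-case data loss equals the backup interval and that restores run at a constant rate.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    """Hypothetical record of one system's backup and recovery targets."""
    name: str
    owner: str                        # who is in charge of the backup system
    backup_interval: timedelta        # how often backups run
    data_size_gb: float               # how much data would need restoring
    restore_rate_gb_per_hour: float   # assumed constant restore throughput
    rpo: timedelta                    # most recent data the business can afford to lose
    rto: timedelta                    # target time to be running again

    def worst_case_data_loss(self) -> timedelta:
        # Assumption: a failure just before the next backup loses one full interval.
        return self.backup_interval

    def estimated_restore_time(self) -> timedelta:
        # Assumption: restore time is data size divided by a steady restore rate.
        return timedelta(hours=self.data_size_gb / self.restore_rate_gb_per_hour)

    def meets_rpo(self) -> bool:
        return self.worst_case_data_loss() <= self.rpo

    def meets_rto(self) -> bool:
        return self.estimated_restore_time() <= self.rto


# Illustrative numbers only.
plan = RecoveryPlan(
    name="orders-db",
    owner="it-ops",
    backup_interval=timedelta(hours=4),
    data_size_gb=500,
    restore_rate_gb_per_hour=200,
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)

print(plan.meets_rpo())  # False: backups every 4 hours cannot satisfy a 1-hour RPO
print(plan.meets_rto())  # True: restoring 500 GB at 200 GB/hour takes 2.5 hours
```

In this made-up case the restore would finish inside the RTO, but the backup schedule is too sparse for the RPO, which is the kind of gap a written plan is meant to expose before a disaster rather than during one.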
Regular testing of disaster recovery (DR) plans – Testing is necessary, even though it can disrupt day-to-day operations and slow the business down. When systems are tested, IT teams are bound to find something wrong with the DR plan, and the plan will need to be updated as those problems are fixed. If the problems are dealt with properly during testing, organisations will have a better chance of success when they really need to use their DR strategy.
Remember that people control IT infrastructure, so a DR plan must take into account how people act. For example, if a disaster hits a company’s location, organisations need to make sure that employees can still get to the data they need to do their jobs well.