
Bug 2000 (Y2K) is back in Microsoft's Azure

In my Vendors Survival posts I wrote about many leading vendors. A few of the posts were about Microsoft:

Will Microsoft survive until 2021? - revisited


I also posted about vendors' technologies, strategies and acquisitions related to their long-term survival. The post

Microsoft's Skype acquisition: Warning Signs ahead

is an example.

Yesterday, there was a long Microsoft Azure service outage.
The reason for the outage was a time miscalculation related to the leap year.
A long service outage is not unique to the Windows Azure PaaS platform. There have been long outages in services provided by other Cloud Computing vendors, such as Amazon and Google.
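This post does not describe the exact miscalculation, so the following is only a minimal sketch of one common form of leap-year bug: computing "the same date one year later" by bumping the year. The function names are hypothetical and this is not presented as Azure's code.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    # Naive approach: keep month and day, add 1 to the year.
    # Raises ValueError for Feb 29, because the next year has no Feb 29.
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    # Assumption: falling back to Feb 28 is an acceptable policy.
    # Some systems would prefer Mar 1 instead.
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


print(safe_one_year_later(date(2012, 2, 29)))
```

The naive version works on 365 days out of every 366-day year, which is why this kind of bug can survive ordinary testing.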

The outage itself has nothing to do with Microsoft's long-term survival probability.
The reason for the outage, however, is related to Microsoft's survival: it is an indication of very poor Software Quality Assurance.
Yesterday Microsoft proved that it learned nothing from the Year 2000 Bug (Y2K), which was a major issue more than a decade ago.

We may discover tomorrow, or in a year, two years or four years, which other lessons Microsoft did not learn.

Y2K, IT and Business
My blog's name is SOA Filling the Gaps because of the gaps between IT and Business, which SOA (as well as BPM and Business Oriented Architecture) is meant to address.

Y2K was a major cause of the widening of these gaps: IT managers promised the Business managers that if they spent a lot of money and resources, the organization's computerized systems would not collapse.
The IT managers did not promise any functional extensions or software improvements.

Y2K: FUD and Facts
Y2K is about system failures caused by using a two-digit year representation in date fields. As a result, both 1900 and 2000 are represented by 00, and calculations mistakenly use 1900 instead of 2000.
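A minimal sketch of the two-digit year problem, assuming a hypothetical system that stores the current year as two digits and always expands it with a "19" prefix. The function names and the sample birth year are illustrative only.

```python
def expand_assuming_1900s(two_digit_year: int) -> int:
    # The flawed assumption: every two-digit year belongs to the 1900s.
    return 1900 + two_digit_year


def age(birth_year: int, current_year: int) -> int:
    return current_year - birth_year


# The year 2000 is stored as 00 in a two-digit field.
stored_current_year = 0

# A person born in 1895, evaluated in the year 2000.
wrong_age = age(1895, expand_assuming_1900s(stored_current_year))
right_age = age(1895, 2000)

print(wrong_age, right_age)  # 5 105
```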

Y2K projects were actually Risk Management projects. However, they were justified by a lot of FUD and little real evidence of risk.
The so-called risks ranged from sending a 105-year-old woman an invitation to join kindergarten (low-severity risk) up to nuclear reactor explosions (high-severity risk).

The only real evidence came from case studies of a few installations that failed to handle the leap year of 1996.
Y2K is a higher-severity risk than a leap year miscalculation.
Addressing Y2K is a far more complex issue than addressing a leap year miscalculation.
The conclusion was that the Y2K damage potential could be quantified as hundreds or thousands of times the damage measured from the 1996 leap year miscalculation.
 
If I remember correctly, the most cited example of the 1996 leap year problem was the Brussels Stock Exchange. The impact of this risk event could be quantified as actual sums of money lost due to treating February 29th as if it were March 1st.
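The kind of error described here, where February 29th is treated as March 1st, can be sketched as a day-of-year conversion that uses a fixed 28-day February. This is an illustrative sketch with hypothetical function names, not a reconstruction of any real exchange system.

```python
import calendar

# Month lengths for a common (non-leap) year.
DAYS_COMMON_YEAR = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def month_and_day(day_of_year: int, month_lengths: list) -> tuple:
    # Walk through the months, subtracting each month's length.
    if day_of_year < 1:
        raise ValueError("day_of_year must be at least 1")
    for month, length in enumerate(month_lengths, start=1):
        if day_of_year <= length:
            return month, day_of_year
        day_of_year -= length
    raise ValueError("day_of_year out of range")


def month_and_day_buggy(day_of_year: int) -> tuple:
    # Bug: ignores the year, so February always has 28 days.
    return month_and_day(day_of_year, DAYS_COMMON_YEAR)


def month_and_day_correct(year: int, day_of_year: int) -> tuple:
    lengths = DAYS_COMMON_YEAR.copy()
    if calendar.isleap(year):
        lengths[1] = 29
    return month_and_day(day_of_year, lengths)


# Day 60 of 1996 is February 29th; the buggy version reports March 1st.
print(month_and_day_buggy(60), month_and_day_correct(1996, 60))
```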

I am sure that the Windows Azure outage caused by the leap year miscalculation cost a lot more than the same miscalculation cost the Brussels Stock Exchange in 1996.

Comments

Avi Rosenthal said…
LinkedIn Groups

Group: KnowYourCloud
Discussion: Bug 2000 (Y2K) is back in Microsoft's Azure
I wonder if there are MSFT customers that overcome the Azure downtime by a specific architecture of failover.
Posted by Ofir Nachmani
Avi Rosenthal said…
LinkedIn Groups

Group: iCMG Architecture World
Discussion: Bug 2000 (Y2K) is back in Microsoft's Azure
No surprise there. When you convert a 6-digit date to an eight-digit date, there is a cutoff point (in our case it was 2032) from which the two-digit year takes its century. For example, is '09 = 1909 or 2009? For the conversions we did, '31 became 1931 and '32 became 2032.

I should imagine it only relates to converted dates at the time of conversion. Surely they didn't retain the 6-digit date format and continued to convert everything?
Posted by Doug Scott
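A minimal sketch of a pivot-based ("windowing") expansion of two-digit years, as discussed in the comment above. It assumes the common convention in which two-digit years below the pivot map to the 2000s and the rest to the 1900s; a given conversion may assign the centuries differently. The function name and the default pivot of 32 are illustrative.

```python
def expand_with_window(two_digit_year: int, pivot: int = 32) -> int:
    # Assumption: years below the pivot belong to the 2000s,
    # years at or above the pivot belong to the 1900s.
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    century = 2000 if two_digit_year < pivot else 1900
    return century + two_digit_year


print(expand_with_window(9), expand_with_window(31), expand_with_window(32))
# 2009 2031 1932
```

Whatever the direction, the window only covers 100 years, so any date outside it (for example, the birth year of someone older than the window allows) is silently mapped to the wrong century.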
Avi Rosenthal said…
LinkedIn Groups

Group: iCMG Architecture World
Discussion: Bug 2000 (Y2K) is back in Microsoft's Azure
Glad we did not use a "windowing" solution when we fixed our systems a couple of jobs ago!

Ah, but don't forget the Unix problem if you are running 32-bit libraries. Runs out of digits around 2038 (see http://en.wikipedia.org/wiki/Year_2038_problem ). I'll be retired by then but guess there'll be a few issues.

BTW, we did see one issue in my project that would have had a major impact if not detected at the time it happened. And there were a couple of problems on Dec-31-2000 due to the leap year (exception to an exception): 7-11 had a POS problem and a Scandinavian train system (I think it was Norway) also had issues. Finally, Apple's website showed 1/1/19100 on Jan-01-2000. :)
Posted by Jose Solera, MBA, PMP®, CSM, CSPO, CSP
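The 2038 limit mentioned in the comment above can be checked with a little arithmetic: a signed 32-bit count of seconds since January 1st, 1970 (UTC) reaches its maximum in January 2038. The sketch below simulates the 32-bit wraparound in plain Python; the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_to_int32(value: int) -> int:
    # Simulate storing a value in a signed 32-bit integer.
    return (value + 2**31) % 2**32 - 2**31


last_valid = EPOCH + timedelta(seconds=INT32_MAX)
after_overflow = EPOCH + timedelta(seconds=wrap_to_int32(INT32_MAX + 1))

print(last_valid)      # 2038-01-19 03:14:07+00:00
print(after_overflow)  # 1901-12-13 20:45:52+00:00
```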
Avi Rosenthal said…
LinkedIn Groups

Group: iCMG Architecture World
Discussion: Bug 2000 (Y2K) is back in Microsoft's Azure
I must be getting old. I had forgotten the term "Windowing solution"; I had always thought it risky, particularly for an insurance company that was selling policies for people aged over 100 years old (there are an increasing number of such people). Still for normal accounting purposes, it worked fine - it was messy, that's all, and I hate messy solutions.
Posted by Doug Scott