Human error refers to mistakes that people make unknowingly, yet a single human error can cause huge damage. Let me narrate a very interesting story about human error.
It is one of the biggest "human errors" in the entire IT industry. The UK had a long Bank Holiday weekend from 27th May to 29th May 2017, but during those 3 days London's Heathrow airport was overcrowded with frustrated and angry faces. Because of an incident at British Airways, hundreds of flights were delayed or canceled and around 75,000 passengers were left stranded. British Airways' computer systems were unavailable for nearly 3 days. The airline's parent company, International Airlines Group, appointed a third-party agency to investigate this complete IT failure. After 3 days the agency released its report and confirmed that the cause was not an IT failure as such; it was a "human error".
An IT engineer went to a data center near London's Heathrow airport for maintenance work and unknowingly disconnected the power supply. In IT terminology this kind of meltdown is called an unplanned outage. The outage is estimated to have cost as much as 80 million pounds (102 million USD). British Airways had to cancel 479 flights, or 59 percent of its timetable. The outage disrupted flight operations, all the call centers, and the websites of British Airways. Even Britain's Prime Minister, Theresa May, called on British Airways to compensate the passengers who were affected and left stranded during this incident. Fortunately, British Airways officials said that they would not blame a poor IT employee, and would instead put the blame on the C-suite executives and top management of the IT vendor.
This is not the first human error in the software industry; there is a long history of them. Even very basic syntax errors in computer programs have led to huge damage. Rockets and satellites have been left uncontrolled after launch because of a very minute syntax error in their software.
Currently, the industry is in the middle of a complete shift toward automation, robotics, and digitization. This incident raises a couple of questions and suggests some recommendations:
- A sustainable disaster recovery system should be in place: if the network or a system in the live data center goes down, the application should fail over to a backup data center.
- Alerting and alarms in the other data center should keep working if the UPS is unplugged.
- Once again, this incident is a reminder of the need for IT infrastructure readiness and high availability, for example by migrating from on-premises to the cloud.
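The first two recommendations can be illustrated with a small sketch. This is only a minimal illustration in Python, not how British Airways or any real disaster recovery product works. The names `DataCenter`, `is_healthy`, `pick_active`, and `alert` are all hypothetical, and a real health check would probe the network, power, and application state rather than call a simple function.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class DataCenter:
    """A site that can serve traffic (hypothetical model)."""
    name: str
    is_healthy: Callable[[], bool]  # hypothetical health probe for the site


def pick_active(
    centers: Sequence[DataCenter],
    alert: Callable[[str], None],
) -> Optional[DataCenter]:
    """Return the first healthy site in priority order.

    An alert is raised for every site that fails its health check,
    so a failure is reported even when failover succeeds.
    """
    for dc in centers:
        try:
            healthy = dc.is_healthy()
        except Exception:
            # A probe that crashes is treated the same as an unhealthy site.
            healthy = False
        if healthy:
            return dc
        alert(f"{dc.name} is unhealthy, trying next site")
    alert("no healthy data center available")
    return None


if __name__ == "__main__":
    # Simulate the primary site losing power while the backup stays up.
    primary = DataCenter("primary", lambda: False)
    backup = DataCenter("backup", lambda: True)
    active = pick_active([primary, backup], print)
    print("serving from:", active.name if active else "nowhere")
```

The point of the sketch is that the alert does not depend on the failed site: the check runs from outside it, so someone is told about the outage even when the primary data center has no power at all.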