The best crew I have flown with


Today is the eve of the Chinese Lunar New Year (27th January 2017), a time to recall what happened over the past 365 days. Despite the tragedy, the faces of the crew members and of the helpful, concerned passengers still warm my heart today.

Today (27th January 2017), I sent the email below to Xiamen International Airline's contact email address.

= = = = = = =


Today at 10:30


I was a passenger on MF855 from Tianjin to Singapore on 9th December 2016, with a stop-over at Xiamen International Airport.

After the flight took off from Xiamen International Airport, shortly after dinner was served on board at about 7pm, a passenger near the emergency exit row fell ill and needed medical attention. The crew responded quickly and took turns giving CPR. Announcements were made asking for anyone with medical expertise, but apparently no one on board was able to help.

The crew gave CPR non-stop until the airplane touched down at around 11pm and the medical doctor from Singapore Changi Airport came on board. Although the passenger did not regain consciousness and passed away, the efforts of the well-trained and responsive crew members are to be commended. Besides the family members of the passenger in distress, two other passengers also rendered help, and the crew members expressed their gratitude to them.

I am proud to have seen the great professionalism, dedication and warmth of the crew of Xiamen International Airline MF855 that night.

Please forward this letter to the crew with my gratitude.

I apologise for taking so long to write this note.

With best regards,


= = = = = = = =

Just to add: all the passengers on board cooperated, waiting patiently and quietly after the crew announced that the doctor from the airport needed to board quickly to assist the passenger in distress. We disembarked the plane at around 11:45pm.


Hyperscale, 3rd party colocation service providers and the enterprise data center


Published 22 January 2017

Last November, I attended the DataCenterDynamics Zettastructure conference in London. There were a number of workshops on the Open Compute Project (OCP), and one topic stood out: how OCP will impact third-party colocation players in Europe. To me, by extension, data centers in Asia face the same issue when considering OCP-type racks.

The OCP website says “The Open Compute Project is …. More efficient, flexible and scalable”. The question is: for whom? At the moment, OCP designs are meant for hyperscale data centers, i.e. those used by Facebook, Yahoo!, Microsoft and the like.

One benefit cited by OCP vendors is the speed of deploying compute/storage capacity: the capacity arrives on site ready to plug in. No rack-on/rack-off work should be needed other than plugging in the power.

In the United States, Facebook, Yahoo! and Microsoft have large facilities (whether first-party or custom-built third-party sites) that are designed and built for hyperscale deployment, and these sites accommodate OCP racks without major issues.

The thing is, most sites in the rest of the world are not planned, designed or built to accommodate thousands of OCP racks. In the workshop I participated in, colocation service providers asked the OCP data center project members what the average power draw of an OCP rack is, so that their private suites or colocation halls could accommodate a limited number of OCP racks.

When I talked to data center engineers from the Baidu-Alibaba-Tencent trio, they said their Project Scorpio racks (the project is now called the Open Data Center Committee, ODCC) are designed to fit into the top few data center facilities in 1st and 2nd tier Chinese cities, with an average cap of 7kW per rack when going into third-party colocation facilities. This philosophy means that their asset-light deployment of Scorpio racks, with hot/cold aisle containment, can go as planned in nearly every city where they want to deploy compute/storage capacity.

The other issue with OCP / ODCC racks is that they are mainly designed for hyperscale data center use, so the largest users of IT hardware, the enterprises, are “missing out” on the benefits of quick deployment of IT capacity. In most of Asia, data centers, whether colocation space or enterprise data centers/computer rooms, mostly run at around 5 to 6kW per rack (reference 4, 5 and 6).

Be it Baidu-Alibaba-Tencent or Facebook, Yahoo! and Microsoft, these OCP / ODCC racks will not benefit enterprises unless they accommodate the demands of enterprise data centers. Currently, enterprise IT does not see much benefit in OCP / ODCC, as it does not look at its compute/storage needs on the scale of the current OCP / ODCC clients. However, I believe this will change. Enterprise IT talks about software/app deployment and considers compute/storage last, and this creates pressure on the data center folks to quickly get space and racks ready, and on the IT capacity folks to procure servers/storage/network to add to the current pool. Until the OCP / ODCC vendors think the way enterprise IT does, which I predict they will, the enterprise data center market will not warm up to them.

However, this is where I think the OCP vendors will not limit their offerings to the Internet giants. They will need to design their hardware with the enterprise market in mind, because it is much larger than the Internet giants: for example, designing their racks (which include compute/storage/network gear) in stepped loads of, say, 6, 8 and 10 kW, and around how enterprise IT will use them, i.e. on a per-rack, per-project or per-enterprise-private-cloud basis. A new OCP vendor I spoke to in London said that, given the competition and the limited customer pool (of hyperscale data centers), they want to sell to enterprises. Sooner or later, we will see OCP / ODCC racks designed for deployment by enterprises into enterprise data centers and third-party colocation data centers.
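
To make the stepped-load idea concrete, here is a rough sizing sketch of the question the colocation providers were asking. It assumes power is the only constraint (ignoring cooling, floor loading and space), and the hall figures (100 positions, 600 kW in total, i.e. the 6kW per rack typical in Asia, with positions capped at 10 kW) and the function name racks_that_fit are hypothetical, not taken from any OCP or ODCC specification.

```python
# Hypothetical sizing sketch: how many racks of a given stepped load a
# colocation hall can take, assuming power is the only constraint.

def racks_that_fit(hall_kw: float, position_cap_kw: float,
                   positions: int, rack_kw: float) -> int:
    """Return how many racks drawing rack_kw each the hall can host.

    hall_kw         -- total IT power available to the hall (assumed figure)
    position_cap_kw -- maximum power a single rack position can supply (assumed)
    positions       -- number of rack positions in the hall
    rack_kw         -- power draw of one rack at a given load step
    """
    if rack_kw > position_cap_kw:
        return 0  # no single position can carry a rack this dense
    by_power = int(hall_kw // rack_kw)  # racks the total power budget allows
    return min(positions, by_power)     # cannot exceed the physical positions


if __name__ == "__main__":
    # Hypothetical hall: 100 positions, 600 kW total, positions capped at 10 kW.
    for step_kw in (6, 8, 10):
        n = racks_that_fit(600, 10, 100, step_kw)
        print(f"{step_kw} kW racks: {n} racks, {n * step_kw} kW used")
    # The same hypothetical hall if its positions were capped at 7 kW instead.
    print("8 kW racks with a 7 kW position cap:", racks_that_fit(600, 7, 100, 8))
```

Under these assumptions, denser load steps fill fewer positions from the same power budget, and a rack heavier than a position's cap cannot go in at all, which is why knowing the average power draw per rack matters so much to a colocation hall.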







Data Center: The human factor

The focus of the data center industry in general has followed a pattern of technological innovation: mechanical cooling (e.g. containment), electrical efficiency, new energy, greater integration and management (e.g. the rack as a computer with Open Compute, DCIM etc.) and so on.

One factor that does not stand out, but impacts all of the above, is the human factor, whether in planning, selection and procurement, design, testing and commissioning, or operations.

A poorly designed data center can hum along fine through the hard work and effort of the operations team. A well-designed and well-implemented data center can suffer multiple unplanned downtime incidents because of one inexperienced or careless member of the operations team.

Separate studies by Uptime Institute and Ponemon pegged the percentage of data center outages caused by human error at between 22% and 70%. Even the lower figure of 22% is a significant percentage.

I have personally experienced the consequences of several outages caused by human error, and in all such cases the post-incident review showed that they could have been avoided. Even problems caused by design lapses can be mitigated through a 3rd-party review and by executing a mitigation plan. Design limits may also be exceeded, or equipment may deteriorate beyond its limits, through prolonged maintenance windows or environmental factors (Singapore's high humidity is much tougher on equipment), and so forth.

A well-run site, after several years of stringent adherence to operating procedures, is like a well-run military camp: everything is labelled and no wire is left dangling. Every member of the operations staff knows their job when asked, for example which is the right operations manual for performing maintenance on a diesel generator and where it is located. You know you have a problem when he needs to ask someone else; if this happens at night while he is on duty, guess how long it takes to call a colleague or superior to react to an issue?

Whenever a serious data center incident is traced to human error, it may have taken years for the error to manifest, and the person blamed is the current manager in charge. The hidden problem may have been made worse by years of neglect.

One job opening in a Singapore data center complex has been vacant for more than 2 years, and hardly anyone applies for it because the site is known for power outages caused by insufficient capacity. No one wants to get into the hot seat. Eventually the entire building will be upgraded, and all the existing tenants will have to move out to facilitate the upgrading works.

When I do a data center operations audit, one area I pay particular attention to is the data center organization chart and the authority the data center manager has. In one case, there was no well-defined organization chart, and a crucial post was vacant and “covered” by a subordinate; I highlighted in my report that this was a critical gap that needed to be addressed right away.

While building a data center takes a large investment, and upgrading any of its components is technically and financially challenging, it is worthwhile to upgrade and develop the people running and managing it. An annual training plan should be drafted and regularly reviewed. Table-top exercises should be planned, executed and reviewed. Regular data center operations meetings and sharing sessions should be held to discuss potential problems and solutions. One mark of a well-run data center is how it deals with near-misses: its operations team reviews the reasons for each near-miss (regardless of whether it relates to safety or to operations) and puts in place measures to reduce the risks that led to it. One data center power outage was caused by a general cleaner wiping the anti-static mat in the low-voltage switchboard room with a wet mop; the facility engineer, doing his inspection rounds in worn-out shoes, slipped, grabbed on to a power switch and switched it OFF. The incident could have been avoided in many ways, which I will leave to the reader to work out.

One area that would help, but which generally is not happening in Asia, is the sharing of incidents to avoid future occurrences. I agree with Ed Ansett when he commented (see reference link 4) that problems can be avoided if we learn from the same problems recurring elsewhere. I think we cannot afford not to, given that outages have become more expensive and more impactful.

Training and development in data center fundamentals and operations is one area where I see greater investment, and I think the trend is upward given that the data center is becoming more mission-critical than before for many enterprises, large and small.

This area warrants more thought and much more work to enhance availability, prevent outages and improve recovery time. The people on the data center operations team will love to be appreciated for their hard work, and will work better for it.

